Hi Sanjana,

The error you are getting comes from threshold.pl. This means that the miner did not run correctly and the *.probs file that the threshold script takes as input is empty.

Are you running the training manually or through train-transliteration-module.pl? Please make sure to run the cleaning script on your word list before running the miner.

If that doesn't help, send me the 1-1 word list or the parallel data (with alignments) that the miner is running on.

Cheers,
Nadir

On Thu, May 5, 2016 at 1:54 PM, <[email protected]> wrote:

> Send Moses-support mailing list submissions to
>         [email protected]
>
> To subscribe or unsubscribe via the World Wide Web, visit
>         http://mailman.mit.edu/mailman/listinfo/moses-support
> or, via email, send a message with subject or body 'help' to
>         [email protected]
>
> You can reach the person managing the list at
>         [email protected]
>
> When replying, please edit your Subject line so it is more specific
> than "Re: Contents of Moses-support digest..."
>
>
> Today's Topics:
>
>    1. Re: Data for building a factored model (Sašo Kuntaric)
>    2. Tranliteration error (Sanjanashree Palanivel)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Wed, 4 May 2016 21:30:17 +0200
> From: Sašo Kuntaric <[email protected]>
> Subject: Re: [Moses-support] Data for building a factored model
> To: Marwa Refaie <[email protected]>
> Cc: [email protected]
> Message-ID:
>         <CANsquDosSSn=__
>         [email protected]>
> Content-Type: text/plain; charset="utf-8"
>
> Hello again,
>
> I believe I can wrap my head around the theoretical part, but the English
> and German corpora in the Moses factored model tutorial
> (http://www.statmt.org/moses/?n=Moses.FactoredTutorial) look beautifully
> factored, so my question is how were the original corpora processed? Was a
> specific tagger used and was there any manual/script postprocessing done?
>
> And since I am already bugging everyone, how is the language model pos.lm
> created? Is it extracted from a file, created manually or in another way?
>
> Thank you in advance for all the replies.
>
> Best regards,
>
> Sašo
>
> 2016-05-02 19:45 GMT+02:00 Marwa Refaie <[email protected]>:
>
> > The corpus for the translation model should be in 2 parallel files in the
> > format Word | pos | Lemma ...., with one file for each language. You can
> > prepare the files using WordNet, Stanford, or any tagger and stemmer that
> > can deal with your language pair. Before feeding the files to Moses you
> > may need to adjust the text files with a Python script (write it
> > yourself).
> >
> > For the language model, you must build the training text as follows:
> > Verb noun noun
> > Noun Det adj
> > .......
> > depending on the target language only. Then build it as a usual n-gram LM.
> >
> > Sent from my iPad
> >
> > > On May 2, 2016, at 10:11, Sašo Kuntaric <[email protected]> wrote:
> > >
> > > Hi all,
> > >
> > > I am having some issues producing the corpora in the correct format for
> > > Moses to execute factored training.
> > >
> > > I am looking at the factored tutorial on the Moses website and I am
> > > wondering how to get such consistent corpora for two languages. What
> > > tools are being used and can they be trained for specific languages
> > > (Slovenian in my example)? Are such tools available for download or is
> > > such data produced with custom scripts?
> > >
> > > --
> > > Best regards,
> > >
> > > Sašo
> > > _______________________________________________
> > > Moses-support mailing list
> > > [email protected]
> > > http://mailman.mit.edu/mailman/listinfo/moses-support
> >
>
> --
> Kind regards,
>
> Sašo
> -------------- next part --------------
> An HTML attachment was scrubbed...
> URL: http://mailman.mit.edu/mailman/private/moses-support/attachments/20160504/4ecbc25b/attachment-0001.html
>
> ------------------------------
>
> Message: 2
> Date: Thu, 5 May 2016 16:24:04 +0530
> From: Sanjanashree Palanivel <[email protected]>
> Subject: [Moses-support] Tranliteration error
> To: [email protected]
> Message-ID:
>         <CAAc_kp69zSo0hBAkO=
>         [email protected]>
> Content-Type: text/plain; charset="utf-8"
>
> Dear All,
>
> When I try to train the transliteration module, I get the following
> error. I don't know what is missing; please help.
>
> > Extracting Transliteration Pairs
> > Constructing Graph
> > Computing Probs : iteration 1
> > Computing Probs : iteration 2
> > Computing Probs : iteration 3
> > Computing Probs : iteration 4
> > Computing Probs : iteration 5
> > Computing Probs : iteration 6
> > Computing Probs : iteration 7
> > Computing Probs : iteration 8
> > Computing Probs : iteration 9
> > Computing Probs : iteration 10
> > Finished...
> > Selecting Transliteration Pairs with threshold 0.5
> > Name "main::hash" used only once: possible typo at /home/sanjana/Documents/SMT/mosesdecoder/scripts/Transliteration/threshold.pl line 26.
> > Preparing Corpus
> > Align Corpus
> > Using SCRIPTS_ROOTDIR: /home/sanjana/Documents/SMT/mosesdecoder/scripts
> > Using multi-thread GIZA
> > ERROR: Cannot find /home/sanjana/Documents/SMT/mosesdecoder/tools/merge_alignment.py at /home/sanjana/Documents/SMT/mosesdecoder/scripts/training/train-model.perl line 393.
> > Using SCRIPTS_ROOTDIR: /home/sanjana/Documents/SMT/mosesdecoder/scripts
> > Using multi-thread GIZA
> > ERROR: Cannot find /home/sanjana/Documents/SMT/mosesdecoder/tools/merge_alignment.py at /home/sanjana/Documents/SMT/mosesdecoder/scripts/training/train-model.perl line 393.
> > Using SCRIPTS_ROOTDIR: /home/sanjana/Documents/SMT/mosesdecoder/scripts
> > Using multi-thread GIZA
> > ERROR: Cannot find /home/sanjana/Documents/SMT/mosesdecoder/tools/merge_alignment.py at /home/sanjana/Documents/SMT/mosesdecoder/scripts/training/train-model.perl line 393.
> > Using SCRIPTS_ROOTDIR: /home/sanjana/Documents/SMT/mosesdecoder/scripts
> > using gzip
> > (3) generate word alignment @ Thu May 5 16:19:50 IST 2016
> > Combining forward and inverted alignment from files:
> > /home/sanjana/Documents/SMT/Transliteration/training/giza-inverse/en-hi.A3.final.{bz2,gz}
> > /home/sanjana/Documents/SMT/Transliteration/training/giza/hi-en.A3.final.{bz2,gz}
> > ERROR: Can't read /home/sanjana/Documents/SMT/Transliteration/training/giza-inverse/en-hi.A3.final.{bz2,gz}
> > Train Translation Models
> > Using SCRIPTS_ROOTDIR: /home/sanjana/Documents/SMT/mosesdecoder/scripts
> > using gzip
> > (4) generate lexical translation table 0-0 @ Thu May 5 16:19:50 IST 2016
> > (/home/sanjana/Documents/SMT/Transliteration/training/corpus.en,/home/sanjana/Documents/SMT/Transliteration/training/corpus.hi,/home/sanjana/Documents/SMT/Transliteration/model/lex)
> > ERROR: Can't read /home/sanjana/Documents/SMT/Transliteration/model/aligned.grow-diag-final-and at /home/sanjana/Documents/SMT/mosesdecoder/scripts/training/LexicalTranslationModel.pm line 92.
> > Using SCRIPTS_ROOTDIR: /home/sanjana/Documents/SMT/mosesdecoder/scripts
> > using gzip
> > (5) extract phrases @ Thu May 5 16:19:50 IST 2016
> > File not found: /home/sanjana/Documents/SMT/Transliteration/model/aligned.grow-diag-final-and at /home/sanjana/Documents/SMT/mosesdecoder/scripts/training/train-model.perl line 1609.
> > Using SCRIPTS_ROOTDIR: /home/sanjana/Documents/SMT/mosesdecoder/scripts
> > using gzip
> > (6) score phrases @ Thu May 5 16:19:50 IST 2016
> > (6.1) creating table half /home/sanjana/Documents/SMT/Transliteration/model/phrase-table.half.f2e @ Thu May 5 16:19:50 IST 2016
> > /home/sanjana/Documents/SMT/mosesdecoder/scripts/generic/score-parallel.perl 8 "sort " /home/sanjana/Documents/SMT/mosesdecoder/scripts/../bin/score /home/sanjana/Documents/SMT/Transliteration/model/extract.sorted.gz /home/sanjana/Documents/SMT/Transliteration/model/lex.f2e /home/sanjana/Documents/SMT/Transliteration/model/phrase-table.half.f2e.gz --KneserNey 0
> > Executing: /home/sanjana/Documents/SMT/mosesdecoder/scripts/generic/score-parallel.perl 8 "sort " /home/sanjana/Documents/SMT/mosesdecoder/scripts/../bin/score /home/sanjana/Documents/SMT/Transliteration/model/extract.sorted.gz /home/sanjana/Documents/SMT/Transliteration/model/lex.f2e /home/sanjana/Documents/SMT/Transliteration/model/phrase-table.half.f2e.gz --KneserNey 0
> > using gzip
> > Started Thu May 5 16:19:50 2016
> > gzip: /home/sanjana/Documents/SMT/Transliteration/model/extract.sorted.gz: No such file or directory
> > /home/sanjana/Documents/SMT/mosesdecoder/scripts/../bin/score /home/sanjana/Documents/SMT/Transliteration/model/tmp.10464/extract.0.gz /home/sanjana/Documents/SMT/Transliteration/model/lex.f2e /home/sanjana/Documents/SMT/Transliteration/model/tmp.10464/phrase-table.half.0000000.gz --KneserNey 2>> /dev/stderr
> > /home/sanjana/Documents/SMT/Transliteration/model/tmp.10464/run.0.sh/home/sanjana/Documents/SMT/Transliteration/model/tmp.10464/run.1.sh/home/sanjana/Documents/SMT/Transliteration/model/tmp.10464/run.2.sh/home/sanjana/Documents/SMT/Transliteration/model/tmp.10464/run.3.sh/home/sanjana/Documents/SMT/Transliteration/model/tmp.10464/run.4.sh/home/sanjana/Documents/SMT/Transliteration/model/tmp.10464/run.5.sh/home/sanjana/Documents/SMT/Transliteration/model/tmp.10464/run.6.sh/home/sanjana/Documents/SMT/Transliteration/model/tmp.10464/run.7.shScore v2.1 -- scoring methods for extracted rules
> > adjusting phrase translation probabilities with Kneser Ney discounting
> > Loading lexical translation table from /home/sanjana/Documents/SMT/Transliteration/model/lex.f2eCan't read /home/sanjana/Documents/SMT/Transliteration/model/lex.f2e
> > mv /home/sanjana/Documents/SMT/Transliteration/model/tmp.10464/phrase-table.half.0000000.gz /home/sanjana/Documents/SMT/Transliteration/model/phrase-table.half.f2e.gzmv: cannot stat '/home/sanjana/Documents/SMT/Transliteration/model/tmp.10464/phrase-table.half.0000000.gz': No such file or directory
> > Exit code: 1
> > ERROR: Scoring of phrases failed at /home/sanjana/Documents/SMT/mosesdecoder/scripts/training/train-model.perl line 1773.
> > (6.3) creating table half /home/sanjana/Documents/SMT/Transliteration/model/phrase-table.half.e2f @ Thu May 5 16:19:50 IST 2016
> > /home/sanjana/Documents/SMT/mosesdecoder/scripts/generic/score-parallel.perl 8 "sort " /home/sanjana/Documents/SMT/mosesdecoder/scripts/../bin/score /home/sanjana/Documents/SMT/Transliteration/model/extract.inv.sorted.gz /home/sanjana/Documents/SMT/Transliteration/model/lex.e2f /home/sanjana/Documents/SMT/Transliteration/model/phrase-table.half.e2f.gz --Inverse --KneserNey 1
> > Executing: /home/sanjana/Documents/SMT/mosesdecoder/scripts/generic/score-parallel.perl 8 "sort " /home/sanjana/Documents/SMT/mosesdecoder/scripts/../bin/score /home/sanjana/Documents/SMT/Transliteration/model/extract.inv.sorted.gz /home/sanjana/Documents/SMT/Transliteration/model/lex.e2f /home/sanjana/Documents/SMT/Transliteration/model/phrase-table.half.e2f.gz --Inverse --KneserNey 1
> > using gzip
> > Started Thu May 5 16:19:50 2016
> > gzip: /home/sanjana/Documents/SMT/Transliteration/model/extract.inv.sorted.gz: No such file or directory
> > /home/sanjana/Documents/SMT/mosesdecoder/scripts/../bin/score /home/sanjana/Documents/SMT/Transliteration/model/tmp.10512/extract.0.gz /home/sanjana/Documents/SMT/Transliteration/model/lex.e2f /home/sanjana/Documents/SMT/Transliteration/model/tmp.10512/phrase-table.half.0000000.gz --Inverse --KneserNey 2>> /dev/stderr
> > /home/sanjana/Documents/SMT/Transliteration/model/tmp.10512/run.0.sh/home/sanjana/Documents/SMT/Transliteration/model/tmp.10512/run.1.sh/home/sanjana/Documents/SMT/Transliteration/model/tmp.10512/run.2.sh/home/sanjana/Documents/SMT/Transliteration/model/tmp.10512/run.3.sh/home/sanjana/Documents/SMT/Transliteration/model/tmp.10512/run.5.sh/home/sanjana/Documents/SMT/Transliteration/model/tmp.10512/run.6.sh/home/sanjana/Documents/SMT/Transliteration/model/tmp.10512/run.7.sh/home/sanjana/Documents/SMT/Transliteration/model/tmp.10512/run.4.shScore v2.1 -- scoring methods for extracted rules
> > using inverse mode
> > adjusting phrase translation probabilities with Kneser Ney discounting
> > Loading lexical translation table from /home/sanjana/Documents/SMT/Transliteration/model/lex.e2fCan't read /home/sanjana/Documents/SMT/Transliteration/model/lex.e2f
> > gunzip -c /home/sanjana/Documents/SMT/Transliteration/model/tmp.10512/phrase-table.half.*.gz 2>> /dev/stderr| LC_ALL=C sort -T /home/sanjana/Documents/SMT/Transliteration/model/tmp.10512 | gzip -c > /home/sanjana/Documents/SMT/Transliteration/model/phrase-table.half.e2f.gz 2>> /dev/stderr gzip: /home/sanjana/Documents/SMT/Transliteration/model/tmp.10512/phrase-table.half.*.gz: No such file or directory
> > rm -rf /home/sanjana/Documents/SMT/Transliteration/model/tmp.10512
> > Finished Thu May 5 16:19:50 2016
> > (6.6) consolidating the two halves @ Thu May 5 16:19:50 IST 2016
> > Executing: /home/sanjana/Documents/SMT/mosesdecoder/scripts/../bin/consolidate /home/sanjana/Documents/SMT/Transliteration/model/phrase-table.half.f2e.gz /home/sanjana/Documents/SMT/Transliteration/model/phrase-table.half.e2f.gz /dev/stdout --KneserNey /home/sanjana/Documents/SMT/Transliteration/model/phrase-table.half.f2e.gz.coc | gzip -c > /home/sanjana/Documents/SMT/Transliteration/model/phrase-table.gz
> > Consolidate v2.0 written by Philipp Koehn
> > consolidating direct and indirect rule tables
> > adjusting phrase translation probabilities with Kneser Ney discounting
> > Can't read /home/sanjana/Documents/SMT/Transliteration/model/phrase-table.half.f2e.gz.coc
> > Executing: rm -f /home/sanjana/Documents/SMT/Transliteration/model/phrase-table.half.*
> > Train Language Models
> > one of required modified KneserNey count-of-counts is zero
> > error in discount estimator for order 2
> > while opening /home/sanjana/Documents/SMT/Transliteration/lm/targetLM
> > ERROR
> > Create Config File
> > Using SCRIPTS_ROOTDIR: /home/sanjana/Documents/SMT/mosesdecoder/scripts
> > using gzip
> > ERROR: Language model file not found or empty: /home/sanjana/Documents/SMT/Transliteration/lm/targetLM.bin at /home/sanjana/Documents/SMT/mosesdecoder/scripts/training/train-model.perl line 602.
> > Running Tuning for Transliteration Module
> > Using SCRIPTS_ROOTDIR: /home/sanjana/Documents/SMT/mosesdecoder/scripts
> > using gzip
> > (9) create moses.ini @ Thu May 5 16:19:50 IST 2016
> > Executing: mkdir -p /home/sanjana/Documents/SMT/Transliteration/tuning/filtered
> > Stripping XML...
> > Executing: /home/sanjana/Documents/SMT/mosesdecoder/scripts/training/../generic/strip-xml.perl < /home/sanjana/Documents/SMT/Transliteration/tuning/input > /home/sanjana/Documents/SMT/Transliteration/tuning/filtered/input.10592
> > pt:PhraseDictionaryMemory name=TranslationModel0 num-features=4 path=/home/sanjana/Documents/SMT/Transliteration/model/phrase-table input-factor=0 output-factor=0
> > Considering factor 0
> > Filtering files...
> > filtering /home/sanjana/Documents/SMT/Transliteration/model/phrase-table -> /home/sanjana/Documents/SMT/Transliteration/tuning/filtered/phrase-table.0-0.1.1...
> > No phrases found in /home/sanjana/Documents/SMT/Transliteration/model/phrase-table! at /home/sanjana/Documents/SMT/mosesdecoder/scripts/training/filter-model-given-input.pl line 398.
> > sh: 1: cannot open /home/sanjana/Documents/SMT/Transliteration/model/moses.ini: No such file
> > Using SCRIPTS_ROOTDIR: /home/sanjana/Documents/SMT/mosesdecoder/scripts
> > File not found: /home/sanjana/Documents/SMT/Transliteration/tuning/moses.filtered.ini (interpreted as /home/sanjana/Documents/SMT/Transliteration/tuning/moses.filtered.ini). at /home/sanjana/Documents/SMT/mosesdecoder/scripts/training/mert-moses.pl line 494.
> > cp: cannot stat ‘/home/sanjana/Documents/SMT/Transliteration/tuning/tmp/moses.ini’: No such file or directory
> > ERROR cannot open base-ini '/home/sanjana/Documents/SMT/Transliteration/model/moses.ini': No such file or directory at /home/sanjana/Documents/SMT/mosesdecoder/scripts/ems/support/substitute-weights.perl line 16.
> > Training Transliteration Module - End Thu May 5 16:19:50 IST 2016
>
> --
> Thanks and regards,
>
> Sanjanasri J.P
> -------------- next part --------------
> An HTML attachment was scrubbed...
> URL: http://mailman.mit.edu/mailman/private/moses-support/attachments/20160505/3a332748/attachment.html
>
> ------------------------------
>
> _______________________________________________
> Moses-support mailing list
> [email protected]
> http://mailman.mit.edu/mailman/listinfo/moses-support
>
>
> End of Moses-support Digest, Vol 115, Issue 4
> *********************************************
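
P.S. To check the diagnosis at the top of this message before rerunning anything, it may help to confirm whether the miner's *.probs output really is missing or empty. The following is only a minimal Python sketch and is not part of Moses: the function names are made up for illustration, and it assumes nothing about the file beyond it being plain text.

```python
import os


def count_nonblank_lines(path):
    """Return the number of non-blank lines in `path`, or None if it is not a file."""
    if not os.path.isfile(path):
        return None
    # errors="replace" keeps the check from crashing on unexpected bytes.
    with open(path, encoding="utf-8", errors="replace") as handle:
        return sum(1 for line in handle if line.strip())


def describe_file(path):
    """Classify `path` as "missing", "empty", or "ok" (hypothetical helper)."""
    count = count_nonblank_lines(path)
    if count is None:
        return "missing"
    if count == 0:
        return "empty"
    return "ok"
```

Calling describe_file() on the *.probs file in your training directory (the exact filename depends on your setup, so none is assumed here) and getting "missing" or "empty" would point back at the miner step rather than at threshold.pl itself.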
_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
