Hi, the -lm switch for train-model.perl takes a 4-tuple for each language model, of the form <factor>:<order>:<filename>:<type>. If you do not use factored translation models, then the factor is 0. The type selects the LM implementation: if you use SRILM, the type is 0; if you use IRSTLM, it is 1; for KenLM, it is 8.
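For example, a plain 3-gram KenLM model over the surface form would be declared as follows (the path here is a made-up placeholder, not one of your files):

  -lm 0:3:/path/to/model.blm:8    # hypothetical path; factor 0, order 3, type 8 = KenLM

that is: the factor first, then the n-gram order, then the file, then the implementation type.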
In your example

  lmodel-file: 8 1 3 /home/dseita/lmKaucha/kauchakTraining.blm.simp

you specify a 3-gram LM over factor 1, using KenLM (the KenLM implementation is able to read an SRILM file and compile it on the fly). In all likelihood, you are not using a factored translation model that generates output factor 1 (but only output factor 0), so the language model is trying to score a factor that does not exist.

Solution:
- use -lm 0:3:/[...LM FILE...]:8 if your LM file is in SRILM or KenLM format
- use -lm 0:3:/[...LM FILE...]:1 if your LM file is in IRSTLM format
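To make the correspondence concrete, here is a sketch (untested; everything except the -lm switch is abbreviated exactly as in your command below) of the corrected training call and the lmodel-file entry that train-model.perl should then write into the generated moses.ini. Note that the two use different field orders: -lm takes factor:order:filename:type, while lmodel-file lists type, factor, order, filename.

  # corrected training call; [ ...STUFF HERE...] stands for your other arguments
  nohup [ ...STUFF HERE...] -lm 0:3:/[...LM FILE...]:8 >& work/training.out

  # expected resulting line in moses.ini (type 8, factor 0, order 3)
  lmodel-file: 8 0 3 /[...LM FILE...]

If you would rather not retrain from scratch, it may also be enough to change the second number of the existing lmodel-file line from 1 to 0 by hand and re-run tuning, but I have not tested that shortcut.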
-phi

On Mon, Jun 25, 2012 at 1:14 PM, <[email protected]> wrote:
> Hi,
>
> I am trying to tune a dataset according to how the manual does it, but I am
> having problems. First, I got this error:
>
> http://comments.gmane.org/gmane.comp.nlp.moses.user/6239
>
> I tried to modify the ini file to set it to be 1 (because I'm using IRSTLM,
> and I have successfully tuned the baseline data set from the Moses website
> with it before), but it still didn't work. It doesn't seem to remember the
> changes when I tune. I then tried to retrain my data with this command,
> switching what used to be a 0 there to a 1:
>
> nohup [ ...STUFF HERE...] -lm 1:3:/[...LM FILE...]:8 >& work/training.out
>
> But I am confused about what the numbers 1 (previously 0), 3 and 8 mean.
> The manual says to use 0, but when I did that, I ended up with the "LM Not
> found, probably not compiled.." error I mentioned earlier. So I tried
> switching it to 1, since according to another Moses tutorial I found
> online, that indicates the LM was compiled with IRSTLM (?).
>
> So now, it DOES load the language model, but the decoder dies due to a
> segmentation fault (see below).
>
> The error has to be here:
> lmodel-file: 8 1 3 /home/dseita/lmKaucha/kauchakTraining.blm.simp
>
> I'm not sure what to do at this point. During my first run, this line had
> the numbers "0 0 3", and the mert.out file told me that the LM couldn't be
> found because it probably wasn't loaded into the library.
>
> Thanks for any help you can provide. I have a tokenized and truecased
> dataset, so I think the problem lies with the language model. I binarized
> it before training and tuning. Here is the mert.out file that I mentioned
> earlier:
>
> nohup: ignoring input
> Using SCRIPTS_ROOTDIR: /home/dseita/mosesdecoder/scripts
> Assuming the tables are already filtered, reusing filtered/moses.ini
> Using cached features list: ./features.list
> MERT starting values and ranges for random generation:
> d = 0.300 ( 0.00 .. 1.00)
> d = 0.300 ( 0.00 .. 1.00)
> d = 0.300 ( 0.00 .. 1.00)
> d = 0.300 ( 0.00 .. 1.00)
> d = 0.300 ( 0.00 .. 1.00)
> d = 0.300 ( 0.00 .. 1.00)
> d = 0.300 ( 0.00 .. 1.00)
> lm = 0.500 ( 0.00 .. 1.00)
> w = -1.000 ( 0.00 .. 1.00)
> tm = 0.200 ( 0.00 .. 1.00)
> tm = 0.200 ( 0.00 .. 1.00)
> tm = 0.200 ( 0.00 .. 1.00)
> tm = 0.200 ( 0.00 .. 1.00)
> tm = 0.200 ( 0.00 .. 1.00)
> run 1 start at Mon Jun 25 12:17:55 EDT 2012
> Parsing --decoder-flags: ||
> Saving new config to: ./run1.moses.ini
> Saved: ./run1.moses.ini
> Normalizing lambdas: 0.300000 0.300000 0.300000 0.300000 0.300000 0.300000 0.300000 0.500000 -1.000000 0.200000 0.200000 0.200000 0.200000 0.200000
> DECODER_CFG = -w -0.217391 -lm 0.108696 -d 0.065217 0.065217 0.065217 0.065217 0.065217 0.065217 0.065217 -tm 0.043478 0.043478 0.043478 0.043478 0.043478
> Executing: /home/dseita/mosesdecoder/bin/moses -config filtered/moses.ini -inputtype 0 -w -0.217391 -lm 0.108696 -d 0.065217 0.065217 0.065217 0.065217 0.065217 0.065217 0.065217 -tm 0.043478 0.043478 0.043478 0.043478 0.043478 -n-best-list run1.best100.out 100 -input-file /home/dseita/KauchakCorpus/kauchakTuning.true.norm > run1.out
> (1) run decoder to produce n-best lists
> params =
> decoder_config = -w -0.217391 -lm 0.108696 -d 0.065217 0.065217 0.065217 0.065217 0.065217 0.065217 0.065217 -tm 0.043478 0.043478 0.043478 0.043478 0.043478
> Defined parameters (per moses.ini or switch):
>   config: filtered/moses.ini
>   distortion-file: 0-0 wbe-msd-bidirectional-fe-allff 6 /home/dseita/KauchakWorking/mert-work/filtered/reordering-table.wbe-msd-bidirectional-fe
>   distortion-limit: 6
>   input-factors: 0
>   input-file: /home/dseita/KauchakCorpus/kauchakTuning.true.norm
>   inputtype: 0
>   lmodel-file: 8 1 3 /home/dseita/lmKauchak/kauchakTraining.blm.simp
>   mapping: 0 T 0
>   n-best-list: run1.best100.out 100
>   ttable-file: 0 0 0 5 /home/dseita/KauchakWorking/mert-work/filtered/phrase-table.0-0.1.1.gz
>   ttable-limit: 20
>   weight-d: 0.065217 0.065217 0.065217 0.065217 0.065217 0.065217 0.065217
>   weight-l: 0.108696
>   weight-t: 0.043478 0.043478 0.043478 0.043478 0.043478
>   weight-w: -0.217391
> Loading lexical distortion models...have 1 models
> Creating lexical reordering...
> weights: 0.065 0.065 0.065 0.065 0.065 0.065
> Loading table into memory...done.
> Start loading LanguageModel /home/dseita/lmKauchak/kauchakTraining.blm.simp : [14.000] seconds
> Finished loading LanguageModels : [14.000] seconds
> Start loading PhraseTable /home/dseita/KauchakWorking/mert-work/filtered/phrase-table.0-0.1.1.gz : [14.000] seconds
> filePath: /home/dseita/KauchakWorking/mert-work/filtered/phrase-table.0-0.1.1.gz
> Finished loading phrase tables : [14.000] seconds
> Start loading phrase table from /home/dseita/KauchakWorking/mert-work/filtered/phrase-table.0-0.1.1.gz : [14.000] seconds
> Reading /home/dseita/KauchakWorking/mert-work/filtered/phrase-table.0-0.1.1.gz
> ----5---10---15---20---25---30---35---40---45---50---55---60---65---70---75---80---85---90---95--100
> ****************************************************************************************************
> Finished loading phrase tables : [18.000] seconds
> Created input-output object : [18.000] seconds
> Translating line 0 in thread id 3046488896
> Translating: for administrative purposes Mastung was separated from Kalat and made a new district in 1991 .
>
> Collecting options took 0.020 seconds
> Segmentation fault (core dumped)
> Exit code: 139
> The decoder died.
> CONFIG WAS -w -0.217391 -lm 0.108696 -d 0.065217 0.065217 0.065217 0.065217 0.065217 0.065217 0.065217 -tm 0.043478 0.043478 0.043478 0.043478 0.043478

_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
