Hi,

To use a binarized IRST LM, you only need to run compile-lm on your existing SRILM language model; there is no need to retrain the model with the IRST tools. See the Moses documentation for details.
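As a rough sketch of that step, assuming compile-lm is on your PATH and takes the input LM followed by the output file (which is how I read the Moses documentation), something like the following should do. The input file name ca.srilm.lm is hypothetical; the output path and the "1 0 5" fields are taken from the lmodel-file line in the log quoted below, where the leading 1 selects the IRST LM implementation.

```shell
#!/bin/sh
# Hypothetical input: an ARPA-format LM already trained with SRILM.
SRILM_LM="/home/esca/ESCA/lm/ca.srilm.lm"
# Output path as it appears in the quoted moses.ini.
BLM_OUT="/home/esca/ESCA/lm/ca.blm"

# Only run the conversion if the tool and the input file are really there.
if command -v compile-lm >/dev/null 2>&1 && [ -f "$SRILM_LM" ]; then
    # Assumed invocation: compile-lm <input LM> <output binary LM>
    compile-lm "$SRILM_LM" "$BLM_OUT"
else
    echo "compile-lm or $SRILM_LM not found; skipping conversion"
fi

# Value for the lmodel-file entry in moses.ini:
# <LM type: 1 = IRST> <factor> <n-gram order> <path>
LM_LINE="1 0 5 $BLM_OUT"
echo "$LM_LINE"
```

The guard is only there so the script fails softly on a machine without IRSTLM installed; on a working setup the single compile-lm call is the whole conversion.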
-phi

On Tue, Jul 22, 2008 at 12:31 PM, Miguel José Hernández Vidal <[EMAIL PROTECTED]> wrote:
> I've also tried to run moses with a binarized (with compile-lm) SRI
> language model. When I run the decoder I see a segmentation fault error:
>
> ----------------------------------------------------------------------
> [EMAIL PROTECTED]:~$ ~/moses/moses-cmd/src/moses -config
> ~/ESCA/model/moses.ini
> -input-file ~/ESCA/tuning/input > ~/ESCA/evaluation/output
> Defined parameters (per moses.ini or switch):
> config: /home/esca/ESCA/model/moses.ini
> distortion-file: 0-0 msd-bidirectional-fe 6
> /home/esca/ESCA/model/reordering
> distortion-limit: 6
> input-factors: 0
> input-file: /home/esca/ESCA/tuning/input
> lmodel-file: 1 0 5 /home/esca/ESCA/lm/ca.blm
> mapping: 0 T 0
> ttable-file: 0 0 5 /home/esca/ESCA/model/phrase-table
> ttable-limit: 20
> weight-d: 0.3 0.3 0.3 0.3 0.3 0.3 0.3
> weight-l: 0.5000
> weight-t: 0.2 0.2 0.2 0.2 0.2
> weight-w: -1
> Loading lexical distortion models...
> have 1 models
> Creating lexical reordering...
> weights: 0.300 0.300 0.300 0.300 0.300 0.300
> binary file loaded, default OFF_T: -1
> Created lexical orientation reordering
> Start loading LanguageModel /home/esca/ESCA/lm/ca.blm : [1.000] seconds
> In LanguageModelIRST::Load: nGramOrder = 5
> Loading LM file (no MAP)
> blmt
> loadbin()
> loading 321187 1-grams
> loading 4548952 2-grams
> loading 2785668 3-grams
> loading 2501764 4-grams
> loading 1741048 5-grams
> done
> OOV code is 37189
> IRST: m_unknownId=37189
> Fallo de segmentación (core dumped) #SEGMENTATION FAULT
> ----------------------------------------------------------------------
>
> I am using binarized phrase and reordering tables, but they worked fine
> when I build them with my old SRILM system.
>
> Thanks for your help.
>
> Regards,
>
> Miguel
>
> Miguel José Hernández Vidal wrote:
>> Hi mailing,
>>
>> I am trying to build my lm with IRST toolkit. First, I've added <s>
>> tags with 'add-start-end.sh' and, obviously, have my data tokenized &
>> lowercased.
>>
>> When I run 'build-lm.sh' it looks like it works fine, but at the end
>> of the process no output file is found. Here's the log:
>>
>> ----------------------------------------------------------------------
>>
>> [EMAIL PROTECTED]:~/irstlm/bin$ bash build-lm.sh -i ~/corpus/tag.es -o
>> ~/corpus/ca.lm -n 3 -k 5 -s kneser-ney
>> Cleaning temporary directory stat
>> Extracting dictionary from training corpus
>> Splitting dictionary into 5 lists
>> Extracting n-gram statistics for each word list
>> dict.000
>> dict.001
>> dict.002
>> dict.003
>> dict.004
>> Estimating language models for each word list
>> dict.000
>> dict.001
>> dict.002
>> dict.003
>> dict.004
>> Merging language models into /home/esca/corpus/ca.lm
>> Cleaning temporary directory stat
>> ----------------------------------------------------------------------
>>
>>
>> I've tried with different corpus sizes, but it didn't work either.
>> btw, I am running the scripts under Ubuntu 7.04 32bit.
>>
>> Regards,
>>
>> Miguel
>>
>
> _______________________________________________
> Moses-support mailing list
> [email protected]
> http://mailman.mit.edu/mailman/listinfo/moses-support
