Hi,

this is very weird. You are using the 'irstlm/src/compile-lm' command, aren't you? I was a bit confused at first (and actually still am), because there is also an SRILM binary format.
-phi

On Wed, Jul 23, 2008 at 10:50 AM, Miguel José Hernández Vidal <[EMAIL PROTECTED]> wrote:
> Hi Philipp,
>
> Thanks for your advice. Maybe I've done something wrong, although I followed
> Moses' documentation guidelines.
>
> First, I compiled separately a new Moses environment '--with-irstlm'.
> Next I ran the following in order to have a binarized version of my SRI
> language model:
> $ ./compile-lm corpus.ca.lm ca.blm
>
> Then I updated my moses.ini with the new settings:
> 1 0 5 /home/esca/ESCA/lm/ca.blm
>
> At last, I ran moses compiled with irstlm version and I had the
> 'segmentation fault' error.
>
>
> I managed to run the binarized SRI model in the following way:
>
> After 'compile-lm' I updated moses.ini:
> 0 0 5 /home/esca/ESCA/lm/ca.blm
>
> And then I ran moses (compiled with SRILM) without any errors.
>
>
> I thought binarized language models had to be decoded with the IRST compiled
> version of Moses. Am I wrong?
>
> Regards,
> Miguel
>
> Philipp Koehn wrote:
>>
>> Hi,
>>
>> To use the binarized IRST LM, you just need to compile the SRILM LM,
>> no need to train the model with IRST tools. See Moses documentation
>> for details.
>>
>> -phi
>>
>> On Tue, Jul 22, 2008 at 12:31 PM, Miguel José Hernández Vidal
>> <[EMAIL PROTECTED]> wrote:
>>
>>>
>>> I've also tried to run moses with a binarized (with compile-lm) SRI
>>> language model. When I run the decoder I see a segmentation fault error:
>>>
>>>
>>> ------------------------------------------------------------
>>> [EMAIL PROTECTED]:~$ ~/moses/moses-cmd/src/moses -config
>>> ~/ESCA/model/moses.ini
>>> -input-file ~/ESCA/tuning/input > ~/ESCA/evaluation/output
>>> Defined parameters (per moses.ini or switch):
>>> config: /home/esca/ESCA/model/moses.ini
>>> distortion-file: 0-0 msd-bidirectional-fe 6
>>> /home/esca/ESCA/model/reordering
>>> distortion-limit: 6
>>> input-factors: 0
>>> input-file: /home/esca/ESCA/tuning/input
>>> lmodel-file: 1 0 5 /home/esca/ESCA/lm/ca.blm
>>> mapping: 0 T 0
>>> ttable-file: 0 0 5 /home/esca/ESCA/model/phrase-table
>>> ttable-limit: 20
>>> weight-d: 0.3 0.3 0.3 0.3 0.3 0.3 0.3
>>> weight-l: 0.5000
>>> weight-t: 0.2 0.2 0.2 0.2 0.2
>>> weight-w: -1
>>> Loading lexical distortion models...
>>> have 1 models
>>> Creating lexical reordering...
>>> weights: 0.300 0.300 0.300 0.300 0.300 0.300
>>> binary file loaded, default OFF_T: -1
>>> Created lexical orientation reordering
>>> Start loading LanguageModel /home/esca/ESCA/lm/ca.blm : [1.000] seconds
>>> In LanguageModelIRST::Load: nGramOrder = 5
>>> Loading LM file (no MAP)
>>> blmt
>>> loadbin()
>>> loading 321187 1-grams
>>> loading 4548952 2-grams
>>> loading 2785668 3-grams
>>> loading 2501764 4-grams
>>> loading 1741048 5-grams
>>> done
>>> OOV code is 37189
>>> IRST: m_unknownId=37189
>>> Fallo de segmentación (core dumped) #SEGMENTATION FAULT
>>>
>>> ------------------------------------------------------------
>>>
>>> I am using binarized phrase and reordering tables, but they worked fine
>>> when I build them with my old SRILM system.
>>>
>>> Thanks for your help.
>>>
>>> Regards,
>>>
>>> Miguel
>>>
>>> Miguel José Hernández Vidal wrote:
>>>
>>>>
>>>> Hi mailing,
>>>>
>>>> I am trying to build my lm with IRST toolkit. First, I've added <s>
>>>> tags with 'add-start-end.sh' and, obviously, have my data tokenized &
>>>> lowercased.
>>>>
>>>> When I run 'build-lm.sh' it looks like it works fine, but at the end
>>>> of the process no output file is found. Here's the log:
>>>>
>>>>
>>>> ------------------------------------------------------------
>>>>
>>>> [EMAIL PROTECTED]:~/irstlm/bin$ bash build-lm.sh -i ~/corpus/tag.es -o
>>>> ~/corpus/ca.lm -n 3 -k 5 -s kneser-ney
>>>> Cleaning temporary directory stat
>>>> Extracting dictionary from training corpus
>>>> Splitting dictionary into 5 lists
>>>> Extracting n-gram statistics for each word list
>>>> dict.000
>>>> dict.001
>>>> dict.002
>>>> dict.003
>>>> dict.004
>>>> Estimating language models for each word list
>>>> dict.000
>>>> dict.001
>>>> dict.002
>>>> dict.003
>>>> dict.004
>>>> Merging language models into /home/esca/corpus/ca.lm
>>>> Cleaning temporary directory stat
>>>>
>>>> ------------------------------------------------------------
>>>>
>>>>
>>>> I've tried with different corpus sizes, but it didn't work either.
>>>> btw, I am running the scripts under Ubuntu 7.04 32bit.
>>>>
>>>> Regards,
>>>>
>>>> Miguel
>>>>
>>>>
>>>
>>> _______________________________________________
>>> Moses-support mailing list
>>> [email protected]
>>> http://mailman.mit.edu/mailman/listinfo/moses-support
>>>
>>>
>>>
>>
>>
>
>
_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
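
For anyone reading this thread later: the two moses.ini entries above differ only in their first field. Going by the thread, 1 was used with the IRSTLM build and 0 with the SRILM build, followed by what appear to be the factor, the n-gram order, and the model path. Below is a minimal Python sketch for checking which implementation an lmodel-file entry selects. The helper name parse_lmodel_line and the LM_TYPES table are hypothetical, not part of Moses, and the field meanings are an assumption based on the entries quoted here.

```python
# Type codes as they appear in this thread (assumption: 0 = SRILM, 1 = IRSTLM).
LM_TYPES = {0: "SRILM", 1: "IRSTLM"}


def parse_lmodel_line(line):
    """Split an lmodel-file entry such as '1 0 5 /path/to/lm' into named fields."""
    parts = line.split(None, 3)
    if len(parts) != 4:
        raise ValueError(f"expected 4 fields, got {len(parts)}: {line!r}")
    lm_type, factor, order = (int(p) for p in parts[:3])
    return {
        "type": lm_type,
        "implementation": LM_TYPES.get(lm_type, "unknown"),
        "factor": factor,
        "order": order,
        "path": parts[3].strip(),
    }


if __name__ == "__main__":
    # The two entries quoted in the thread.
    for entry in ("1 0 5 /home/esca/ESCA/lm/ca.blm",
                  "0 0 5 /home/esca/ESCA/lm/ca.blm"):
        print(parse_lmodel_line(entry))
```

Running it on both entries makes it easy to see which LM implementation each configuration asks the decoder to use for the same ca.blm file.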
