Hi Hieu,

Looking at the log, the problem seems to be related to the IRSTLM library, specifically its code in src/lmtable.cpp (the function named loadtxt_ram).
I went back to IRSTLM 5.80.01 and that resolved the slow LM loading. However, since other people may be able to reproduce the issue, I am wondering whether we should report it to the IRSTLM team and maybe add a comment to the Moses wiki as well. (I see there is already a comment about issues with the IRSTLM source code in the official repos, with a recommendation to prefer 5.80.03, which unfortunately won't work in my environment.)

Kind regards,
Tomas

From: Tomas Fulajtar
Sent: Wednesday, March 19, 2014 9:58 AM
To: 'Hieu Hoang'
Cc: [email protected]
Subject: RE: [Moses-support] Recaser - LM model loading

Hi Hieu,

Please find the moses.ini attached. The LM is the default 3-gram IRSTLM model, trained with the command:

/opt/moses/scripts/recaser/train-recaser.perl --dir=$dir --lm=IRSTLM --build-lm=/usr/local/irstlm/bin/build-lm.sh --corpus=file --train-script=/opt/moses/scripts/training/train-model.perl

I do not expect the problem to be in the LM preparation steps, as we have been using the same scripts for a long time without issues.
Parameters of trained LM:

iARPA
\data\
ngram 1= 219165
ngram 2= 2616463
ngram 3= 7215865

The command issued for the recasing experiment:

echo 'some text to recase ' | moses -f recase/moses.ini

Response on Fedora (showing only the part with the LM data loading):

Defined parameters (per moses.ini or switch):
config: moses.ini
distortion-limit: 6
input-factors: 0
lmodel-file: 1 0 3 /tmp/recase/cased.irstlm.gz
mapping: 0 T 0
ttable-file: 0 0 0 5 /tmp/recase/phrase-table.gz
ttable-limit: 20
weight-d: 0.6
weight-l: 0.5000
weight-t: 0.20 0.20 0.20 0.20 0.20
weight-w: -1
/var/www/moses/bin
ScoreProducer: Distortion start: 0 end: 1
ScoreProducer: WordPenalty start: 1 end: 2
ScoreProducer: !UnknownWordPenalty start: 2 end: 3
Loading lexical distortion models...have 0 models
Start loading LanguageModel /tmp/recase/cased.irstlm.gz : [0.009] seconds
In LanguageModelIRST::Load: nGramOrder = 3
Language Model Type of /tmp/recase/cased.irstlm.gz is 1
Language Model Type is 1
iARPA
loadtxt_ram()
1-grams: reading 219165 entries
done level1
2-grams: reading 2616463 entries
done level2
3-grams: reading 7215865 entries
.done level3
done
OOV code is 219164
OOV code is 219164
IRST: m_unknownId=219164
ScoreProducer: LM start: 3 end: 4
Finished loading LanguageModels : [34.666] seconds
...

Response on SUSE:

Defined parameters (per moses.ini or switch):
config: recase/moses.ini
distortion-limit: 6
input-factors: 0
lmodel-file: 1 0 3 /home/sandy/retrain/recase/cased.irstlm.gz
mapping: 0 T 0
ttable-file: 0 0 0 5 /home/sandy/retrain/recase/phrase-table.gz
ttable-limit: 20
weight-d: 0.6
weight-l: 0.5000
weight-t: 0.20 0.20 0.20 0.20 0.20
weight-w: -1
ScoreProducer: Distortion start: 0 end: 1
ScoreProducer: WordPenalty start: 1 end: 2
ScoreProducer: !UnknownWordPenalty start: 2 end: 3
Loading lexical distortion models...have 0 models
Start loading LanguageModel /home/sandy/retrain/recase/cased.irstlm.gz : [0.001] seconds
In LanguageModelIRST::Load: nGramOrder = 3
Language Model Type of /home/sandy/retrain/recase/cased.irstlm.gz is 1
Language Model Type is 1
iARPA
loadtxt_ram()
1-grams: reading 219165 entries
done level 1
2-grams: reading 2616463 entries
done level 2
3-grams: reading 7215865 entries
.done level 3
done
OOV code is 219164
OOV code is 219164
IRST: m_unknownId=219164
ScoreProducer: LM start: 3 end: 4
Finished loading LanguageModels : [1045.969] seconds
...

As you can see, the loading takes an enormous 1045 seconds.

---

Meanwhile I have found that gcc 4.7 is also available in the SUSE 11 SP3 SDK, so I tried recompiling boost/irstlm/moses, but the results are almost the same (it is about 200 sec faster thanks to the compiler optimization). The latest config on SUSE is therefore:

irstlm 5.80.03 - recompiled under gcc 4.7
mgiza 0.6.3 updated to 0.7.3 and recompiled under 4.7
boost 1.55 - recompiled under gcc 4.7

I have also attached the build.log in case it is useful. Today I am going to run regression tests to see whether they turn up any particular issues.

Tomas

From: [email protected]<mailto:[email protected]> [mailto:[email protected]] On Behalf Of Hieu Hoang
Sent: Wednesday, March 19, 2014 1:36 AM
To: Tomas Fulajtar
Cc: [email protected]<mailto:[email protected]>
Subject: Re: [Moses-support] Recaser - LM model loading

What is a recaser LM?
What command is taking 20 minutes? Can you send me the moses.ini file you're using?

On 17 March 2014 12:58, Tomas Fulajtar <[email protected]<mailto:[email protected]>> wrote:

Hello,

I am experiencing strange behavior when using a recaser LM after migrating to Moses (1.0) compiled on a different machine. The problem is that loading the LM takes 20 minutes on my new machine (SUSE), while on the previous one it took 20 secs or so.

Machine 1: Fedora 18:
* gcc: 4.7.2
* perl 5.16
* moses 1.0
* irstlm 5.80.01
* mgiza 0.7.0
* boost 1.52

Machine 2: SUSE SLES 11 SP3
* perl: 5.10.0
* gcc: 4.3
* moses 1.0
* irstlm 5.80.03
* mgiza 0.6.3
* boost 1.55

Moses compilation command:

sudo ./bjam --prefix=/opt/moses --install-scripts=/opt/moses/scripts -j4 -a --with-irstlm=/usr/local/irstlm --with-xmlrpc-c=/usr/local --with-cmph=usr/local --with-boost=/opt/boost --with-giza=/usr/local/bin --enable-boost-pool --enable-optimization --debug-symbols=off toolset=gcc -d2 --debug-configuration --max-kenlm-order=7 |tee ~/build.log 2>&1

I have tested the speed using the same recaser IRSTLM model data in ARPA format. No error is actually displayed, so I wonder where to continue debugging. I also tried retraining the model on SUSE and then testing it on Fedora, but the result is the same (no error, but too slow on SUSE).

Does anybody have an idea where to look for a resolution? Maybe the problem is with the IRSTLM being used?

Thank you,
Tomas Fulajtar

_______________________________________________
Moses-support mailing list
[email protected]<mailto:[email protected]>
http://mailman.mit.edu/mailman/listinfo/moses-support

--
Hieu Hoang
Research Associate
University of Edinburgh
http://www.hoang.co.uk/hieu
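
For anyone comparing the two logs in this thread, here is a minimal Python sketch (not part of Moses or IRSTLM) that pulls the bracketed value from the "Finished loading LanguageModels" line and computes the slowdown between the two machines. The function name lm_load_seconds and the inline log strings are illustrative assumptions. The bracketed value appears to be the elapsed time since startup, so it only approximates the LM load time.

```python
import re

# Matches lines such as "Finished loading LanguageModels : [34.666] seconds"
LOAD_RE = re.compile(
    r"Finished loading LanguageModels\s*:\s*\[([0-9.]+)\]\s*seconds"
)


def lm_load_seconds(log_text):
    """Return the seconds reported on the 'Finished loading' line, or None."""
    match = LOAD_RE.search(log_text)
    return float(match.group(1)) if match else None


# Shortened copies of the two log excerpts quoted in this thread.
fedora_log = (
    "ScoreProducer: LM start: 3 end: 4\n"
    "Finished loading LanguageModels : [34.666] seconds\n"
)
suse_log = (
    "ScoreProducer: LM start: 3 end: 4\n"
    "Finished loading LanguageModels : [1045.969] seconds\n"
)

fedora = lm_load_seconds(fedora_log)
suse = lm_load_seconds(suse_log)
slowdown = suse / fedora

print(f"Fedora: {fedora:.3f}s, SUSE: {suse:.3f}s, slowdown: {slowdown:.1f}x")
```

In practice the log text would come from the stderr output of a moses run saved to a file; the same check could then be repeated after switching IRSTLM versions to confirm whether the load time changes.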
