Correction: "It should matter" --> "It should not matter", i.e. it should not matter what Linux distribution you're using.
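For point 3 of the advice quoted below (distortion limit 0 for a recaser), here is a minimal sketch of how the setting could be changed in an old-style moses.ini, where each `[section]` header is followed by its value on the next line. The function name `set_distortion_limit_zero` and the file names in the usage comment are made up for illustration; only the `[distortion-limit]` section name comes from the thread.

```shell
#!/bin/sh
# Hypothetical helper (not part of Moses): print a copy of an old-style
# moses.ini in which the first value under [distortion-limit] is replaced by 0.
# Assumes the header is written exactly as "[distortion-limit]" on its own line.
set_distortion_limit_zero() {
    awk '
        # First non-empty, non-header line after the header: emit 0 instead.
        in_dl && NF && $0 !~ /^\[/ { print "0"; in_dl = 0; next }
        # Every section header switches the flag on or off.
        /^\[/ { in_dl = ($0 == "[distortion-limit]") }
        { print }
    ' "$1"
}

# Usage (illustrative paths):
#   set_distortion_limit_zero recase/moses.ini > recase/moses.dl0.ini
#   echo 'some text to recase' | moses -f recase/moses.dl0.ini
```

The helper writes to stdout rather than editing in place, so the original moses.ini stays untouched. As far as I know the same setting can also be passed as a switch on the moses command line (`-distortion-limit 0`), but check that against your Moses version.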
On 19 March 2014 23:09, Hieu Hoang <[email protected]> wrote:

> You seem to be using the text LM. This will take a long time to load,
> especially if it's over a network. It should matter what linux distribution
> you're using.
>
> You should:
> 1. Make sure your files are on local disks
> 2. Binarize the LM with KenLM or IRSTLM. Also, binarize the phrase tables
> 3. If it's a recaser, the distortion limit [distortion-limit] should be
>    0. Otherwise the recaser can reorder the output.
>
> Also, you should consider updating your version of Moses. This will allow
> you to use IRSTLM 5.80.03. There are various changes to make it more
> extensible, faster and more reliable.
>
> On 19 March 2014 12:31, Tomas Fulajtar <[email protected]> wrote:
>
>> Hi Hieu,
>>
>> Looking at the log, the problem seems to be related to the IRSTLM library,
>> specifically its code in src/lmtable.cpp (the function named loadtxt_ram).
>>
>> I went back to IRSTLM 5.80.01, and that resolved the slow LM loading.
>> However, as other people may be able to reproduce the issue, I wonder
>> whether we should report it to the IRSTLM team and perhaps add a comment
>> to the Moses wiki as well (I see there is already a comment about issues
>> with the IRSTLM source code in the official repos, with a recommendation
>> to prefer 5.80.03, which unfortunately won't work in my environment).
>>
>> Kind regards,
>>
>> Tomas
>>
>> *From:* Tomas Fulajtar
>> *Sent:* Wednesday, March 19, 2014 9:58 AM
>> *To:* 'Hieu Hoang'
>> *Cc:* [email protected]
>> *Subject:* RE: [Moses-support] Recaser - LM model loading
>>
>> Hi Hieu,
>>
>> Please find the moses.ini attached.
>>
>> The LM is the default 3-gram IRSTLM model, trained with the command:
>>
>> /opt/moses/scripts/recaser/train-recaser.perl --dir=$dir --lm=IRSTLM
>> --build-lm=/usr/local/irstlm/bin/build-lm.sh --corpus=file
>> --train-script=/opt/moses/scripts/training/train-model.perl.
>>
>> I do not expect the problem is in the LM preparation steps, as we have been
>> using the same scripts for a long time without issues.
>>
>> Parameters of the trained LM:
>>
>> iARPA
>>
>> \data\
>> ngram 1= 219165
>> ngram 2= 2616463
>> ngram 3= 7215865
>>
>> The command issued for the recasing experiment:
>>
>> echo 'some text to recase ' | moses -f recase/moses.ini
>>
>> Response on Fedora (showing only the part with the LM data loading):
>>
>> Defined parameters (per moses.ini or switch):
>> config: moses.ini
>> distortion-limit: 6
>> input-factors: 0
>> lmodel-file: 1 0 3 /tmp/recase/cased.irstlm.gz
>> mapping: 0 T 0
>> ttable-file: 0 0 0 5 /tmp/recase/phrase-table.gz
>> ttable-limit: 20
>> weight-d: 0.6
>> weight-l: 0.5000
>> weight-t: 0.20 0.20 0.20 0.20 0.20
>> weight-w: -1
>> /var/www/moses/bin
>> ScoreProducer: Distortion start: 0 end: 1
>> ScoreProducer: WordPenalty start: 1 end: 2
>> ScoreProducer: !UnknownWordPenalty start: 2 end: 3
>> Loading lexical distortion models...have 0 models
>> Start loading LanguageModel /tmp/recase/cased.irstlm.gz : [0.009] seconds
>> In LanguageModelIRST::Load: nGramOrder = 3
>> Language Model Type of /tmp/recase/cased.irstlm.gz is 1
>> Language Model Type is 1
>> iARPA
>> loadtxt_ram()
>> 1-grams: reading 219165 entries
>> done level1
>> 2-grams: reading 2616463 entries
>> done level2
>> 3-grams: reading 7215865 entries
>> .done level3
>> done
>> OOV code is 219164
>> OOV code is 219164
>> IRST: m_unknownId=219164
>> ScoreProducer: LM start: 3 end: 4
>> Finished loading LanguageModels : [34.666] seconds
>> ...
>>
>> Response on SUSE:
>>
>> Defined parameters (per moses.ini or switch):
>> config: recase/moses.ini
>> distortion-limit: 6
>> input-factors: 0
>> lmodel-file: 1 0 3 /home/sandy/retrain/recase/cased.irstlm.gz
>> mapping: 0 T 0
>> ttable-file: 0 0 0 5 /home/sandy/retrain/recase/phrase-table.gz
>> ttable-limit: 20
>> weight-d: 0.6
>> weight-l: 0.5000
>> weight-t: 0.20 0.20 0.20 0.20 0.20
>> weight-w: -1
>>
>> ScoreProducer: Distortion start: 0 end: 1
>> ScoreProducer: WordPenalty start: 1 end: 2
>> ScoreProducer: !UnknownWordPenalty start: 2 end: 3
>> Loading lexical distortion models...have 0 models
>> Start loading LanguageModel /home/sandy/retrain/recase/cased.irstlm.gz :
>> [0.001] seconds
>> In LanguageModelIRST::Load: nGramOrder = 3
>> Language Model Type of /home/sandy/retrain/recase/cased.irstlm.gz is 1
>> Language Model Type is 1
>> iARPA
>> loadtxt_ram()
>> 1-grams: reading 219165 entries
>> done level 1
>> 2-grams: reading 2616463 entries
>> done level 2
>> 3-grams: reading 7215865 entries
>> .done level 3
>> done
>> OOV code is 219164
>> OOV code is 219164
>> IRST: m_unknownId=219164
>> ScoreProducer: LM start: 3 end: 4
>> Finished loading LanguageModels : [1045.969] seconds
>> ...
>>
>> As you can see, loading takes an enormous 1045 seconds.
>>
>> ---
>>
>> Meanwhile I have found that gcc 4.7 is also available in the SUSE 11 SP3
>> SDK, so I tried to recompile boost/irstlm/moses, but the results are
>> almost the same (it is about 200 sec faster, due to the optimization in
>> the compiler).
>>
>> Thus the latest config on SUSE is the following:
>>
>> irstlm 5.80.03 - recompiled under gcc 4.7
>> mgiza 0.6.3 updated to 0.7.3 and recompiled under 4.7
>> boost 1.55 - recompiled under gcc 4.7
>>
>> I have also attached the build.log in case it would be useful.
>>
>> Today I am going to run regression tests to see whether they turn up any
>> particular issues.
>>
>> Tomas
>>
>> *From:* [email protected] [mailto:[email protected]]
>> *On Behalf Of *Hieu Hoang
>> *Sent:* Wednesday, March 19, 2014 1:36 AM
>> *To:* Tomas Fulajtar
>> *Cc:* [email protected]
>> *Subject:* Re: [Moses-support] Recaser - LM model loading
>>
>> What is a recaser LM? What command is taking 20 minutes? Can you send me
>> the moses.ini file you're using.
>>
>> On 17 March 2014 12:58, Tomas Fulajtar <[email protected]> wrote:
>>
>> Hello,
>>
>> I am experiencing strange behavior when using a recaser LM after
>> migrating to Moses (1.0) compiled on a different machine.
>>
>> The problem is that loading the LM takes 20 minutes on my new machine
>> (SUSE), while on the previous one it took 20 secs or so.
>>
>> Machine 1: Fedora 18:
>>
>> · gcc: 4.7.2
>> · perl 5.16
>> · moses 1.0
>> · irstlm 5.80.01
>> · mgiza 0.7.0
>> · boost 1.52
>>
>> Machine 2: SUSE SLES 11 SP3
>>
>> · perl: 5.10.0
>> · gcc: 4.3
>> · moses 1.0
>> · irstlm 5.80.03
>> · mgiza 0.6.3
>> · boost 1.55
>>
>> Moses compilation command:
>>
>> sudo ./bjam --prefix=/opt/moses --install-scripts=/opt/moses/scripts -j4
>> -a --with-irstlm=/usr/local/irstlm --with-xmlrpc-c=/usr/local
>> --with-cmph=usr/local --with-boost=/opt/boost --with-giza=/usr/local/bin
>> --enable-boost-pool --enable-optimization --debug-symbols=off toolset=gcc
>> -d2 --debug-configuration --max-kenlm-order=7 |tee ~/build.log 2>&1
>>
>> I have tested the speed using the same recaser IRSTLM model data in ARPA
>> format. No error is displayed, so I am not sure where to continue
>> debugging. I also tried retraining the model on SUSE and then testing it
>> on Fedora, but the result is the same (no error, but too slow on SUSE).
>> Does anybody have an idea where to look for a resolution?
>> Maybe the problem is in the IRSTLM used?
>>
>> Thank you,
>>
>> Tomas Fulajtar
>>
>> _______________________________________________
>> Moses-support mailing list
>> [email protected]
>> http://mailman.mit.edu/mailman/listinfo/moses-support

--
Hieu Hoang
Research Associate
University of Edinburgh
http://www.hoang.co.uk/hieu
