Correction: "It should matter" --> "It should not matter"

On 19 March 2014 23:09, Hieu Hoang <[email protected]> wrote:

> You seem to be using the text LM. This will take a long time to load,
> especially if it's over a network. It should matter what linux distribution
> you're using.
>
> You should:
>   1. Make sure your files are on local disks
>   2. Binarize the LM with KenLM or IRSTLM. Also, binarize the phrase tables
>   3. If it's a recaser, the distortion limit [distortion-limit] should be
> 0. Otherwise the recaser can reorder the output.
>
> Also, you should consider updating your version of Moses. This will allow
> you to use IRSTLM 5.80.03. There are various changes that make it more
> extensible, faster and more reliable.
>
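Steps 2 and 3 above can be sketched roughly as follows. This is a minimal sketch under assumptions, not a tested recipe: the function names and paths are made up for illustration, compile-lm (IRSTLM) and processPhraseTable (Moses) are assumed to be on PATH, and the awk helper assumes the old-style moses.ini where the value sits on the line right after the [distortion-limit] header.

```shell
# Step 2 (LM): convert the text (iARPA) model to IRSTLM's binary format.
# $1 = input LM (e.g. cased.irstlm.gz), $2 = output file (e.g. cased.blm)
binarize_lm() {
  compile-lm "$1" "$2"
}

# Step 2 (phrase table): build the on-disk binary phrase table.
# $1 = gzipped text phrase table, $2 = output prefix.
# -nscores 5 matches the 5 scores in the ttable-file line of this thread.
binarize_phrase_table() {
  gzip -cd "$1" | LC_ALL=C sort |
    processPhraseTable -ttable 0 0 - -nscores 5 -out "$2"
}

# Step 3: print a copy of moses.ini with the value under
# [distortion-limit] replaced. $1 = new limit, $2 = path to moses.ini
set_distortion_limit() {
  awk -v limit="$1" '
    after_header { print limit; after_header = 0; next }
    /^\[distortion-limit\]/ { after_header = 1 }
    { print }
  ' "$2"
}
```

For example, `set_distortion_limit 0 recase/moses.ini > recase/moses.bin.ini` writes a copy with the limit set to 0. The lmodel-file and ttable-file entries would then need to point at the binarized files (for the binary phrase table the first field of ttable-file becomes 1, if I remember the 1.0 format correctly).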
>
>
> On 19 March 2014 12:31, Tomas Fulajtar <[email protected]> wrote:
>
>>  Hi Hieu,
>>
>>
>>
>> Looking at the log, the problem seems to be related to the IRSTLM library
>> and its code inside src/lmtable.cpp (function named loadtxt_ram).
>>
>>
>>
>> I have tried reverting to IRSTLM 5.80.01, and it resolved the issue with
>> slow LM loading. However, as other people might hit the same issue, I am
>> wondering if we should report it to the IRSTLM team and maybe add a
>> comment to the Moses wiki as well (I see there is a comment about issues
>> with the IRSTLM source code in the official repos and a recommendation to
>> prefer 5.80.03, which unfortunately won't work in my environment).
>>
>>
>>
>> Kind regards,
>>
>>
>>
>>
>>
>> Tomas
>>
>>
>>
>> *From:* Tomas Fulajtar
>> *Sent:* Wednesday, March 19, 2014 9:58 AM
>> *To:* 'Hieu Hoang'
>> *Cc:* [email protected]
>> *Subject:* RE: [Moses-support] Recaser - LM model loading
>>
>>
>>
>> Hi Hieu,
>>
>>
>>
>> Please find the Moses.ini attached.
>>
>>
>>
>> The LM is the default 3-gram IRSTLM, trained with the command:
>>
>> /opt/moses/scripts/recaser/train-recaser.perl --dir=$dir --lm=IRSTLM
>> --build-lm=/usr/local/irstlm/bin/build-lm.sh --corpus=file
>> --train-script=/opt/moses/scripts/training/train-model.perl
>>
>>
>>
>> I do not expect the problem to be in the LM preparation steps, as we have
>> been using the same scripts for a long time without issues.
>>
>> Parameters of the trained LM:
>>
>> iARPA
>>
>> \data\
>> ngram 1= 219165
>> ngram 2= 2616463
>> ngram 3= 7215865
>>
>>
>>
>>
>>
>> The command issued for the recasing experiment:
>>
>> echo 'some text to recase ' | moses -f recase/moses.ini
>>
>>
>>
>> Response on Fedora (showing only the part with the LM data loading):
>>
>> Defined parameters (per moses.ini or switch):
>>         config: moses.ini
>>         distortion-limit: 6
>>         input-factors: 0
>>         lmodel-file: 1 0 3 /tmp/recase/cased.irstlm.gz
>>         mapping: 0 T 0
>>         ttable-file: 0 0 0 5 /tmp/recase/phrase-table.gz
>>         ttable-limit: 20
>>         weight-d: 0.6
>>         weight-l: 0.5000
>>         weight-t: 0.20 0.20 0.20 0.20 0.20
>>         weight-w: -1
>> /var/www/moses/bin
>> ScoreProducer: Distortion start: 0 end: 1
>> ScoreProducer: WordPenalty start: 1 end: 2
>> ScoreProducer: !UnknownWordPenalty start: 2 end: 3
>> Loading lexical distortion models...have 0 models
>> Start loading LanguageModel /tmp/recase/cased.irstlm.gz : [0.009] seconds
>> In LanguageModelIRST::Load: nGramOrder = 3
>> Language Model Type of /tmp/recase/cased.irstlm.gz is 1
>> Language Model Type is 1
>> iARPA
>> loadtxt_ram()
>> 1-grams: reading 219165 entries
>> done level1
>> 2-grams: reading 2616463 entries
>> done level2
>> 3-grams: reading 7215865 entries
>> .done level3
>> done
>> OOV code is 219164
>> OOV code is 219164
>> IRST: m_unknownId=219164
>> ScoreProducer: LM start: 3 end: 4
>> Finished loading LanguageModels : [34.666] seconds
>> ...
>>
>>
>>
>> Response on SUSE:
>>
>> Defined parameters (per moses.ini or switch):
>>         config: recase/moses.ini
>>         distortion-limit: 6
>>         input-factors: 0
>>         lmodel-file: 1 0 3 /home/sandy/retrain/recase/cased.irstlm.gz
>>         mapping: 0 T 0
>>         ttable-file: 0 0 0 5 /home/sandy/retrain/recase/phrase-table.gz
>>         ttable-limit: 20
>>         weight-d: 0.6
>>         weight-l: 0.5000
>>         weight-t: 0.20 0.20 0.20 0.20 0.20
>>         weight-w: -1
>>
>> ScoreProducer: Distortion start: 0 end: 1
>> ScoreProducer: WordPenalty start: 1 end: 2
>> ScoreProducer: !UnknownWordPenalty start: 2 end: 3
>> Loading lexical distortion models...have 0 models
>> Start loading LanguageModel /home/sandy/retrain/recase/cased.irstlm.gz : [0.001] seconds
>> In LanguageModelIRST::Load: nGramOrder = 3
>> Language Model Type of /home/sandy/retrain/recase/cased.irstlm.gz is 1
>> Language Model Type is 1
>> iARPA
>> loadtxt_ram()
>> 1-grams: reading 219165 entries
>> done level 1
>> 2-grams: reading 2616463 entries
>> done level 2
>> 3-grams: reading 7215865 entries
>> .done level 3
>> done
>> OOV code is 219164
>> OOV code is 219164
>> IRST: m_unknownId=219164
>> ScoreProducer: LM start: 3 end: 4
>> Finished loading LanguageModels : [1045.969] seconds
>> ...
>>
>>
>>
>> As you can see, the loading takes an enormous 1045 seconds.
>>
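To compare the two machines without reading the logs by eye, the LM load time can be pulled out of the decoder trace. A minimal sketch, assuming the trace was saved unwrapped to a file (for example with `2> moses.log`; I am assuming Moses writes this trace to stderr) and that the "Start loading LanguageModel" and "Finished loading LanguageModels" lines look like the ones quoted above. lm_load_seconds is a hypothetical helper name.

```shell
# Print the seconds elapsed between the "Start loading LanguageModel"
# line and the "Finished loading LanguageModels" line of a Moses trace.
# Reads the files given as arguments, or stdin if there are none.
lm_load_seconds() {
  awk '
    # Return the number inside the last [...] on the line.
    function stamp(line,   s) {
      s = line
      sub(/^.*\[/, "", s)
      sub(/\].*$/, "", s)
      return s + 0
    }
    /^Start loading LanguageModel/     { start = stamp($0) }
    /^Finished loading LanguageModels/ { printf "%.3f\n", stamp($0) - start }
  ' "$@"
}
```

Usage would be something like `lm_load_seconds moses.log` on each machine. With a single LM, as here, it prints one number per run; with several LMs it only measures from the last "Start loading" line.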
>>
>>
>> ---
>>
>> Meanwhile I have found that gcc 4.7 is also available in the SUSE 11 SP3
>> SDK, so I tried to recompile boost/irstlm/moses, but the results are
>> almost the same (it is faster by 200 sec due to compiler optimization).
>>
>> Thus the latest configuration on SUSE is as follows:
>>
>> irstlm 5.80.03 - recompiled under gcc 4.7
>> mgiza  0.7.3   - updated from 0.6.3 and recompiled under gcc 4.7
>> boost  1.55    - recompiled under gcc 4.7
>>
>>
>>
>> I have also attached the build.log in case it would be useful.
>>
>>
>>
>> Today I am going to run regression tests to see if there are any
>> particular issues found.
>>
>>
>>
>>
>>
>> Tomas
>>
>>
>>
>>
>>
>> *From:* [email protected] [mailto:[email protected]<[email protected]>]
>> *On Behalf Of *Hieu Hoang
>> *Sent:* Wednesday, March 19, 2014 1:36 AM
>> *To:* Tomas Fulajtar
>> *Cc:* [email protected]
>> *Subject:* Re: [Moses-support] Recaser - LM model loading
>>
>>
>>
>> What is a recaser LM? What command is taking 20 minutes? Can you send me
>> the moses.ini file you're using?
>>
>>
>>
>>
>>
>> On 17 March 2014 12:58, Tomas Fulajtar <[email protected]> wrote:
>>
>> Hello,
>>
>>
>>
>> I am experiencing strange behavior when using a recaser LM after
>> migrating to Moses (1.0) compiled on a different machine.
>>
>> The problem is that loading the LM takes 20 minutes on my new machine
>> (SUSE), while on the previous one it took 20 secs or so.
>>
>>
>>
>> Machine 1: Fedora 18
>>
>> ·         gcc: 4.7.2
>> ·         perl: 5.16
>> ·         moses  1.0
>> ·         irstlm 5.80.01
>> ·         mgiza  0.7.0
>> ·         boost  1.52
>>
>> Machine 2: SUSE SLES 11 SP3
>>
>> ·         perl: 5.10.0
>> ·         gcc: 4.3
>> ·         moses  1.0
>> ·         irstlm 5.80.03
>> ·         mgiza  0.6.3
>> ·         boost  1.55
>>
>>
>>
>> Moses compilation command:
>>
>>
>>
>> sudo ./bjam --prefix=/opt/moses --install-scripts=/opt/moses/scripts -j4
>> -a --with-irstlm=/usr/local/irstlm --with-xmlrpc-c=/usr/local
>> --with-cmph=usr/local --with-boost=/opt/boost --with-giza=/usr/local/bin
>> --enable-boost-pool --enable-optimization  --debug-symbols=off toolset=gcc
>> -d2 --debug-configuration --max-kenlm-order=7 |tee ~/build.log 2>&1
>>
>>
>>
>> I have tested the speed using the same recaser IRSTLM model data in ARPA
>> format. No error is displayed, so I wonder where to continue with
>> debugging. I also tried retraining the model on SUSE and then testing it
>> on Fedora, but the result is the same (no error, but too slow on SUSE).
>> Does anybody have an idea where to look for a resolution? Maybe the
>> problem is in the IRSTLM version used?
>>
>>
>>
>>
>>
>> Thank you,
>>
>>
>>
>> Tomas Fulajtar
>>
>>
>>
>>
>>
>>
>> _______________________________________________
>> Moses-support mailing list
>> [email protected]
>> http://mailman.mit.edu/mailman/listinfo/moses-support
>>
>>
>>
>>
>> --
>> Hieu Hoang
>> Research Associate
>> University of Edinburgh
>> http://www.hoang.co.uk/hieu
>>
>


-- 
Hieu Hoang
Research Associate
University of Edinburgh
http://www.hoang.co.uk/hieu
