Hi Hieu,

Looking at the log, the problem seems to be related to the IRSTLM library and its code
in src/lmtable.cpp (the function named loadtxt_ram).

I reverted to IRSTLM 5.80.01 and that resolved the issue with the long LM
loading. However, since other people may be able to reproduce the issue, I am
wondering whether we should report it to the IRSTLM team and perhaps add a note
to the Moses wiki as well (I see there is already a comment about issues with
the IRSTLM source code in the official repos, recommending 5.80.03, which
unfortunately won't work in my environment).

Kind regards,


Tomas

From: Tomas Fulajtar
Sent: Wednesday, March 19, 2014 9:58 AM
To: 'Hieu Hoang'
Cc: [email protected]
Subject: RE: [Moses-support] Recaser - LM model loading

Hi Hieu,

Please find the Moses.ini attached.

The LM is the default 3-gram IRSTLM model, trained with the command:
 /opt/moses/scripts/recaser/train-recaser.perl --dir=$dir  --lm=IRSTLM 
--build-lm=/usr/local/irstlm/bin/build-lm.sh --corpus=file 
--train-script=/opt/moses/scripts/training/train-model.perl.

I do not expect the problem to be in the LM preparation steps, as we have been
using the same scripts for a long time without issues.
Parameters of the trained LM:
iARPA

\data\
ngram 1= 219165
ngram 2= 2616463
ngram 3= 7215865
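
For a sense of scale, the header counts above can be summed to get the total number of n-grams the loader has to read. The following is only a minimal sketch: the function name `arpa_ngram_counts` is made up for illustration and is not part of IRSTLM or Moses, and the header text is copied from this message rather than read from the model file.

```python
import re

# Matches header lines such as "ngram 1= 219165" (spacing around "=" may vary).
NGRAM_RE = re.compile(r"\s*ngram\s+(\d+)\s*=\s*(\d+)")

def arpa_ngram_counts(header_text):
    """Return {order: count} parsed from the \\data\\ section of an ARPA-style header."""
    counts = {}
    for line in header_text.splitlines():
        match = NGRAM_RE.match(line)
        if match:
            counts[int(match.group(1))] = int(match.group(2))
    return counts

# Header as quoted in this message (hypothetical variable name).
header = r"""iARPA

\data\
ngram 1= 219165
ngram 2= 2616463
ngram 3= 7215865
"""

counts = arpa_ngram_counts(header)
total = sum(counts.values())
print(counts, total)
```

So the model holds roughly ten million n-grams in total, most of them 3-grams.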


The command issued for the recasing experiment:
echo 'some text to recase ' | moses -f recase/moses.ini

Response on Fedora (showing only the part with the LM data loading):

Defined parameters (per moses.ini or switch):
        config: moses.ini
        distortion-limit: 6
        input-factors: 0
        lmodel-file: 1 0 3 /tmp/recase/cased.irstlm.gz
        mapping: 0 T 0
        ttable-file: 0 0 0 5 /tmp/recase/phrase-table.gz
        ttable-limit: 20
        weight-d: 0.6
        weight-l: 0.5000
        weight-t: 0.20 0.20 0.20 0.20 0.20
        weight-w: -1
/var/www/moses/bin
ScoreProducer: Distortion start: 0 end: 1
ScoreProducer: WordPenalty start: 1 end: 2
ScoreProducer: !UnknownWordPenalty start: 2 end: 3
Loading lexical distortion models...have 0 models
Start loading LanguageModel /tmp/recase/cased.irstlm.gz : [0.009] seconds
In LanguageModelIRST::Load: nGramOrder = 3
Language Model Type of /tmp/recase/cased.irstlm.gz is 1
Language Model Type is 1
iARPA
loadtxt_ram()
1-grams: reading 219165 entries
done level1
2-grams: reading 2616463 entries
done level2
3-grams: reading 7215865 entries
.done level3
done
OOV code is 219164
OOV code is 219164
IRST: m_unknownId=219164
ScoreProducer: LM start: 3 end: 4
Finished loading LanguageModels : [34.666] seconds
...

Response on SUSE:
Defined parameters (per moses.ini or switch):
        config: recase/moses.ini
        distortion-limit: 6
        input-factors: 0
        lmodel-file: 1 0 3 /home/sandy/retrain/recase/cased.irstlm.gz
        mapping: 0 T 0
        ttable-file: 0 0 0 5 /home/sandy/retrain/recase/phrase-table.gz
        ttable-limit: 20
        weight-d: 0.6
        weight-l: 0.5000
        weight-t: 0.20 0.20 0.20 0.20 0.20
        weight-w: -1

ScoreProducer: Distortion start: 0 end: 1
ScoreProducer: WordPenalty start: 1 end: 2
ScoreProducer: !UnknownWordPenalty start: 2 end: 3
Loading lexical distortion models...have 0 models
Start loading LanguageModel /home/sandy/retrain/recase/cased.irstlm.gz : 
[0.001] seconds
In LanguageModelIRST::Load: nGramOrder = 3
Language Model Type of /home/sandy/retrain/recase/cased.irstlm.gz is 1
Language Model Type is 1
iARPA
loadtxt_ram()
1-grams: reading 219165 entries
done level 1
2-grams: reading 2616463 entries
done level 2
3-grams: reading 7215865 entries
.done level 3
done
OOV code is 219164
OOV code is 219164
IRST: m_unknownId=219164
ScoreProducer: LM start: 3 end: 4
Finished loading LanguageModels : [1045.969] seconds
...

As you can see, loading takes an enormous 1045 seconds on SUSE, compared with
about 35 seconds on Fedora.
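
To compare the two runs without reading the logs by eye, the load time can be taken as the difference between the "Start loading LanguageModel" and "Finished loading LanguageModels" timestamps. This is a rough sketch only: the function name `lm_load_duration` is hypothetical, the regular expressions assume the log wording shown in the two excerpts above, and the sample strings are copied from those excerpts.

```python
import re

# Moses prints the elapsed time since startup in square brackets.
# \s* also matches a line break, which covers the wrapped SUSE log line.
START_RE = re.compile(
    r"Start loading LanguageModel\s+\S+\s*:\s*\[([0-9.]+)\]\s*seconds")
FINISH_RE = re.compile(
    r"Finished loading LanguageModels\s*:\s*\[([0-9.]+)\]\s*seconds")

def lm_load_duration(log_text):
    """Return seconds spent between the start and finish of LM loading."""
    start = START_RE.search(log_text)
    finish = FINISH_RE.search(log_text)
    if start is None or finish is None:
        raise ValueError("log does not contain both LM loading timestamps")
    return float(finish.group(1)) - float(start.group(1))

# Excerpts copied from the Fedora and SUSE logs in this message.
fedora_log = (
    "Start loading LanguageModel /tmp/recase/cased.irstlm.gz : [0.009] seconds\n"
    "Finished loading LanguageModels : [34.666] seconds\n"
)
suse_log = (
    "Start loading LanguageModel /home/sandy/retrain/recase/cased.irstlm.gz : \n"
    "[0.001] seconds\n"
    "Finished loading LanguageModels : [1045.969] seconds\n"
)

fedora = lm_load_duration(fedora_log)
suse = lm_load_duration(suse_log)
slowdown = suse / fedora
print(f"Fedora: {fedora:.3f}s  SUSE: {suse:.3f}s  slowdown: {slowdown:.1f}x")
```

On these two excerpts the SUSE load comes out at roughly 30 times slower than the Fedora one.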

---
Meanwhile I have found that gcc 4.7 is also available in the SUSE 11 SP3 SDK,
so I tried recompiling boost/irstlm/moses with it, but the results are almost
the same (loading is about 200 seconds faster, due to compiler optimization).

The latest configuration on SUSE is therefore:

irstlm 5.80.03 - recompiled under gcc 4.7
mgiza  0.7.3   - updated from 0.6.3 and recompiled under gcc 4.7
boost  1.55    - recompiled under gcc 4.7

I have also attached the build.log in case it would be useful.

Today I am going to run regression tests to see if there are any particular 
issues found.


Tomas


From: [email protected] [mailto:[email protected]] On Behalf Of Hieu Hoang
Sent: Wednesday, March 19, 2014 1:36 AM
To: Tomas Fulajtar
Cc: [email protected]
Subject: Re: [Moses-support] Recaser - LM model loading

What is a recaser LM? What command is taking 20 minutes? Can you send me the
moses.ini file you're using?


On 17 March 2014 12:58, Tomas Fulajtar <[email protected]> wrote:
Hello,

I am experiencing strange behavior when using the recaser LM after migrating
to Moses (1.0) compiled on a different machine.
The problem is that loading the LM takes 20 minutes on my new machine (SUSE),
while on the previous one it took 20 seconds or so.

Machine 1: Fedora 18:

* gcc: 4.7.2
* perl: 5.16
* moses: 1.0
* irstlm: 5.80.01
* mgiza: 0.7.0
* boost: 1.52

Machine 2: SUSE SLES 11 SP3:

* perl: 5.10.0
* gcc: 4.3
* moses: 1.0
* irstlm: 5.80.03
* mgiza: 0.6.3
* boost: 1.55

Moses compilation command:

sudo ./bjam --prefix=/opt/moses --install-scripts=/opt/moses/scripts -j4 -a 
--with-irstlm=/usr/local/irstlm --with-xmlrpc-c=/usr/local 
--with-cmph=usr/local --with-boost=/opt/boost --with-giza=/usr/local/bin 
--enable-boost-pool --enable-optimization  --debug-symbols=off toolset=gcc -d2 
--debug-configuration --max-kenlm-order=7 |tee ~/build.log 2>&1

I have tested the speed using the same recaser IRSTLM model data in ARPA
format. No error is displayed, so I am not sure where to continue debugging.
I also tried retraining the model on SUSE and then testing on Fedora, but the
result is the same (no error, but too slow on SUSE). Does anybody have an idea
where to look for a resolution? Could the problem be in the IRSTLM version used?


Thank you,

Tomas Fulajtar



_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support



--
Hieu Hoang
Research Associate
University of Edinburgh
http://www.hoang.co.uk/hieu