On 02/05/2013 21:52, Zai Sarkar wrote:
Hello Hieu,
The binarize LM step of EMS still crashes.
I am using IRSTLM for the language model in EMS.
For lm-train I am using trainlm-irst2.perl as suggested.
For compiling and binarizing I use:
lm-binarizer = $irstlm-dir/compile-lm
lm-binarizer = "$moses-bin-dir/build_binary -i"
type = 8
It seems I need both lm-binarizer statements above.
The STDERR output I get in the EMS steps directory is shown below.
Reading /apps/moses/fr-en/lm/eu3.lm.1
----5---10---15---20---25---30---35---40---45---50---55---60---65---70---75---80---85---90---95--100
****************************************************************************************************
lm/search_hashed.cc:36 in void
lm::ngram::<unnamed>::ActivateLowerMiddle<Middle>::operator()(const
lm::WordIndex*, unsigned int) [with Middle =
util::ProbingHashTable<lm::ngram::BackoffValue::ProbingEntry,
util::IdentityHash, std::equal_to<long unsigned int> >] threw
FormatLoadException'.
The context of every 4-gram should appear as a 3-gram Byte: 18657911
File: /apps/moses/fr-en/lm/eu3.lm.1
ERROR
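One way to look into the error above is to scan the text form of the model for n-grams whose context is missing. The following is only a minimal sketch and not part of Moses, KenLM or IRSTLM. The function name check_arpa_contexts is hypothetical. It assumes that "context" means the first n-1 words of an n-gram, and that the input is a plain-text ARPA file with its sections in increasing order. It keeps every n-gram in memory, so it is only practical for models that fit in RAM.

```shell
#!/bin/sh
# check_arpa_contexts FILE  (hypothetical helper, not a Moses/KenLM tool)
# Prints every n-gram (n >= 2) whose first n-1 words do not occur as an
# (n-1)-gram earlier in the ARPA file; returns 1 if any were found.
check_arpa_contexts() {
    awk '
        # A section header such as "\3-grams:" sets the current order n.
        /^\\[0-9]+-grams:/ { n = substr($0, 2) + 0; next }
        # The "\end\" marker closes the last section.
        $0 == "\\end\\" { n = 0; next }
        # Skip the \data\ header, blank lines and malformed entries.
        n == 0 || NF < n + 1 { next }
        {
            # Field 1 is the log probability; fields 2..n+1 are the words.
            ctx = ""
            for (i = 2; i <= n; i++) ctx = ctx (i > 2 ? " " : "") $i
            gram = ((n > 1) ? (ctx " ") : "") $(n + 1)
            if (n > 1 && !((n - 1, ctx) in seen)) {
                print "missing context for " n "-gram: " gram
                bad++
            }
            seen[n, gram] = 1
        }
        END { exit (bad > 0 ? 1 : 0) }
    ' "$1"
}
```

It could be run as `check_arpa_contexts /apps/moses/fr-en/lm/eu3.lm.1`, provided that file is in ARPA text form. No output and a zero exit status mean no missing contexts were found under the assumptions above.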
Strange, I've not seen this error before; maybe other people on the list
have.
Instead of binarizing with KenLM, you can binarize with IRSTLM:
lm-binarizer = $irstlm-dir/compile-lm
type = 1
---------------------------------------
To isolate the issue, I took the same lm/eu3.lm.1 file produced by the
EMS run above, generated an ARPA file from it and binarized that with
the two commands below. With the resulting binary LM, Moses started
without error using the moses.ini from the EMS.
To make the ARPA file I used:
/apps/moses/mosesInstalls/irstlm/bin/compile-lm --text yes
/apps/moses/fr-en/lm/eu3.lm.1
/apps/moses/fr-en/parallel-in/training/Europarl3.fr-en.arpa.en
To binarize the ARPA file I used:
/apps/moses/mosesInstalls/mosesdecoder/bin/build_binary -i -p 1.5
probing /apps/moses/fr-en/parallel-in/training/Europarl3.fr-en.arpa.en
/apps/moses/fr-en/lm/eu3.binlm.1
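The two manual commands above can be wrapped in one small script so they are easy to rerun on the EMS output. This is a sketch only. The function name lm_to_kenlm_binary and the COMPILE_LM and BUILD_BINARY variables are hypothetical names introduced here. The tool paths and flags are copied from the commands above, not checked against any other installation.

```shell
#!/bin/sh
# Hypothetical wrapper around the two manual commands above.
# COMPILE_LM and BUILD_BINARY are illustrative variables; their defaults
# are the tool paths quoted in this message.
COMPILE_LM=${COMPILE_LM:-/apps/moses/mosesInstalls/irstlm/bin/compile-lm}
BUILD_BINARY=${BUILD_BINARY:-/apps/moses/mosesInstalls/mosesdecoder/bin/build_binary}

# lm_to_kenlm_binary LM_IN ARPA_OUT BIN_OUT
lm_to_kenlm_binary() {
    lm_in=$1
    arpa_out=$2
    bin_out=$3

    # Step 1: dump the IRSTLM model as ARPA text.
    "$COMPILE_LM" --text yes "$lm_in" "$arpa_out" || return 1

    # Stop early if no ARPA file was written.
    if [ ! -s "$arpa_out" ]; then
        echo "no ARPA output at $arpa_out" >&2
        return 1
    fi

    # Step 2: binarize the ARPA file with KenLM, same flags as above.
    "$BUILD_BINARY" -i -p 1.5 probing "$arpa_out" "$bin_out"
}
```

For the files in this message the call would be `lm_to_kenlm_binary /apps/moses/fr-en/lm/eu3.lm.1 /apps/moses/fr-en/parallel-in/training/Europarl3.fr-en.arpa.en /apps/moses/fr-en/lm/eu3.binlm.1`.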
----------------------
I want to fix my EMS so binarize LM runs properly. Can you tell me why
the EMS fails to binarize the compiled LM file?
Thanks for your help!
Zai
------------------------------------------------------------------------
*From:* Hieu Hoang <[email protected]>
*To:* Zai Sarkar <[email protected]>
*Cc:* "[email protected]" <[email protected]>
*Sent:* Thursday, May 2, 2013 5:33 AM
*Subject:* Re: [Moses-support] Baseline IRSTLM works but EMS IRSTLM
does not
Try using the script
trainlm-irst2.perl
Look at the config file in the example directory in Moses, or in the premade
models that you can download with v1, eg
http://www.statmt.org/moses/RELEASE-1.0/models/fr-en/config.pb
On 1 May 2013 20:59, Zai Sarkar <[email protected]> wrote:
###These Baseline commands for IRSTLM work fine using ver 1.0
Moses. A good LM file is generated:
----------Baseline script is ok
cd ../../apps/moses/mosesInstalls
export IRSTLM=/apps/moses/mosesInstalls/irstlm
#Generate the LM file
/apps/moses/mosesInstalls/irstlm/bin/add-start-end.sh <
/apps/moses/fr-en/parallel-in/training/Europarl3.fr-en.clean.en >
/apps/moses/fr-en/parallel-in/training/Europarl3.fr-en.sb.en
export IRSTLM=$HOME/mosesInstalls/irstlm;
/apps/moses/mosesInstalls/irstlm/bin/build-lm.sh -i
/apps/moses/fr-en/parallel-in/training/Europarl3.fr-en.sb.en -n 5
-k 10 -t ./moses/tmp -p -s improved-kneser-ney -o
/apps/moses/fr-en/parallel-in/training/Europarl3.fr-en.lm.en
# CREATE THE ARPA FILE
/apps/moses/mosesInstalls/irstlm/bin/compile-lm --text yes
/apps/moses/fr-en/parallel-in/training/Europarl3.fr-en.lm.en.gz
/apps/moses/fr-en/parallel-in/training/Europarl3.fr-en.arpa.en
# BINARIZE THE ARPA FILE
/apps/moses/mosesInstalls/mosesdecoder/bin/build_binary -i -p 1.5
probing
/apps/moses/fr-en/parallel-in/training/Europarl3.fr-en.arpa.en
/apps/moses/fr-en/parallel-in/training/Europarl3.fr-en.blm.en
-----------EMS script is NOT ok
### Trying to do the same with EMS using the LM settings below, but it
generates a temporary irstlm-build-tmp directory with stat4/ dict and ngram
.gz files, and no LM file is left after the tmp file gets deleted.
# irstlm
lm-training = "$moses-script-dir/generic/trainlm-irst.perl -cores
$cores -irst-dir $irstlm-dir -temp-dir $working-dir/lm"
settings = ""
# order of the language model
order = 5
type = 1
### Also tried adding below to EMS, but still no LM saved
# irstlm
lm-binarizer = $irstlm-dir/compile-lm
# kenlm, also set type to 8
lm-binarizer = "$moses-bin-dir/build_binary -i"
type = 8
---------------
_______________________________________________
Moses-support mailing list
[email protected] <mailto:[email protected]>
http://mailman.mit.edu/mailman/listinfo/moses-support
--
Hieu Hoang
Research Associate
University of Edinburgh
http://www.hoang.co.uk/hieu