On 02/05/2013 21:52, Zai Sarkar wrote:
Hello Hieu,
The binarize LM step of EMS still crashes.
I am using IRSTLM for the language model in EMS.
For lm-train I am using trainlm-irst2.perl as suggested.
For compiling and binarizing I use:
lm-binarizer = $irstlm-dir/compile-lm
lm-binarizer = "$moses-bin-dir/build_binary -i"
type = 8

Seems like I need both lm-binarizer statements above.
The STDERR output I get in the steps directory is shown below.

Reading /apps/moses/fr-en/lm/eu3.lm.1
----5---10---15---20---25---30---35---40---45---50---55---60---65---70---75---80---85---90---95--100
****************************************************************************************************
lm/search_hashed.cc:36 in void lm::ngram::<unnamed>::ActivateLowerMiddle<Middle>::operator()(const lm::WordIndex*, unsigned int) [with Middle = util::ProbingHashTable<lm::ngram::BackoffValue::ProbingEntry, util::IdentityHash, std::equal_to<long unsigned int> >] threw FormatLoadException'. The context of every 4-gram should appear as a 3-gram Byte: 18657911 File: /apps/moses/fr-en/lm/eu3.lm.1
ERROR
Strange, I've not seen this error before; maybe other people on the list have.

Instead of binarizing with KenLM, you can binarize with IRSTLM:
  lm-binarizer = $irstlm-dir/compile-lm
  type = 1
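The KenLM error in the log above states a concrete requirement: the context of every 4-gram must also appear as a 3-gram. As a rough diagnostic, here is a minimal sketch that scans a plain-text ARPA file (for example one written by `compile-lm --text yes`, as in the commands below) and lists every n-gram whose context is missing from the (n-1)-gram section. It assumes "context" means the n-gram without its last word, and the function name `check_arpa_contexts` is hypothetical; it is not a Moses, KenLM or IRSTLM tool.

```bash
#!/usr/bin/env bash
# Sketch: report n-grams in a plain-text ARPA file whose context (the
# n-gram minus its last word) is missing from the (n-1)-gram section.
# Assumes whitespace-separated lines "logprob w1 ... wN [backoff]".
check_arpa_contexts() {
  awk '
    # Section headers look like "\3-grams:"; remember the current order.
    /^\\[0-9]+-grams:$/ { order = substr($0, 2) + 0; next }
    # "\end\" closes the last section.
    /^\\end\\$/ { order = 0; next }
    order > 0 && NF > order {
      # Field 1 is the log10 probability; fields 2..order+1 are the words.
      ngram = $2
      for (i = 3; i <= order + 1; i++) ngram = ngram " " $i
      seen[order, ngram] = 1
      if (order > 1) {
        # The context is the first order-1 words.
        ctx = $2
        for (i = 3; i <= order; i++) ctx = ctx " " $i
        if (!((order - 1, ctx) in seen)) {
          printf "missing %d-gram context for %d-gram: %s\n", order - 1, order, ngram
          missing++
        }
      }
    }
    # Exit status 1 if anything was reported, 0 otherwise.
    END { exit (missing > 0) }
  ' "$1"
}
```

Usage: `check_arpa_contexts /apps/moses/fr-en/parallel-in/training/Europarl3.fr-en.arpa.en`. A non-zero exit status means at least one n-gram was reported. Lower orders come first in an ARPA file, so a single pass is enough to see each context before the n-grams that need it.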


---------------------------------------
To isolate the issue, I took the same lm/eu3.lm.1 file produced by the EMS run above, converted it to an ARPA file and binarized it with the two commands below; Moses then started without error using the moses.ini from the EMS run.
To make the ARPA file I used:
/apps/moses/mosesInstalls/irstlm/bin/compile-lm --text yes /apps/moses/fr-en/lm/eu3.lm.1 /apps/moses/fr-en/parallel-in/training/Europarl3.fr-en.arpa.en
To binarize the ARPA file I used:
/apps/moses/mosesInstalls/mosesdecoder/bin/build_binary -i -p 1.5 probing /apps/moses/fr-en/parallel-in/training/Europarl3.fr-en.arpa.en /apps/moses/fr-en/lm/eu3.binlm.1
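The two manual commands above can be wrapped in a small function so the workaround is easy to rerun on another model. This is only a sketch, not part of Moses or EMS: the function name `irstlm_to_kenlm` and the `COMPILE_LM`/`BUILD_BINARY` variables are hypothetical, their defaults are the install paths used in this thread, and the tool flags are exactly the ones in the commands above.

```bash
#!/usr/bin/env bash
# Sketch: convert an IRSTLM model to ARPA text with compile-lm, then
# binarize it with KenLM build_binary, using the flags from the thread.
# COMPILE_LM and BUILD_BINARY are hypothetical override variables; the
# defaults are the install paths used in this thread.
COMPILE_LM=${COMPILE_LM:-/apps/moses/mosesInstalls/irstlm/bin/compile-lm}
BUILD_BINARY=${BUILD_BINARY:-/apps/moses/mosesInstalls/mosesdecoder/bin/build_binary}

irstlm_to_kenlm() {
  if [ "$#" -ne 3 ]; then
    echo "usage: irstlm_to_kenlm IRST_LM ARPA_OUT BINLM_OUT" >&2
    return 2
  fi
  local irst_lm=$1 arpa=$2 binlm=$3
  # Step 1: dump the IRSTLM model as a text (ARPA) file.
  "$COMPILE_LM" --text yes "$irst_lm" "$arpa" || return 1
  # Step 2: build a KenLM probing-hash binary; -p 1.5 is the probing
  # space multiplier and -i is the same flag used in the commands above.
  "$BUILD_BINARY" -i -p 1.5 probing "$arpa" "$binlm" || return 1
}
```

For example, `irstlm_to_kenlm /apps/moses/fr-en/lm/eu3.lm.1 /apps/moses/fr-en/parallel-in/training/Europarl3.fr-en.arpa.en /apps/moses/fr-en/lm/eu3.binlm.1` reproduces the two commands above and stops at the first step that fails.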
----------------------

I want to fix my EMS so binarize LM runs properly. Can you tell me why the EMS fails to binarize the compiled LM file?
Thanks for help!
Zai
------------------------------------------------------------------------
*From:* Hieu Hoang <[email protected]>
*To:* Zai Sarkar <[email protected]>
*Cc:* "[email protected]" <[email protected]>
*Sent:* Thursday, May 2, 2013 5:33 AM
*Subject:* Re: [Moses-support] Baseline IRSTLM works but EMS IRSTLM does not

Try using the script
    trainlm-irst2.perl
Look at the config file in the example directory in Moses, or in the pre-made
models that you can download with v1, e.g.
    http://www.statmt.org/moses/RELEASE-1.0/models/fr-en/config.pb



On 1 May 2013 20:59, Zai Sarkar <[email protected]> wrote:

    ###These baseline commands for IRSTLM work fine with Moses v1.0; a
    good LM file is generated:
    ----------Baseline script is ok
    cd ../../apps/moses/mosesInstalls
    export IRSTLM=/apps/moses/mosesInstalls/irstlm
    #Generate the LM file
    /apps/moses/mosesInstalls/irstlm/bin/add-start-end.sh \
      < /apps/moses/fr-en/parallel-in/training/Europarl3.fr-en.clean.en \
      > /apps/moses/fr-en/parallel-in/training/Europarl3.fr-en.sb.en
    export IRSTLM=$HOME/mosesInstalls/irstlm;
    /apps/moses/mosesInstalls/irstlm/bin/build-lm.sh \
      -i /apps/moses/fr-en/parallel-in/training/Europarl3.fr-en.sb.en \
      -n 5 -k 10 -t ./moses/tmp -p -s improved-kneser-ney \
      -o /apps/moses/fr-en/parallel-in/training/Europarl3.fr-en.lm.en
    # CREATE THE ARPA FILE
    /apps/moses/mosesInstalls/irstlm/bin/compile-lm --text yes \
      /apps/moses/fr-en/parallel-in/training/Europarl3.fr-en.lm.en.gz \
      /apps/moses/fr-en/parallel-in/training/Europarl3.fr-en.arpa.en
    #  BINARIZE THE ARPA FILE
    /apps/moses/mosesInstalls/mosesdecoder/bin/build_binary -i -p 1.5 probing \
      /apps/moses/fr-en/parallel-in/training/Europarl3.fr-en.arpa.en \
      /apps/moses/fr-en/parallel-in/training/Europarl3.fr-en.blm.en

    -----------EMS script is NOT ok
    ###Trying to do the same with EMS using the LM settings below, but it
    generates a temporary irstlm-build-tmp directory with stat4/, dict and
    ngram .gz files, and no LM file is left after the temporary files get
    deleted.
    # irstlm
    lm-training = "$moses-script-dir/generic/trainlm-irst.perl -cores $cores -irst-dir $irstlm-dir -temp-dir $working-dir/lm"
    settings = ""
    # order of the language model
    order = 5
    type = 1

    ###  Also tried adding the following to EMS, but still no LM is saved
    # irstlm
    lm-binarizer = $irstlm-dir/compile-lm
    # kenlm, also set type to 8
    lm-binarizer = "$moses-bin-dir/build_binary -i"
    type = 8
    ---------------

    _______________________________________________
    Moses-support mailing list
    [email protected] <mailto:[email protected]>
    http://mailman.mit.edu/mailman/listinfo/moses-support




--
Hieu Hoang
Research Associate
University of Edinburgh
http://www.hoang.co.uk/hieu



