Fatal errors during step 2 are normally traceable to poor corpus preparation. Termination, however, does not always happen immediately. Look through the entire log output. You'll probably find one of these errors:

"WARNING: The following sentence pair has source/target sentence length ration more than"

or "ERROR: Forbidden zero sentence length 0"

or a line beginning with "ERROR:"
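Since these messages can sit far above the end of a long log, a small script can save some scrolling. This is only a sketch: the function names `find_log_problems` and `scan_log` are made up for illustration, and the matched strings are simply the messages quoted above, not an exhaustive list of what GIZA can print.

```python
# Message fragments quoted in this thread (the spelling "ration" is as it appears in the log).
RATIO_WARNING = "sentence pair has source/target sentence length ration"
ERROR_PREFIX = "ERROR:"


def find_log_problems(lines):
    """Return (line_number, text) for each log line that looks like one of the messages."""
    hits = []
    for number, line in enumerate(lines, start=1):
        text = line.rstrip("\n")
        if RATIO_WARNING in text or text.lstrip().startswith(ERROR_PREFIX):
            hits.append((number, text))
    return hits


def scan_log(path):
    """Scan a whole log file, e.g. training.out (hypothetical helper)."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return find_log_problems(handle)
```

Calling `scan_log("training.out")` from the training directory would list every suspicious line with its line number, so you can jump straight to it.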

Since your corpus contains placeholders, I suspect the most likely culprit is a bad source/target sentence length ratio.
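If you want to check the corpus itself before retraining, something along these lines could flag the offending pairs. It is a rough sketch under assumptions: `check_sentence_pairs` is a hypothetical helper, lengths are counted as whitespace-separated tokens, and the `max_ratio=9.0` default is my assumption rather than a value taken from your log, so adjust it to whatever limit your setup actually enforces.

```python
def check_sentence_pairs(source_lines, target_lines, max_ratio=9.0):
    """Return (line_number, reason) for pairs that are empty or badly unbalanced.

    reason is "empty" when either side has no tokens, and "ratio" when the
    longer side has more than max_ratio times the tokens of the shorter side.
    """
    problems = []
    pairs = zip(source_lines, target_lines)
    for number, (source, target) in enumerate(pairs, start=1):
        source_len = len(source.split())
        target_len = len(target.split())
        if source_len == 0 or target_len == 0:
            problems.append((number, "empty"))
        elif max(source_len, target_len) / min(source_len, target_len) > max_ratio:
            problems.append((number, "ratio"))
    return problems
```

You would feed it the two sides of the parallel corpus, read line by line from the `.fr` and `.en` files that share the stem given to `-corpus`. A pair where a single placeholder token stands against a long sentence on the other side is exactly the kind of thing it should report.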




On 02/20/2015 06:18 PM, Vito Mandorino wrote:
Dear All,

I am training a model with placeholders from French to English and the process ends before the end of training step 2, without creating the en-fr.AR.final.gz and fr-en.AR.final.gz files in the respective folders giza.en-fr and giza.fr-en .

I cannot understand why. Here are the last 7 lines of the training.out file:


    Entire Viterbi H333444 Training took: 78424 seconds
    ==========================================================

    Entire Training took: 134751 seconds
    Program Finished at: Fri Feb 20 05:45:19 2015

    ==========================================================


and here's the command used for training:

    nohup nice \
    /home/Moses/mosesdecoder/scripts/training/train-model.perl \
    --parallel \
    -mgiza -mgiza-cpus 20 \
    -root-dir /home/BRIQUES/train-leclerc-light-ph-fren/training \
    -external-bin-dir /root/external-bin-dir/ \
    -corpus /home/BRIQUES/train-leclerc-light-ph-enfr/data/Corpus.leclerc-ph.cleanclean \
    -f fr -e en \
    -alignment grow-diag-final-and -reordering msd-bidirectional-fe \
    -lm 0:5:/home/DATA/LM/RapAnSoc/lm.RapAnSoc5-ph.blm.en.mm:8 \
    -write-lexical-counts \
    -extract-options '--Placeholders @num@,@rom@,@alpha@' \
    >& /home/BRIQUES/train-leclerc-light-ph-fren/training/training.out &


It seems a bit odd to me that the analogous training in the reverse direction (English-French) worked fine. In that case, the folders giza.en-fr and giza.fr-en contain not only the .gz files but also two more "chunks", en-fr.A3.final.partD and en-fr.A3.final.partE, whereas the French-English training stops at fr-en.A3.final.partC.


Thank you,

Vito Mandorino

--

The Translation Trustee
1, Place Charles de Gaulle, 78180 Montigny-le-Bretonneux
Tel : +33 1 30 44 04 23  Mobile : +33 6 84 65 68 89
Email : [email protected]
Website : www.linguacustodia.com - www.thetranslationtrustee.com



_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
