In my experience, these warnings are non-fatal, i.e. they do not typically cause missing .gz word alignment files. Rather, they indicate a high degree of complexity in the parallel corpus.
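To separate this kind of noise from anything fatal, it can help to tally the log by message prefix. The following is only a rough sketch: the function name `tally_log` is hypothetical, the prefixes are simply the ones that appear in the messages quoted in this thread, and it assumes the training log is plain text.

```python
import re
import sys
from collections import Counter

# Prefixes seen in the log excerpts quoted in this thread (an assumption,
# not an exhaustive list of what GIZA/MGIZA can print).
PREFIX = re.compile(r"\b(ERROR|WARNING|PROBLEM):")

def tally_log(lines):
    """Count log lines by the first severity prefix found; collect ERROR lines."""
    counts = Counter()
    errors = []
    for line in lines:
        match = PREFIX.search(line)
        if match is None:
            continue
        counts[match.group(1)] += 1
        if match.group(1) == "ERROR":
            errors.append(line.rstrip("\n"))
    return counts, errors

if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python tally_log.py /path/to/training.out
    with open(sys.argv[1], errors="replace") as fh:
        counts, errors = tally_log(fh)
    print(dict(counts))
    for err in errors:
        print(err)
```

A large WARNING count with no ERROR lines would be consistent with a corpus that is merely hard to align.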

Are you still getting zero-length (20 bytes) .gz alignment files?
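A quick way to check is sketched below. It walks a directory and reports every .gz file that decompresses to nothing; a gzip stream with no payload and no stored file name is 20 bytes on disk, which is where that size comes from. The function name `empty_gzip_files` is hypothetical, and the directory to scan is whatever training root you pass in.

```python
import gzip
import sys
from pathlib import Path

def empty_gzip_files(root):
    """Return the .gz files under root whose decompressed content is empty."""
    empty = []
    for path in sorted(Path(root).rglob("*.gz")):
        # Reading a single byte is enough to tell empty from non-empty.
        # A truncated or corrupt file raises an exception instead, which is
        # left unhandled here so that it is not mistaken for an empty file.
        with gzip.open(path, "rb") as fh:
            if not fh.read(1):
                empty.append(path)
    return empty

if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python empty_gz.py /path/to/training
    for path in empty_gzip_files(sys.argv[1]):
        print(f"{path}  ({path.stat().st_size} bytes on disk)")
```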


On 02/24/2015 04:35 PM, Vito Mandorino wrote:
Thank you. I did indeed have a bad ratio in a previous attempt, but not in this one: I ran the clean-corpus.perl script right before the training.
The log output does contain several lines with warnings such as:

PROBLEM: alignment is 0.
WARNING: Hill Climbing yielded a zero score viterbi alignment for the following pair:
WARNING: Model2 viterbi alignment has zero score.
Fert[50] selected WARNING: Model2 viterbi alignment has zero score.
WARNING: already 41 iterations in hillclimb: 3.24683 2 26 29
WARNING: DIFFERENT SUMS: (1) (3.13379)

Vito


2015-02-20 19:18 GMT+01:00 Tom Hoar <[email protected]>:

    Fatal errors during step 2 are normally traceable to poor corpus
    preparation. Termination, however, does not always happen
    immediately. Look through the entire log output. You'll probably
    find one of these errors:

    "WARNING: The following sentence pair has source/target sentence
    length ration more than"

    or "ERROR: Forbidden zero sentence length 0"

    or a line beginning with "ERROR:"

    The fact that your corpus has placeholders makes me suspect you
    probably have a bad ratio.





    On 02/20/2015 06:18 PM, Vito Mandorino wrote:
    Dear All,

    I am training a model with placeholders from French to English,
    and the process stops before the end of training step 2 without
    creating the en-fr.A3.final.gz and fr-en.A3.final.gz files in
    the respective folders giza.en-fr and giza.fr-en.

    I cannot understand why. Here are the last 7 lines of the
    training.out file:


        Entire Viterbi H333444 Training took: 78424 seconds
        ==========================================================

        Entire Training took: 134751 seconds
        Program Finished at: Fri Feb 20 05:45:19 2015

        ==========================================================


    and here's the command used for training:

        nohup nice /home/Moses/mosesdecoder/scripts/training/train-model.perl \
        --parallel \
        -mgiza -mgiza-cpus 20 \
        -root-dir /home/BRIQUES/train-leclerc-light-ph-fren/training \
        -external-bin-dir /root/external-bin-dir/ \
        -corpus /home/BRIQUES/train-leclerc-light-ph-enfr/data/Corpus.leclerc-ph.cleanclean \
        -f fr -e en \
        -alignment grow-diag-final-and -reordering msd-bidirectional-fe \
        -lm 0:5:/home/DATA/LM/RapAnSoc/lm.RapAnSoc5-ph.blm.en.mm:8 \
        -write-lexical-counts \
        -extract-options '--Placeholders @num@,@rom@,@alpha@' \
        >& /home/BRIQUES/train-leclerc-light-ph-fren/training/training.out &


    It seems a bit odd to me that the analogous training in the
    reverse English-French direction worked fine.
    In that case, the folders giza.en-fr and giza.fr-en contain
    not only the .gz files but also two more "chunks",
    en-fr.A3.final.partD and en-fr.A3.final.partE, whereas the
    French-English training stops at fr-en.A3.final.partC.


    Thank you,

    Vito Mandorino

--
    The Translation Trustee

    1, Place Charles de Gaulle, 78180 Montigny-le-Bretonneux

    Tel : +33 1 30 44 04 23   Mobile : +33 6 84 65 68 89

    Email : [email protected]

    Website : www.linguacustodia.com - www.thetranslationtrustee.com



    _______________________________________________
    Moses-support mailing list
    [email protected]
    http://mailman.mit.edu/mailman/listinfo/moses-support






--
M. Vito MANDORINO -- Chief Scientist



