Re: [Moses-support] Different phrase tables with same dataset

Barry Haddow Wed, 17 Jun 2015 05:57:54 -0700

Do you think my mid-range system is adequate? (Core i5 2400, 4 GB RAM, Ubuntu 14.04 32-bit.) For reference, I want to train on about 50,000 sentences.

For a small data set of 50k sentences, this should work. You could try with 10k sentences first to be sure.
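
To follow the suggestion of testing on 10k sentences first, you can cut a reduced corpus from the full one. The Python sketch below is one way to do it. It assumes the corpus is stored as two line-aligned files named from the -corpus stem plus the -f/-e extensions (for example training.clean.fa and training.clean.en). take_subset and the output stem are hypothetical names, not part of Moses.

```python
import itertools


def take_subset(stem: str, src: str, tgt: str, out_stem: str, n: int = 10_000) -> int:
    """Copy the first n sentence pairs of a line-aligned parallel corpus.

    Hypothetical helper: reads f"{stem}.{src}" and f"{stem}.{tgt}" (one
    sentence per line), writes f"{out_stem}.{src}" and f"{out_stem}.{tgt}",
    and returns the number of pairs written.
    """
    written = 0
    # newline="\n" splits and writes lines on "\n" only, without translation
    with open(f"{stem}.{src}", encoding="utf-8", newline="\n") as fs, \
            open(f"{stem}.{tgt}", encoding="utf-8", newline="\n") as ft, \
            open(f"{out_stem}.{src}", "w", encoding="utf-8", newline="\n") as out_s, \
            open(f"{out_stem}.{tgt}", "w", encoding="utf-8", newline="\n") as out_t:
        # zip keeps the two sides aligned; islice stops after n pairs
        for s_line, t_line in itertools.islice(zip(fs, ft), n):
            out_s.write(s_line)
            out_t.write(t_line)
            written += 1
    return written
```

For example, take_subset("/home/hieu/corpus/training/training.clean", "fa", "en", "/home/hieu/corpus/training/training.10k") would write training.10k.fa and training.10k.en, and that stem could then be passed to -corpus. Taking the first lines keeps the sketch simple. A random sample would need to shuffle both sides with the same permutation.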


On 17/06/15 13:46, Davood Mohammadifar wrote:

Thanks a lot, Barry.

I do not think the problem is related to the Persian side of the corpus, because the problem remains when I run with the French/English sample corpus (linked in the Moses manual). Based on your comments, I think I should check that the truecasing, recasing and cleaning tools work properly in preprocessing.
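
One way to check the cleaning step is to scan the cleaned corpus for pairs that tend to cause problems in word alignment. This is a minimal Python sketch, not a Moses tool. It assumes the same two-file layout as the training command. Its length limits of 1 to 80 tokens per side are modelled on the baseline instructions in the Moses manual. The length-ratio limit of 9 is an assumption, so adjust all thresholds to match your own cleaning step. check_parallel_corpus is a hypothetical name.

```python
from typing import List, Tuple


def _read_lines(path: str) -> List[str]:
    # newline="\n": only "\n" ends a line, so a stray "\r" inside a
    # sentence is not counted as an extra line
    with open(path, encoding="utf-8", newline="\n") as f:
        return [line.rstrip("\n") for line in f]


def check_parallel_corpus(stem: str, src: str, tgt: str, min_len: int = 1,
                          max_len: int = 80, max_ratio: float = 9.0
                          ) -> List[Tuple[int, str]]:
    """Return (line_number, reason) pairs; line 0 means a file-level problem.

    Hypothetical helper; thresholds are assumptions, not values read from
    any Moses script.
    """
    src_lines = _read_lines(f"{stem}.{src}")
    tgt_lines = _read_lines(f"{stem}.{tgt}")
    problems: List[Tuple[int, str]] = []
    if len(src_lines) != len(tgt_lines):
        # Different line counts mean the sides are no longer sentence-aligned
        problems.append((0, f"line count mismatch: {len(src_lines)} vs {len(tgt_lines)}"))
    for i, (s, t) in enumerate(zip(src_lines, tgt_lines), start=1):
        ns, nt = len(s.split()), len(t.split())
        if min(ns, nt) < min_len:
            problems.append((i, "empty or too short"))
        elif max(ns, nt) > max_len:
            problems.append((i, f"longer than {max_len} tokens"))
        elif min(ns, nt) > 0 and max(ns, nt) / min(ns, nt) > max_ratio:
            problems.append((i, f"length ratio above {max_ratio}"))
    return problems
```

An empty result only means none of these checks fired. It does not show that truecasing or recasing behaved correctly.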

Do you think my mid-range system is adequate? (Core i5 2400, 4 GB RAM, Ubuntu 14.04 32-bit.) For reference, I want to train on about 50,000 sentences.


------------------------------------------------------------------------
Date: Wed, 17 Jun 2015 12:34:59 +0100
From: [email protected]
To: [email protected]; [email protected]
Subject: Re: [Moses-support] Different phrase tables with same dataset

Hi Davood

From line 20113 onwards there are a lot of error messages indicating that the GIZA alignment didn't run properly, so the resulting phrase extraction didn't work. I can't actually see why GIZA failed, though; possibly the corpus was not preprocessed correctly. I'm not familiar with the Arabic tool chain.
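
You can script the search for where the errors start in a long log, as with line 20113 here. The minimal Python sketch below returns the 1-based number of the first line that matches a case-insensitive pattern. The default pattern "error" is an assumption, because the thread does not quote GIZA's actual messages. first_error_line is a hypothetical helper name.

```python
import re
from typing import Optional


def first_error_line(log_path: str, pattern: str = r"error") -> Optional[int]:
    """Return the 1-based number of the first line matching pattern, or None.

    The default pattern is an assumption; adjust it to the messages your
    aligner actually prints.
    """
    regex = re.compile(pattern, re.IGNORECASE)
    # errors="replace" keeps the scan going if the log contains stray bytes
    with open(log_path, encoding="utf-8", errors="replace") as f:
        for number, line in enumerate(f, start=1):
            if regex.search(line):
                return number
    return None
```

If the training command was run exactly as quoted further down this thread (under nohup, with no output redirection), its output would normally go to nohup.out in the working directory. In that case, first_error_line("nohup.out") is a reasonable place to start.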


cheers - Barry

On 16/06/15 18:24, Davood Mohammadifar wrote:

    Thanks Barry.

    I have attached the log file. It contains two training runs: the
    second run's output was appended after the "(9) create moses.ini"
    step of the first.

    I ran the following command for both:

    nohup nice \
      /home/hieu/workspace/github/mosesdecoder/scripts/training/train-model.perl \
      -mgiza -mgiza-cpus 2 -parallel \
      -sort-batch-size 253 -sort-compress gzip \
      -root-dir /home/hieu/train \
      -corpus /home/hieu/corpus/training/training.clean -f fa -e en \
      -alignment grow-diag-final-and -reordering msd-bidirectional-fe \
      -lm 0:3:/home/hieu/lm/training.blm.en:8 \
      -external-bin-dir /home/hieu/workspace/github/mosesdecoder/tools



    Is there any error or anything unusual in it?

    ------------------------------------------------------------------------
    Date: Tue, 16 Jun 2015 13:01:10 +0100
    From: [email protected] <mailto:[email protected]>
    To: [email protected] <mailto:[email protected]>;
    [email protected] <mailto:[email protected]>
    Subject: Re: [Moses-support] Different phrase tables with same dataset

    Hi Davood

    It isn't normal to get such large differences in phrase table size
    or quality on the same data set, although small variations are
    possible. You should check carefully that you used exactly the
    same settings in each run, and check whether anything went wrong
    during training (errors in the log file).

    cheers - Barry

    On 16/06/15 12:00, Davood Mohammadifar wrote:

        Hello everyone

        I used Moses 3 to train on my parallel corpus and got
        different BLEU scores across runs (18.5-22.5), so I tried to
        find the reason. Eventually I found that the phrase tables
        differ from each other. I trained on 50,000 parallel sentences:
        the first time, the phrase table was about 39 MB (gzipped), and
        the second time it was about 59 MB (gzipped). The contents of
        the phrase tables also differ somewhat (in both scores and
        entries).

        I used MGIZA and followed the instructions for the baseline
        system in the Moses manual. The problem remained when using
        GIZA++ as well.

        The problem also remained when training on 150,000 sentences.

        Is it normal for the phrase tables to differ in size?

        Thank you


        _______________________________________________
        Moses-support mailing list
        [email protected]  <mailto:[email protected]>
        http://mailman.mit.edu/mailman/listinfo/moses-support

The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.
