Re: [Moses-support] Different phrase tables with same dataset

Davood Mohammadifar Wed, 17 Jun 2015 05:50:06 -0700

Thanks a lot Barry

I do not think the problem is related to the Persian side of the corpus, 
because the problem remains when I run with the French/English sample corpus 
(linked in the Moses manual). Based on your comments, I think I should check 
that the truecasing, recasing and cleaning tools work properly in preprocessing.
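
One way to check the cleaning step is to look at the cleaned corpus directly. The following is a minimal sketch, not part of Moses: the function name `check_parallel_lines` and the limits `max_len=80` and `max_ratio=9.0` are illustrative assumptions. It flags sentence pairs that a cleaning step would normally have removed (empty side, over-long sentence, extreme length ratio), plus a mismatch in line counts between the two sides.

```python
from typing import Iterable, List, Tuple


def check_parallel_lines(
    src_lines: Iterable[str],
    tgt_lines: Iterable[str],
    max_len: int = 80,        # assumed token limit, adjust to your cleaning settings
    max_ratio: float = 9.0,   # assumed length-ratio limit, adjust as needed
) -> List[Tuple[int, str]]:
    """Return (line_number, reason) for sentence pairs that look suspicious.

    Line number 0 is used for a corpus-level problem (line count mismatch).
    """
    src = list(src_lines)
    tgt = list(tgt_lines)
    problems: List[Tuple[int, str]] = []

    if len(src) != len(tgt):
        problems.append((0, f"line count mismatch: {len(src)} vs {len(tgt)}"))

    for i, (s, t) in enumerate(zip(src, tgt), start=1):
        s_len, t_len = len(s.split()), len(t.split())
        if s_len == 0 or t_len == 0:
            problems.append((i, "empty side"))
        elif s_len > max_len or t_len > max_len:
            problems.append((i, "too long"))
        elif max(s_len, t_len) / min(s_len, t_len) > max_ratio:
            problems.append((i, "length ratio"))
    return problems
```

To use it, open the two sides of the cleaned corpus as UTF-8 text files and pass the file objects as `src_lines` and `tgt_lines`. An empty result does not prove the preprocessing was right, but a non-empty one suggests the cleaning step did not run as expected.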


Do you think my mid-range system is adequate? (Core i5 2400, 4GB RAM, 
Ubuntu 14.04 32-bit). I only want to train on about 50000 sentences. 

Date: Wed, 17 Jun 2015 12:34:59 +0100
From: [email protected]
To: [email protected]; [email protected]
Subject: Re: [Moses-support] Different phrase tables with same dataset


  
    
  
  
    Hi Davood

    

    From line 20113 onwards there's a whole bunch of error messages
    indicating that the GIZA alignment didn't run properly, so the
    subsequent phrase extraction didn't work. I can't actually see why
    GIZA failed, though; possibly the corpus was not preprocessed
    correctly. I'm not familiar with the Arabic tool chain.
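
For a log of this length, a small script can help find where the errors start. This is a minimal sketch under stated assumptions: the keyword list in `ERROR_PATTERN` is a guess at what failure messages tend to contain, not a list of actual Moses or GIZA messages, and `find_suspect_lines` is a hypothetical helper name.

```python
import re
from typing import Iterable, List, Tuple

# Assumed keywords only; adjust them to what your own log actually contains.
ERROR_PATTERN = re.compile(r"error|cannot|no such file|died|failed", re.IGNORECASE)


def find_suspect_lines(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (line_number, text) for every log line matching ERROR_PATTERN."""
    return [
        (number, line.rstrip("\n"))
        for number, line in enumerate(lines, start=1)
        if ERROR_PATTERN.search(line)
    ]
```

Passing an open log file to `find_suspect_lines` returns the matching lines with their 1-based line numbers, so the first entry shows roughly where the run started to go wrong.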

    

    cheers - Barry

    

    On 16/06/15 18:24, Davood Mohammadifar
      wrote:

    
    
      
      Thanks Barry.

        

        I attached the log file. It reports two training runs
        (the second report is appended after "(9) create moses.ini").

        

        I executed the following command for both: 

        

        nohup nice /home/hieu/workspace/github/mosesdecoder/scripts/training/train-model.perl \
          -mgiza -mgiza-cpus 2 -parallel -sort-batch-size 253 -sort-compress gzip \
          -root-dir /home/hieu/train \
          -corpus /home/hieu/corpus/training/training.clean -f fa -e en \
          -alignment grow-diag-final-and -reordering msd-bidirectional-fe \
          -lm 0:3:/home/hieu/lm/training.blm.en:8 \
          -external-bin-dir /home/hieu/workspace/github/mosesdecoder/tools

        

        

        

        Is there any error or anything unusual in it?

        

        
          Date: Tue, 16 Jun 2015 13:01:10 +0100

          From: [email protected]

          To: [email protected]; [email protected]

          Subject: Re: [Moses-support] Different phrase tables with same
          dataset

          

          Hi Davood

          

          It isn't normal to get such large differences in phrase table
          size or quality, on the same data set, although small
          variations are possible. You should check carefully that you
          used exactly the same settings in each run, and check if
          anything went wrong during training (errors in the log file).

          

          cheers - Barry

          

          On 16/06/15 12:00, Davood
            Mohammadifar wrote:

          
          
            
            Hello everyone
              

              
              I used Moses 3 to train on my parallel corpus. I
                got different BLEU scores (18.5-22.5), so I tried to
                find the reason. I eventually found that the phrase
                tables differ from each other. I trained on 50000
                parallel sentences; the phrase table was about 39MB
                (gzipped) the first time and about 59MB (gzipped) the
                second time. The contents of the phrase tables also
                differ somewhat (in scores and entries).
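
To quantify how two phrase tables differ in their entries, one can compare the sets of phrase pairs. This is a minimal sketch, assuming the usual `source ||| target ||| scores ...` line layout of a Moses phrase table; the helper names are hypothetical, and any paths passed to `compare_files` are placeholders for your own two tables. It holds all phrase pairs in memory, which may be tight for large tables on a 4GB machine.

```python
import gzip
from typing import Dict, Iterable, Set, Tuple


def phrase_pairs(lines: Iterable[str]) -> Set[Tuple[str, str]]:
    """Collect (source, target) pairs from ' ||| '-separated phrase table lines."""
    pairs: Set[Tuple[str, str]] = set()
    for line in lines:
        fields = line.split("|||")
        if len(fields) >= 2:  # skip lines that do not look like table entries
            pairs.add((fields[0].strip(), fields[1].strip()))
    return pairs


def compare_tables(a_lines: Iterable[str], b_lines: Iterable[str]) -> Dict[str, int]:
    """Count phrase pairs unique to each table and shared by both."""
    a, b = phrase_pairs(a_lines), phrase_pairs(b_lines)
    return {
        "only_in_first": len(a - b),
        "only_in_second": len(b - a),
        "shared": len(a & b),
    }


def compare_files(path_a: str, path_b: str) -> Dict[str, int]:
    """Same comparison, reading two gzipped phrase tables from disk."""
    with gzip.open(path_a, "rt", encoding="utf-8") as fa, \
            gzip.open(path_b, "rt", encoding="utf-8") as fb:
        return compare_tables(fa, fb)
```

A large `only_in_second` count with few `shared` entries would point to different alignments rather than just different scores.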
              

              
              I used MGIZA and followed the instructions for the
                baseline system in the Moses manual. The problem
                remained when using GIZA++, too.
              

              
              The problem also remained when training on 150000
                sentences.
              

              
              Is it normal for phrase tables to differ in size?
              

              
              Thank you
            
            

            
            

            _______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
