On 15/07/2013 09:10, Tom Hoar wrote:
Thanks, Hieu.

Re the = character. Can I assume this is new in the trunk and does not affect RELEASE-1.0? That one change will create huge problems for us when we move away from release 1.
Correct, it doesn't affect release-1.0, just the code in the current GitHub repos.

Re the mismatch error. This corpus is training a recaser model. We copied the (e) corpus and applied tokenization, lowercasing, etc. to the (f) copy. So, it doesn't make any sense. The exact same (e) corpus was part of a language pair and the standard SMT model trained without the error.

This train-model.perl run finished and tuning is running without error. However, the current 8th run has a BLEU score of "only" 0.9846. Based on previous models, I expected greater than 0.99. This might not sound important, but in this case it is a huge difference: at 0.984, 70% of the test segments match the reference; at 0.99, at least 95% of them do.
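
To measure the segment-level match rate directly instead of inferring it from BLEU, something like the following could be run over the decoder output and the reference. This is only a minimal sketch: the function name and the whitespace normalization are assumptions for illustration and are not part of Moses.

```python
def exact_match_rate(hypotheses, references):
    """Return the fraction of segments whose hypothesis equals its reference."""
    if len(hypotheses) != len(references):
        raise ValueError("hypothesis and reference segment counts differ")
    if not hypotheses:
        return 0.0
    # Compare token sequences so that differences in spacing are ignored,
    # while differences in casing (what a recaser is judged on) still count.
    matches = sum(
        1
        for hyp, ref in zip(hypotheses, references)
        if hyp.split() == ref.split()
    )
    return matches / len(hypotheses)


if __name__ == "__main__":
    hyps = ["The cat sat .", "Hello World", "a b c", "New York City"]
    refs = ["The cat sat .", "Hello world", "a b c", "New York City"]
    print(exact_match_rate(hyps, refs))
```

Reading both files with one segment per line and passing the two lists in gives a number that can be compared across tuning runs alongside BLEU.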

Thanks again. I'll look deeper.



On 07/15/2013 02:08 PM, Hieu Hoang wrote:
Sentence mismatch error is definitely an important error. Is there a problem with your corpus? Dodgy encoding, Windows carriage returns, ran out of disk space, etc.?
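
A quick way to rule out the corpus-level causes is to scan both sides of the bitext before training. The following is a minimal sketch, not a Moses tool: the function name and the choice of checks (line count, carriage returns, invalid UTF-8, empty lines) are assumptions for illustration.

```python
from itertools import zip_longest


def check_parallel_corpus(src_path, tgt_path):
    """Return a list of (line_number, problem) tuples for a sentence-aligned corpus."""
    problems = []
    # Binary mode splits on "\n" only, so stray "\r" bytes stay visible.
    with open(src_path, "rb") as src, open(tgt_path, "rb") as tgt:
        for number, (s, t) in enumerate(zip_longest(src, tgt), start=1):
            if s is None or t is None:
                # One file ended before the other.
                problems.append((number, "line count mismatch"))
                break
            for side, raw in (("source", s), ("target", t)):
                if b"\r" in raw:
                    problems.append((number, f"{side}: carriage return"))
                try:
                    text = raw.decode("utf-8")
                except UnicodeDecodeError:
                    problems.append((number, f"{side}: invalid UTF-8"))
                    continue
                if not text.strip():
                    problems.append((number, f"{side}: empty line"))
    return problems
```

Calling it on the two corpus files (for example the `.es` and `.en_us` sides of the bitext) returns an empty list when none of these problems is present. It does not detect a truncated GIZA output caused by a full disk; that needs a separate look at the alignment files.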

Also, don't use the = character in directory names any more. It's being used to separate key=value pairs, e.g. in the refactored ini file, a phrase-table entry
  0 0 0 5 file
becomes
  PhraseDictionaryMemory path=file input-factor=0 output-factor=0

It's not the cause of your error, but it will cause problems further down the line. Sorry, I should have highlighted this potential problem a little more.
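
To illustrate why an = inside a path is a problem for key=value syntax, here is a minimal sketch of a naive parser for such a line. It is a hypothetical illustration only: `parse_feature_line` is an invented name, and the actual Moses ini parser may split fields differently.

```python
def parse_feature_line(line):
    """Split 'Name key=value ...' into (name, {key: value}). Illustrative only."""
    name, *fields = line.split()
    args = {}
    for field in fields:
        parts = field.split("=")
        if len(parts) != 2:
            # More than one "=" means the key/value boundary is ambiguous.
            raise ValueError(f"ambiguous key=value field: {field!r}")
        key, value = parts
        args[key] = value
    return name, args


if __name__ == "__main__":
    print(parse_feature_line(
        "PhraseDictionaryMemory path=file input-factor=0 output-factor=0"
    ))
    try:
        parse_feature_line(
            "PhraseDictionaryMemory "
            "path=/opt/domy/TRAININGS/merts/mert-t=es-l=es-test-retokr-T=irstlmken-n=12-a=giza-g=10"
        )
    except ValueError as err:
        print("rejected:", err)
```

With a parser of this kind, the first line parses cleanly, while a path containing `t=es`, `l=es-test-retokr` and so on cannot be split unambiguously into one key and one value.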

On 15 July 2013 02:07, Tom Hoar <[email protected]> wrote:

    Here is the command line when I ran train-model.perl.

    /usr/bin/perl -w /usr/local/bin/train-model.perl \
      --do-steps 3 \
      --cores 6 \
      --corpus /opt/domy/BUILDS/lm/es-test-retokr/bitext \
      --e en_us \
      --external-bin-dir /usr/local/bin \
      --f es \
      --lm 0:0:/tmp/placeholder.lm:0 \
      --max-phrase-length 10 \
      --mgiza \
      --mgiza-cpus 6 \
      --model-dir /opt/domy/TRAININGS/merts/mert-t=es-l=es-test-retokr-T=irstlmken-n=12-a=giza-g=10 \
      --root-dir /opt/domy/TRAININGS/merts/mert-t=es-l=es-test-retokr-T=irstlmken-n=12-a=giza-g=10

    The log output has a non-fatal error "Sentence mismatch error!" Any
    ideas about the cause or importance?

    (3) generate word alignment @ Mon Jul 15 07:44:56 ICT 2013
    Combining forward and inverted alignment from files:
    /opt/domy/TRAININGS/merts/mert-t=es-l=es-test-retokr-T=irstlmken-n=12-a=giza-g=10/giza.es-en_us/es-en_us.A3.final.{bz2,gz}
    /opt/domy/TRAININGS/merts/mert-t=es-l=es-test-retokr-T=irstlmken-n=12-a=giza-g=10/giza.en_us-es/en_us-es.A3.final.{bz2,gz}
    Executing: mkdir -p /opt/domy/TRAININGS/merts/mert-t=es-l=es-test-retokr-T=irstlmken-n=12-a=giza-g=10
    Executing: /usr/local/lib/mosesdecoder/scripts/training/giza2bal.pl -d "gzip -cd /opt/domy/TRAININGS/merts/mert-t=es-l=es-test-retokr-T=irstlmken-n=12-a=giza-g=10/giza.en_us-es/en_us-es.A3.final.gz" -i "gzip -cd /opt/domy/TRAININGS/merts/mert-t=es-l=es-test-retokr-T=irstlmken-n=12-a=giza-g=10/giza.es-en_us/es-en_us.A3.final.gz" |/usr/local/lib/mosesdecoder/scripts/../bin/symal -alignment="grow" -diagonal="yes" -final="yes" -both="no" > /opt/domy/TRAININGS/merts/mert-t=es-l=es-test-retokr-T=irstlmken-n=12-a=giza-g=10/aligned.grow-diag-final
    symal: computing grow alignment: diagonal (1) final (1)both-uncovered (0)
    Sentence mismatch error! Line #1179689
    skip=<0> counts=<1227038>

    _______________________________________________
    Moses-support mailing list
    [email protected]
    http://mailman.mit.edu/mailman/listinfo/moses-support




--
Hieu Hoang
Research Associate
University of Edinburgh
http://www.hoang.co.uk/hieu


