[Moses-support] incomplete phrase table

Janek Amann Thu, 26 Jul 2018 09:14:38 -0700

Hi all,

I'm pretty new to Moses and don't think I can figure this out on my own. I'm trying to train Moses on the following very small dataset:

Src:

A C
B C
A D
B E

Tgt:

X
X
Y
Z
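For reference, `train-model.perl` reads the corpus as the `-corpus` stem plus the `-f`/`-e` extensions. With the training command in this post, the files are therefore `moses_train_4.src` and `moses_train_4.tgt`, line-aligned, one sentence per line. Below is a minimal sketch that writes this toy corpus. The helper `write_corpus` and the temporary output directory are hypothetical and are not part of the original setup.

```python
import tempfile
from pathlib import Path

# Toy parallel corpus from the post: line i of .src is translated by line i of .tgt.
SRC = ["A C", "B C", "A D", "B E"]
TGT = ["X", "X", "Y", "Z"]


def write_corpus(directory: Path, stem: str = "moses_train_4") -> tuple[Path, Path]:
    """Write <stem>.src and <stem>.tgt, one sentence per line (hypothetical helper)."""
    assert len(SRC) == len(TGT), "source and target must be line-aligned"
    src_path = directory / f"{stem}.src"
    tgt_path = directory / f"{stem}.tgt"
    src_path.write_text("\n".join(SRC) + "\n", encoding="utf-8")
    tgt_path.write_text("\n".join(TGT) + "\n", encoding="utf-8")
    return src_path, tgt_path


if __name__ == "__main__":
    out_dir = Path(tempfile.mkdtemp())  # stand-in for .../Moses/data
    src_file, tgt_file = write_corpus(out_dir)
    print(src_file, tgt_file)
```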

And this is my test set:

Src:

A C
B C
A D
B D
A E
B E

Tgt:

X
X
Y
Y
Z
Z

This is the phrase table I'm getting:

A C ||| X ||| 0.5 0.25 1 1 ||| 0-0 1-0 ||| 2 1 1 ||| |||
A D ||| Y ||| 1 1 1 1 ||| 0-0 1-0 ||| 1 1 1 ||| |||
B C ||| X ||| 0.5 0.25 1 0.75 ||| 0-0 1-0 ||| 2 1 1 ||| |||
B E ||| Z ||| 1 1 1 1 ||| 0-0 1-0 ||| 1 1 1 ||| |||
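The following minimal sketch shows how this affects the test set. It reads the source side (first ` ||| `-separated field) of the phrase table above and checks which test sentences can be segmented entirely into known source phrases. The helper names `source_phrases` and `coverable` are hypothetical.

```python
# Phrase-table entries copied from the post; fields are separated by " ||| ".
PHRASE_TABLE = """\
A C ||| X ||| 0.5 0.25 1 1 ||| 0-0 1-0 ||| 2 1 1 ||| |||
A D ||| Y ||| 1 1 1 1 ||| 0-0 1-0 ||| 1 1 1 ||| |||
B C ||| X ||| 0.5 0.25 1 0.75 ||| 0-0 1-0 ||| 2 1 1 ||| |||
B E ||| Z ||| 1 1 1 1 ||| 0-0 1-0 ||| 1 1 1 ||| |||
"""

TEST_SRC = ["A C", "B C", "A D", "B D", "A E", "B E"]


def source_phrases(table: str) -> set[tuple[str, ...]]:
    """Collect the source side (first field) of every phrase-table line."""
    phrases = set()
    for line in table.splitlines():
        if line.strip():
            phrases.add(tuple(line.split(" ||| ")[0].split()))
    return phrases


def coverable(sentence: str, phrases: set[tuple[str, ...]]) -> bool:
    """True if the sentence can be split entirely into known source phrases."""
    words = sentence.split()
    reachable = [True] + [False] * len(words)  # reachable[i]: words[:i] is covered
    for i in range(len(words)):
        if not reachable[i]:
            continue
        for j in range(i + 1, len(words) + 1):
            if tuple(words[i:j]) in phrases:
                reachable[j] = True
    return reachable[-1]


if __name__ == "__main__":
    known = source_phrases(PHRASE_TABLE)
    for sent in TEST_SRC:
        print(sent, "covered" if coverable(sent, known) else "NOT covered")
```

Because every entry is a two-word phrase, `B D` and `A E` cannot be built from the table, even though each of their words occurs in the training data.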

For some reason, Moses didn't extract any single-token phrases, which of course messes up the translation model.
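The standard consistency criterion for phrase extraction suggests one possible cause. This sketch assumes the Moses extractor follows that criterion; it is not Moses code, and `consistent_pairs` and `parse_alignment` are hypothetical names. The alignment field `0-0 1-0` links both source words to `X`. A one-word phrase such as `A ||| X` would therefore leave the `C`-`X` link crossing the phrase boundary, so it is rejected. The `max_len=7` default is an assumed maximum phrase length.

```python
Link = tuple[int, int]  # (source index, target index), as in "0-0 1-0"


def parse_alignment(field: str) -> set[Link]:
    """Turn an alignment field such as "0-0 1-0" into {(0, 0), (1, 0)}."""
    links = set()
    for pair in field.split():
        s, t = pair.split("-")
        links.add((int(s), int(t)))
    return links


def consistent_pairs(src: list[str], tgt: list[str], alignment: set[Link],
                     max_len: int = 7) -> list[tuple[str, str]]:
    """Enumerate phrase pairs consistent with a word alignment.

    A (source span, target span) pair is kept if it contains at least one link
    and no link connects a word inside one span to a word outside the other.
    """
    pairs = []
    for s1 in range(len(src)):
        for s2 in range(s1, min(len(src), s1 + max_len)):
            for t1 in range(len(tgt)):
                for t2 in range(t1, min(len(tgt), t1 + max_len)):
                    has_link = any(s1 <= s <= s2 and t1 <= t <= t2 for s, t in alignment)
                    # A link "crosses" if exactly one of its ends lies inside its span.
                    crosses = any((s1 <= s <= s2) != (t1 <= t <= t2) for s, t in alignment)
                    if has_link and not crosses:
                        pairs.append((" ".join(src[s1:s2 + 1]), " ".join(tgt[t1:t2 + 1])))
    return pairs


if __name__ == "__main__":
    # Alignment from the phrase table: only the whole sentence pair survives.
    print(consistent_pairs(["A", "C"], ["X"], parse_alignment("0-0 1-0")))  # [('A C', 'X')]
    # Hypothetical alignment with a single link: "A ||| X" is also extracted.
    print(consistent_pairs(["A", "C"], ["X"], parse_alignment("0-0")))  # [('A', 'X'), ('A C', 'X')]
```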

These are the commands I used:

For the language model:

/home/janek/mosesdecoder/bin/lmplz \
-o 3 </home/janek/Desktop/Moses/data/moses_train_4.tgt > /home/janek/Desktop/Moses/lm/moses_train_4.arpa.tgt \
--discount_fallback

And for the translation model:

/home/janek/mosesdecoder/scripts/training/train-model.perl \
-root-dir /home/janek/Desktop/Moses/working \
-corpus /home/janek/Desktop/Moses/data/moses_train_4 \
-f src \
-e tgt \
-alignment grow-diag-final-and \
-reordering msd-bidirectional-fe \
-lm 0:1:/home/janek/Desktop/Moses/lm/moses_train_4.arpa.tgt:8 \
-external-bin-dir /home/janek/mosesdecoder/mgiza/mgizapp \
-mgiza
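The `-lm` value packs four colon-separated fields, `factor:order:filename:type`, where type `8` selects KenLM. The sketch below splits such a value. `LmSpec` and `parse_lm_spec` are hypothetical helpers. It also shows that the declared order here is `1`, while `lmplz` was run with `-o 3`.

```python
from dataclasses import dataclass


@dataclass
class LmSpec:
    factor: int   # which target factor the LM scores (0 = surface form)
    order: int    # n-gram order declared to Moses
    path: str     # path to the LM file
    lm_type: int  # LM implementation code (8 = KenLM)


def parse_lm_spec(spec: str) -> LmSpec:
    """Split a train-model.perl -lm value of the form factor:order:filename:type."""
    factor, order, rest = spec.split(":", 2)
    path, lm_type = rest.rsplit(":", 1)  # rsplit keeps any ':' inside the path
    return LmSpec(int(factor), int(order), path, int(lm_type))


if __name__ == "__main__":
    spec = parse_lm_spec("0:1:/home/janek/Desktop/Moses/lm/moses_train_4.arpa.tgt:8")
    lmplz_order = 3  # value passed to lmplz -o
    print(spec)
    if spec.order != lmplz_order:
        print(f"note: -lm declares order {spec.order}, lmplz built order {lmplz_order}")
```

Phrase extraction does not use the language model, so this mismatch is probably unrelated to the missing single-token phrases.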

Since my dataset is very small, I skipped tokenizing and truecasing. I also didn't do any tuning.

I've already tried every option for -alignment, but none of them changed anything.
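For context, `grow-diag-final-and` symmetrizes the two directional word alignments. It starts from their intersection, grows into neighbouring union links that cover a still-unaligned word, and finally adds leftover links only where both words are unaligned. The sketch below follows that commonly described procedure, not the Moses source. The function name and the directional alignments in the example are hypothetical. Whether other heuristics would produce different links depends on the directional alignments mgiza actually produced, which this sketch does not know.

```python
Link = tuple[int, int]  # (source index, target index)

# 8-neighbourhood used by the "diag" variants of the growing step.
NEIGHBORS = [(-1, 0), (0, -1), (1, 0), (0, 1), (-1, -1), (-1, 1), (1, -1), (1, 1)]


def grow_diag_final_and(s2t: set[Link], t2s: set[Link]) -> set[Link]:
    """Symmetrize two directional alignments with grow-diag-final-and (sketch)."""
    union = s2t | t2s
    alignment = s2t & t2s  # start from the high-precision intersection

    def src_aligned(s: int) -> bool:
        return any(a == s for a, _ in alignment)

    def tgt_aligned(t: int) -> bool:
        return any(b == t for _, b in alignment)

    # grow-diag: repeatedly add union links next to existing ones,
    # as long as they cover a source or target word that is still unaligned.
    added = True
    while added:
        added = False
        for s, t in sorted(alignment):  # sorted() snapshots the set before we modify it
            for ds, dt in NEIGHBORS:
                cand = (s + ds, t + dt)
                if cand in union and cand not in alignment and (
                        not src_aligned(cand[0]) or not tgt_aligned(cand[1])):
                    alignment.add(cand)
                    added = True

    # final-and: add leftover directional links only if BOTH words are unaligned.
    for direction in (s2t, t2s):
        for s, t in sorted(direction):
            if not src_aligned(s) and not tgt_aligned(t):
                alignment.add((s, t))
    return alignment


if __name__ == "__main__":
    # Hypothetical directional alignments for "A C" -> "X":
    # one direction links X only to A, the other links both A and C to X.
    print(sorted(grow_diag_final_and({(0, 0)}, {(0, 0), (1, 0)})))  # [(0, 0), (1, 0)]
```

With these inputs, the result matches the `0-0 1-0` links in the phrase table.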

I'd be really grateful if someone could point me to a solution or at least the right direction for solving this.

This is my first time posting in a support forum, so I'm not sure whether you need more information.

Just let me know if you do.

Thanks for your help.

Best,

Janek

_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
