Hi all,
 
I'm pretty new to Moses and I don't think I'm able to figure this out on my own. I'm trying to train Moses with this very small data set.
 
Src:
 
A C
B C
A D
B E
 
Tgt:
 
X
X
Y
Z
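
For anyone who wants to reproduce this, the training data can be recreated roughly as follows. This is only a sketch: the file names moses_train_4.src and moses_train_4.tgt follow from the -corpus, -f and -e arguments shown further down, while the temporary directory is a stand-in I made up for the real data directory.

```shell
# Temporary stand-in for the data directory (hypothetical location).
data_dir=$(mktemp -d)

# Source side: one sentence per line.
printf '%s\n' 'A C' 'B C' 'A D' 'B E' > "$data_dir/moses_train_4.src"
# Target side, line-aligned with the source side.
printf '%s\n' 'X' 'X' 'Y' 'Z' > "$data_dir/moses_train_4.tgt"

# Both sides of a parallel corpus must have the same number of lines.
src_lines=$(awk 'END { print NR }' "$data_dir/moses_train_4.src")
tgt_lines=$(awk 'END { print NR }' "$data_dir/moses_train_4.tgt")
echo "src=$src_lines tgt=$tgt_lines"

# Show the sentence pairs side by side (tab-separated).
paste "$data_dir/moses_train_4.src" "$data_dir/moses_train_4.tgt"
```

The test set can be written out the same way with its six sentence pairs.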
 
And this is my test set:
 
Src:
 
A C
B C
A D
B D
A E
B E
 
Tgt:
 
X
X
Y
Y
Z
Z
 
 
This is the phrase table I'm getting:
 
A C ||| X ||| 0.5 0.25 1 1 ||| 0-0 1-0 ||| 2 1 1 ||| |||
A D ||| Y ||| 1 1 1 1 ||| 0-0 1-0 ||| 1 1 1 ||| |||
B C ||| X ||| 0.5 0.25 1 0.75 ||| 0-0 1-0 ||| 2 1 1 ||| |||
B E ||| Z ||| 1 1 1 1 ||| 0-0 1-0 ||| 1 1 1 ||| |||
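
A quick check of the source-side lengths in the table above (a minimal sketch: the four entries are pasted into a here-document through a helper function named phrase_table_sample, which is made up for this example; on a real run the input would presumably be the gunzipped phrase table from the model directory under -root-dir):

```shell
# The four phrase-table entries from this post, copied verbatim.
phrase_table_sample() {
cat <<'EOF'
A C ||| X ||| 0.5 0.25 1 1 ||| 0-0 1-0 ||| 2 1 1 ||| |||
A D ||| Y ||| 1 1 1 1 ||| 0-0 1-0 ||| 1 1 1 ||| |||
B C ||| X ||| 0.5 0.25 1 0.75 ||| 0-0 1-0 ||| 2 1 1 ||| |||
B E ||| Z ||| 1 1 1 1 ||| 0-0 1-0 ||| 1 1 1 ||| |||
EOF
}

# Split each entry on " ||| " and print:
# source token count, source phrase, target phrase, word alignment.
phrase_table_sample | awk -F' [|][|][|] ' '{
    n = split($1, toks, " ")
    print n "\t" $1 "\t" $2 "\t" $4
}'

# Count the entries whose source side is a single token.
single=$(phrase_table_sample | awk -F' [|][|][|] ' '
    { if (split($1, toks, " ") == 1) c++ }
    END { print c + 0 }')
echo "single-token source phrases: $single"
```

On these four entries it reports 0 single-token source phrases, and the alignment column shows both source words linked to the one target word in every entry.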
 
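For some reason Moses didn't extract any single-token phrase pairs. Every entry covers a whole two-token source sentence, so test sentences like "B D" and "A E", which never occur as a whole in the training data, can't be translated from this table.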
These are the commands I used:
 
for the language model:
 
/home/janek/mosesdecoder/bin/lmplz \
 -o 3 --discount_fallback \
 < /home/janek/Desktop/Moses/data/moses_train_4.tgt \
 > /home/janek/Desktop/Moses/lm/moses_train_4.arpa.tgt
 
and the translation model:
 
/home/janek/mosesdecoder/scripts/training/train-model.perl \
 -root-dir /home/janek/Desktop/Moses/working \
 -corpus /home/janek/Desktop/Moses/data/moses_train_4 \
 -f src \
 -e tgt \
 -alignment grow-diag-final-and \
 -reordering msd-bidirectional-fe \
 -lm 0:1:/home/janek/Desktop/Moses/lm/moses_train_4.arpa.tgt:8 \
 -external-bin-dir /home/janek/mosesdecoder/mgiza/mgizapp \
 -mgiza
 
Since my data set is very small, I skipped tokenizing and truecasing. I also didn't do any tuning.
I've already tried every available option for -alignment, but the result didn't change at all.
I'd be really grateful if someone could point me to a solution or at least the right direction for solving this.
This is my first time posting something in a support forum so I don't know if you need any more information.
Just let me know if you do.
 
Thanks for your help.
 
Best,
Janek