Hi, if you have data like this, then you should also manually create word alignments for it.
This would guarantee that you get certain phrase pairs. You can take a look at the word alignment it generated to see why it fails sometimes.

-phi

On Thu, Jul 26, 2018 at 6:16 PM Hieu Hoang <[email protected]> wrote:

> I guess you wanted it to create the following rules
>    c -> x
>    d -> y
>    e -> z
> There's no guarantee that it will figure that out. A cause could be there
> isn't enough training data.
>
> Hieu Hoang
> http://statmt.org/hieu
>
> On 27 July 2018 at 02:06, Janek Amann <[email protected]> wrote:
>
>> Hi all,
>>
>> I'm pretty new to Moses and I don't think I'm able to figure this out on
>> my own. I'm trying to train Moses with this very small data set.
>>
>> Src:
>>
>> A C
>> B C
>> A D
>> B E
>>
>> Tgt:
>>
>> X
>> X
>> Y
>> Z
>>
>> And this is my test set:
>>
>> Src:
>>
>> A C
>> B C
>> A D
>> B D
>> A E
>> B E
>>
>> Tgt:
>>
>> X
>> X
>> Y
>> Y
>> Z
>> Z
>>
>>
>> This is the phrase table I'm getting:
>>
>> A C ||| X ||| 0.5 0.25 1 1 ||| 0-0 1-0 ||| 2 1 1 ||| |||
>> A D ||| Y ||| 1 1 1 1 ||| 0-0 1-0 ||| 1 1 1 ||| |||
>> B C ||| X ||| 0.5 0.25 1 0.75 ||| 0-0 1-0 ||| 2 1 1 ||| |||
>> B E ||| Z ||| 1 1 1 1 ||| 0-0 1-0 ||| 1 1 1 ||| |||
>>
>> For some reason Moses didn't extract any single tokens which of course
>> messes up the translation model.
>> These are the commands I used:
>>
>> for the language model:
>>
>> /home/janek/mosesdecoder/bin/lmplz \
>> -o 3 </home/janek/Desktop/Moses/data/moses_train_4.tgt >
>> /home/janek/Desktop/Moses/lm/moses_train_4.arpa.tgt \
>> --discount_fallback
>>
>> and the translation model:
>>
>> /home/janek/mosesdecoder/scripts/training/train-model.perl \
>> -root-dir /home/janek/Desktop/Moses/working \
>> -corpus /home/janek/Desktop/Moses/data/moses_train_4 \
>> -f src \
>> -e tgt \
>> -alignment grow-diag-final-and \
>> -reordering msd-bidirectional-fe \
>> -lm 0:1:/home/janek/Desktop/Moses/lm/moses_train_4.arpa.tgt:8 \
>> -external-bin-dir /home/janek/mosesdecoder/mgiza/mgizapp \
>> -mgiza
>>
>> Since my dataset is very small I skipped tokenizing and truecasing. I
>> didn't do any tuning also.
>> I've already tried out all possible options for the alignment but it
>> didn't change a thing.
>> I'd be really grateful if someone could point me to a solution or at
>> least the right direction for solving this.
>> This is my first time posting something in a support forum so I don't
>> know if you need any more information.
>> Just let me know if you do.
>>
>> Thanks for your help.
>>
>> Best,
>> Janek
>>
>> _______________________________________________
>> Moses-support mailing list
>> [email protected]
>> http://mailman.mit.edu/mailman/listinfo/moses-support
>>
>
> _______________________________________________
> Moses-support mailing list
> [email protected]
> http://mailman.mit.edu/mailman/listinfo/moses-support
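
To make the advice about inspecting the word alignment concrete, here is a minimal Python sketch of the standard alignment-consistency check that phrase extraction relies on. It is not Moses' own extraction code: the function names `parse_alignment` and `extract_phrase_pairs` are made up for illustration, extension over unaligned target words is left out, and the alignment `1-0` is a hypothetical hand-written alignment for the pair "A C" / "X". Alignment points use the same `src-tgt` index notation that appears in the phrase table quoted in this thread.

```python
def parse_alignment(text):
    """Parse 'src-tgt' index pairs such as '0-0 1-0' into a list of tuples."""
    return [tuple(int(i) for i in point.split("-")) for point in text.split()]


def extract_phrase_pairs(src, tgt, alignment, max_len=7):
    """Return phrase pairs consistent with the word alignment.

    Simplified sketch: a source span is paired with the smallest target
    span covering all of its alignment links, and the pair is kept only
    if no target word in that span is linked to a source word outside
    the source span. Unaligned target words are not used to extend spans.
    """
    pairs = []
    for s_start in range(len(src)):
        for s_end in range(s_start, min(s_start + max_len, len(src))):
            # Target positions linked to any word in the source span.
            linked = [t for s, t in alignment if s_start <= s <= s_end]
            if not linked:
                continue  # a span with no links cannot anchor a phrase pair
            t_start, t_end = min(linked), max(linked)
            # Consistency check: reject if a target word inside the span
            # is linked to a source word outside the source span.
            violates = any(
                t_start <= t <= t_end and not (s_start <= s <= s_end)
                for s, t in alignment
            )
            if violates:
                continue
            pairs.append(
                (" ".join(src[s_start:s_end + 1]), " ".join(tgt[t_start:t_end + 1]))
            )
    return pairs


src, tgt = ["A", "C"], ["X"]

# Alignment as reported in the phrase table: both source words link to X.
print(extract_phrase_pairs(src, tgt, parse_alignment("0-0 1-0")))
# [('A C', 'X')]

# Hypothetical manual alignment: only C links to X, A is left unaligned.
print(extract_phrase_pairs(src, tgt, parse_alignment("1-0")))
# [('A C', 'X'), ('C', 'X')]
```

With both source words linked to X, the only consistent pair is the whole sentence, which matches the phrase table shown in the thread. Once only C is linked to X, the single-token pair `C ||| X` becomes extractable as well.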
