Hi,

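If you have data like this, you should also create the word alignments for
it manually.

That guarantees that you get the phrase pairs you want.

To see why extraction fails, take a look at the word alignment that was
generated. In your phrase table every pair is aligned 0-0 1-0, i.e. both
source words are linked to the single target word, so no single-token pair
is consistent with the alignment.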
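
Here is a minimal sketch of the consistency check behind phrase extraction, to
show why the alignment matters. It is a brute-force illustration, not the
actual Moses extraction code; the function names `parse_alignment` and
`extract_phrases` are made up for this example, and the hand-made alignment
"1-0" (only C linked to X, A left unaligned) is an assumption about what you
would write by hand.

```python
def parse_alignment(text):
    """Parse 'srcIdx-tgtIdx' points such as '0-0 1-0' into (src, tgt) tuples."""
    return [tuple(int(i) for i in point.split("-")) for point in text.split()]


def extract_phrases(src, tgt, alignment, max_len=7):
    """Return every phrase pair that is consistent with the word alignment.

    A pair is kept if it contains at least one alignment point and no
    alignment point links a word inside the pair to a word outside it.
    """
    pairs = []
    for s1 in range(len(src)):
        for s2 in range(s1, min(s1 + max_len, len(src))):
            for t1 in range(len(tgt)):
                for t2 in range(t1, min(t1 + max_len, len(tgt))):
                    inside = [
                        (s, t) for s, t in alignment
                        if s1 <= s <= s2 and t1 <= t <= t2
                    ]
                    # A point "crosses" if exactly one of its ends is in the pair.
                    crossing = [
                        (s, t) for s, t in alignment
                        if (s1 <= s <= s2) != (t1 <= t <= t2)
                    ]
                    if inside and not crossing:
                        pairs.append(
                            (" ".join(src[s1:s2 + 1]), " ".join(tgt[t1:t2 + 1]))
                        )
    return pairs


# Alignment as it appears in the phrase table: both A and C linked to X.
giza_pairs = extract_phrases(["A", "C"], ["X"], parse_alignment("0-0 1-0"))
print(giza_pairs)    # [('A C', 'X')]

# Hypothetical hand-made alignment: only C linked to X.
manual_pairs = extract_phrases(["A", "C"], ["X"], parse_alignment("1-0"))
print(manual_pairs)  # [('A C', 'X'), ('C', 'X')]
```

With the generated alignment only the full pair survives; once A is left
unaligned, C -> X becomes extractable as well.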

-phi

On Thu, Jul 26, 2018 at 6:16 PM Hieu Hoang <[email protected]> wrote:

> I guess you wanted it to create the following rules:
>    C -> X
>    D -> Y
>    E -> Z
> There's no guarantee that it will figure that out. One likely cause is
> that there isn't enough training data.
>
>
>
> Hieu Hoang
> http://statmt.org/hieu
>
> On 27 July 2018 at 02:06, Janek Amann <[email protected]> wrote:
>
>> Hi all,
>>
>> I'm pretty new to Moses and I don't think I'm able to figure this out on
>> my own. I'm trying to train Moses with this very small data set.
>>
>> Src:
>>
>> A C
>> B C
>> A D
>> B E
>>
>> Tgt:
>>
>> X
>> X
>> Y
>> Z
>>
>> And this is my test set:
>>
>> Src:
>>
>> A C
>> B C
>> A D
>> B D
>> A E
>> B E
>>
>> Tgt:
>>
>> X
>> X
>> Y
>> Y
>> Z
>> Z
>>
>>
>> This is the phrase table I'm getting:
>>
>> A C ||| X ||| 0.5 0.25 1 1 ||| 0-0 1-0 ||| 2 1 1 ||| |||
>> A D ||| Y ||| 1 1 1 1 ||| 0-0 1-0 ||| 1 1 1 ||| |||
>> B C ||| X ||| 0.5 0.25 1 0.75 ||| 0-0 1-0 ||| 2 1 1 ||| |||
>> B E ||| Z ||| 1 1 1 1 ||| 0-0 1-0 ||| 1 1 1 ||| |||
>>
>> For some reason Moses didn't extract any single-token phrase pairs, which
>> of course messes up the translation model.
>> These are the commands I used:
>>
>> for the language model:
>>
>> /home/janek/mosesdecoder/bin/lmplz \
>>  -o 3 \
>>  --discount_fallback \
>>  < /home/janek/Desktop/Moses/data/moses_train_4.tgt \
>>  > /home/janek/Desktop/Moses/lm/moses_train_4.arpa.tgt
>>
>> and the translation model:
>>
>> /home/janek/mosesdecoder/scripts/training/train-model.perl \
>>  -root-dir /home/janek/Desktop/Moses/working \
>>  -corpus /home/janek/Desktop/Moses/data/moses_train_4 \
>>  -f src \
>>  -e tgt \
>>  -alignment grow-diag-final-and \
>>  -reordering msd-bidirectional-fe \
>>  -lm 0:1:/home/janek/Desktop/Moses/lm/moses_train_4.arpa.tgt:8 \
>>  -external-bin-dir /home/janek/mosesdecoder/mgiza/mgizapp \
>>  -mgiza
>>
>> Since my dataset is very small, I skipped tokenizing and truecasing. I
>> didn't do any tuning either.
>> I've already tried out all possible options for the alignment but it
>> didn't change a thing.
>> I'd be really grateful if someone could point me to a solution or at
>> least the right direction for solving this.
>> This is my first time posting something in a support forum so I don't
>> know if you need any more information.
>> Just let me know if you do.
>>
>> Thanks for your help.
>>
>> Best,
>> Janek
>>
>> _______________________________________________
>> Moses-support mailing list
>> [email protected]
>> http://mailman.mit.edu/mailman/listinfo/moses-support
>>
>>
