Dear Moses Community,

I want to train Moses with byte-pair encoding tokenization (BPE,
https://github.com/rsennrich/subword-nmt). I plan to do it "by hand"
without the EMS.

Is there any problem with the idea?

Would it be OK to simply apply BPE after tokenization, truecasing, etc.,
and then continue with the rest of the usual steps?

Are there any gotchas I should take into account?

The only potential pitfall I have identified so far is that I have to clean
the corpus with clean-corpus-n.perl after applying BPE (rather than before),
so that the subword-segmented sentence pairs do not hit the maximum
fertility of 9 in mgiza.
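
To make the concern concrete, here is a minimal Python sketch of the kind
of length and ratio check I have in mind. It is not the actual
clean-corpus-n.perl logic; the names keep_pair and filter_corpus are
hypothetical, and the limits (1 to 80 tokens, ratio 9) are assumptions
chosen for illustration.

```python
def keep_pair(src, tgt, min_len=1, max_len=80, max_ratio=9.0):
    """Return True if a BPE-segmented sentence pair passes the length checks.

    Hypothetical helper: the thresholds are illustrative assumptions,
    not values read from Moses or mgiza.
    """
    # Count whitespace-separated tokens; after BPE each subword unit
    # (e.g. "to@@", "ken") is its own token.
    n_src = len(src.split())
    n_tgt = len(tgt.split())

    # Drop empty or overly long sentences on either side.
    if not (min_len <= n_src <= max_len and min_len <= n_tgt <= max_len):
        return False

    # Drop pairs whose length ratio exceeds max_ratio in either direction.
    return n_src <= max_ratio * n_tgt and n_tgt <= max_ratio * n_src


def filter_corpus(pairs, **limits):
    """Keep only the (source, target) pairs accepted by keep_pair."""
    return [(s, t) for s, t in pairs if keep_pair(s, t, **limits)]
```

The point is that the token counts must be taken on the BPE output: a
sentence pair that passes the ratio check at word level can fail it once
one side has been split into many more subword units than the other.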

Any success/failure experiences doing similar stuff are also very welcome.

Thanks,
Noe.
_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
