Hi Noe,

We have done translation between related languages using BPE with Moses, without using EMS. I did not face any particular problems. A few things that we did for our scenario:

- Sentence length can increase with BPE, which increases decoding time. Since we were working on related languages, we switched off reordering.
- To speed up decoding, we used cube pruning with a small pop-limit (https://www.cse.iitb.ac.in/~anoopk/publications/vardial2016_faster_subword.pdf).
- We used a small BPE size (~3000 words) since we were working with similar languages, and a higher-order LM (10-gram).

You can find more details here: https://www.cse.iitb.ac.in/~anoopk/publications/sclem2017_bpe_related.pdf

Regards,
Anoop.

On Sat, Mar 16, 2019 at 5:16 PM Noe Casas <[email protected]> wrote:

> Dear Moses Community,
>
> I want to train Moses with byte-pair encoding tokenization (BPE,
> https://github.com/rsennrich/subword-nmt). I plan to do it "by hand"
> without the EMS.
>
> Is there any problem with the idea?
>
> Would it be Ok just to apply BPE after tokenization, truecasing, etc and
> then go on with the rest of the typical steps?
>
> Is there any gotcha I should take into account?
>
> I have only identified as potential pitfall that I have to clean the
> corpus with clean-corpus-n.perl after applying BPE in order not to reach
> the maximum fertility 9 for mgiza.
>
> Any success/failure experiences doing similar stuff are also very welcome.
>
> Thanks,
> Noe.
> _______________________________________________
> Moses-support mailing list
> [email protected]
> http://mailman.mit.edu/mailman/listinfo/moses-support
>

--
I claim to be a simple individual liable to err like any other fellow mortal. I own, however, that I have humility enough to confess my errors and to retrace my steps.

http://flightsofthought.blogspot.com
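
On the cleaning step raised in the quoted question: the point of cleaning after BPE is that lengths and length ratios must be measured in subword tokens, since that is what the aligner sees. Below is a minimal Python sketch of that kind of filter. It is not a reimplementation of clean-corpus-n.perl; the function names and the 1/80 length limits are hypothetical, and only the ratio of 9 comes from the thread.

```python
def keep_pair(src, tgt, min_len=1, max_len=80, max_ratio=9.0):
    """Return True if a BPE-segmented sentence pair passes the length checks.

    Lengths are counted in whitespace-separated subword tokens, i.e. after
    BPE has been applied. min_len and max_len are illustrative values only.
    """
    n_src = len(src.split())
    n_tgt = len(tgt.split())
    # Drop empty or overly long sides.
    if not (min_len <= n_src <= max_len and min_len <= n_tgt <= max_len):
        return False
    # Drop pairs whose length ratio exceeds max_ratio in either direction.
    return n_src <= max_ratio * n_tgt and n_tgt <= max_ratio * n_src


def filter_corpus(pairs, **limits):
    """Keep only the sentence pairs that pass keep_pair."""
    return [(s, t) for s, t in pairs if keep_pair(s, t, **limits)]


# Toy data in subword-nmt style, where "@@" marks a non-final subword.
pairs = [
    ("the low@@ er house", "das unter@@ haus"),         # 4 vs 3 tokens
    ("a b@@ c@@ d@@ e@@ f@@ g@@ h@@ i@@ j@@ k", "x"),   # 11 vs 1 tokens
    ("", "leer"),                                       # empty source side
]
kept = filter_corpus(pairs)
print(len(kept))
```

The second pair is a single word on each side before segmentation, but 11 tokens against 1 after it, which is why the check has to run on the segmented text.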
