You can also try Corpus Filtergraph to prepare your training data. It now includes two graphs: corpus.europarl5 and build.europarl5. Run in that order, they turn the Europarl data into a set of training data for Moses. The project download includes a sample subset (200 files) of the en-nl Europarl v5 data.
http://sourceforge.net/projects/corpfiltergraph/

The latest version, 3.1.145, includes many bug fixes and minor enhancements. I invite those who downloaded it before to try it again. I'd also appreciate any feedback from those in the community who try it.

Best regards,
Tom

On Sun, 01 Aug 2010 20:51:46 +0100, Hieu Hoang <[email protected]> wrote:
> Hi Hung
>
> you can find it here
>     http://www.statmt.org/wmt07/scripts.tgz
> I'll probably add that to the script directory, if no-one objects
>
> On 01/08/2010 18:27, hungnv54 wrote:
>>
>> Hi, I have a moses SMT system. I am preparing data for building a
>> model. In step "lowercase training data", I don't find lowercase.perl
>> scripts in scripts directory.
>>
>> ------------------------------------------------------------------------
>>
>> Mail sent from http://mail.zing.vn
>>
>> _______________________________________________
>> Moses-support mailing list
>> [email protected]
>> http://mailman.mit.edu/mailman/listinfo/moses-support
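
For the "lowercase training data" step asked about in the quoted message, here is a minimal stand-in sketch in Python. It is not the lowercase.perl script from scripts.tgz, and the function names, file names, and UTF-8 default are assumptions made for illustration only. It just lowercases a corpus file line by line.

```python
def lowercase_line(line: str) -> str:
    """Return the line lowercased, using Python's Unicode-aware str.lower()."""
    return line.lower()


def lowercase_file(src_path: str, dst_path: str, encoding: str = "utf-8") -> None:
    """Copy src_path to dst_path, lowercasing every line.

    The UTF-8 default is an assumption; pass the encoding your corpus uses.
    """
    with open(src_path, encoding=encoding) as src, \
            open(dst_path, "w", encoding=encoding) as dst:
        for line in src:
            dst.write(lowercase_line(line))
```

Something like lowercase_file("corpus.tok.en", "corpus.lower.en") would be run once per language side, with the file names here being placeholders.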
