How words are tokenised, segmented, etc. is crucial when using "small" amounts of data. For the vast numbers of people using Moses (people not training-up on millions of sentence pairs), this is the kind of thing that needs to be done correctly.
It would be a service to extend the Moses tokeniser to deal with languages other than just those you mentioned before.

Miles

On 11 February 2010 17:51, Christof Pintaske <[email protected]> wrote:
> Hi,
>
> you may want to have a closer look at tokenizer.perl, which is used for
> word-breaking. It seems there is some special logic to handle English,
> French, and Italian but nothing much else.
>
> I'm not sure if you can or plan to reveal your findings here on the list,
> but at any rate I'd be very interested to learn how Chinese worked for you.
>
> best regards
> Christof
>
> nati g wrote:
>> Hello,
>> Do we need any special scripts to build Moses for translating English
>> to Chinese?
>>
>> thanks in advance.
>>
>> _______________________________________________
>> Moses-support mailing list
>> [email protected]
>> http://mailman.mit.edu/mailman/listinfo/moses-support

--
The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336.
