How words are tokenised or segmented is crucial when training on
"small" amounts of data.  For the many people using Moses who are not
training on millions of sentence pairs, this is the kind of thing
that needs to be done correctly.

It would be a service to extend the Moses tokeniser to handle
languages beyond the ones you mentioned.

Miles

On 11 February 2010 17:51, Christof Pintaske wrote:
> Hi,
>
> you may want to have a closer look at tokenizer.perl which is used for
> word-breaking. It seems there is some special logic to handle English,
> French, and Italian but nothing much else.
>
> I'm not sure if you can or plan to reveal your findings here on the list
> but at any rate I'd be very interested to learn how Chinese worked for you.
>
> best regards
> Christof
>
> nati g wrote:
>> Hello,
>> Do we need any special scripts to build Moses for translating English
>> to Chinese?
>>
>> thanks in advance.
>>
>>
>> ------------------------------------------------------------------------
>>
>> _______________________________________________
>> Moses-support mailing list
>> [email protected]
>> http://mailman.mit.edu/mailman/listinfo/moses-support
>



-- 
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.
