You will need a Chinese word segmenter to prepare the data for
training/decoding. There are several available (listed in no particular
order):
http://code.google.com/p/zhseg/
http://nlp.stanford.edu/software/segmenter.shtml
http://projects.ldc.upenn.edu/Chinese/LDC_ch.htm#cseg
I haven't tried any of them and I believe most of them are for the
Simplified Chinese script.
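
If none of these is at hand, a crude fallback that is sometimes used as a
baseline is to treat every Chinese character as its own token. The sketch
below is only that: it is not what any of the tools listed above do, the
function names (is_cjk, char_segment) are made up for illustration, and it
assumes Python 3 and UTF-8 input. It is meant for the Chinese side of the
corpus only, since it also splits off every punctuation mark.

```python
import unicodedata

def is_cjk(ch):
    """True for characters in the main CJK ideograph blocks."""
    cp = ord(ch)
    return (0x4E00 <= cp <= 0x9FFF        # CJK Unified Ideographs
            or 0x3400 <= cp <= 0x4DBF     # Extension A
            or 0x20000 <= cp <= 0x2A6DF   # Extension B
            or 0xF900 <= cp <= 0xFAFF)    # Compatibility Ideographs

def is_punct(ch):
    """True for any Unicode punctuation character (category P*)."""
    return unicodedata.category(ch).startswith("P")

def char_segment(line):
    """Put spaces around every CJK character and punctuation mark.

    Runs of other characters (Latin letters, digits) are left together.
    This is a naive baseline, not real word segmentation.
    """
    pieces = []
    for ch in line.strip():
        if is_cjk(ch) or is_punct(ch):
            pieces.append(" " + ch + " ")
        else:
            pieces.append(ch)
    # Collapse the repeated spaces introduced above.
    return " ".join("".join(pieces).split())

if __name__ == "__main__":
    import sys
    for raw in sys.stdin:
        print(char_segment(raw))
```

Saved under a hypothetical name such as char_segment.py, it would be run as
"python3 char_segment.py < corpus.zh > corpus.seg.zh" before training. A
proper segmenter should give better word alignments than this.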

On Fri, Feb 12, 2010 at 11:10 PM, nati g <[email protected]> wrote:

> Hello,
>
> Has anyone tried setting up Moses for translating English --> Chinese? Please
> share any information or scripts that can be used, other than those provided
> in the step-by-step guide.
>
> Thanks in Advance.
>
> On Fri, Feb 12, 2010 at 7:15 PM, Christine de Bond <[email protected]> wrote:
>
>> You might ask the moses-list people if anyone has done English-Chinese
>> translation / alignment and got any reasonable output. They might give you
>> some more hints!
>>
>> By the way, how big is your parallel corpus?
>> Another idea might be to check whether factored translation models are of
>> any help to you (I'm thinking of alignment and reordering factors here, but
>> I'm not sure if this is appropriate for Chinese...)
>>
>> nati g wrote:
>>
>>> Hi Christine,
>>> thank you very much for the information.
>>>  I had already tried skipping these steps, but the translation quality is
>>> too bad.
>>> Unlike European languages, double-byte languages like Chinese, Korean,
>>> and Japanese have a different syntax. For example, the translation of
>>> an English string of a few words may be a single character. I guess
>>> these kinds of syntactic dissimilarities are why we are not getting a
>>> good translation model after training.
>>>  Thank you very much.
>>>
>>> On Thu, Feb 11, 2010 at 7:46 PM, Christine de Bond <[email protected]> wrote:
>>>
>>>    Hi
>>>    I don't know much about Chinese, but there is no lowercase in
>>>    Chinese, right?
>>>    You can skip the lowercasing part, if there are no
>>>    capital/lowercase letters in Chinese.
>>>
>>>    As for tokenizing, it is best to look at the Perl script to
>>>    see what it is doing. You should make sure that punctuation (if
>>>    there is any in Chinese) is not concatenated with words ( word. ->
>>>    word . ). I think the Moses tokenizer script should work well for
>>>    your corpus, as long as there is no special issue with Chinese
>>>    punctuation.
>>>    (So far I have used it with Latin and Persian character sets.)
>>>
>>>    It is best to try out the tokenizer.perl script on some test
>>>    sentences to see what the script does to your input.
>>>
>>>    Christine
>>>
>>>    nati g wrote:
>>>
>>>        Hi,
>>>         Thank you very much for the reply.
>>>        I have concerns about the tokenizer, lowercasing, and sort
>>>        scripts used while training the translation model from the corpus.
>>>        Will these not have any effect on the language we are going to use?
>>>        On Thu, Feb 11, 2010 at 2:43 PM, Christine de Bond
>>>        <[email protected]> wrote:
>>>
>>>           Hi,
>>>           Moses is language-independent. There is no need for adaptation.
>>>           Best is to follow the "Step-by-Step Guide" on the Moses
>>>           website to get started.
>>>
>>>           Regards,
>>>           Christine
>>>
>>>           nati g wrote:
>>>
>>>               Hello,
>>>                Do we need any special scripts to build Moses for
>>>               translating English to Chinese?
>>>                Thanks in advance.
>>>
>>>               _______________________________________________
>>>               Moses-support mailing list
>>>               [email protected]
>>>               http://mailman.mit.edu/mailman/listinfo/moses-support
