Thank you for your replies.

The segmenters you recommended are very useful.

In the end I chose the Stanford Chinese Segmenter (
http://nlp.stanford.edu/software/segmenter.shtml),
which Kevin recommended. It can be used directly on Linux for free, and in
my experience its segmentation quality is good.

The tool Tianliang recommended can be used on both Windows and Linux. But
the demo can only process about 1000 sentences at once, though its support
contact told me it could do better if I used its API.

I have solved the problem I raised. The cause was what Suzy and Tom
suggested: the text had not been segmented into words.
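
For anyone who hits the same problem, here is a minimal sketch of the
simplest method Tom described (a space between each character), plus a rough
check for lines that still look unsegmented. It is only an illustration, not
what the Stanford segmenter does. The function names, the Unicode range used
for "Chinese character", and the run-length threshold are my own assumptions.

```python
import re

# Assumption: "Chinese character" means the basic CJK Unified Ideographs block.
CJK_CHAR = re.compile(r"([\u4e00-\u9fff])")


def segment_chars(line: str) -> str:
    """Put spaces around every CJK character; other tokens stay intact."""
    spaced = CJK_CHAR.sub(r" \1 ", line)
    # Collapse the repeated spaces introduced by the substitution.
    return " ".join(spaced.split())


def looks_unsegmented(line: str, max_run: int = 6) -> bool:
    """Heuristic: True if more than max_run CJK characters appear with no space."""
    pattern = r"[\u4e00-\u9fff]{%d,}" % (max_run + 1)
    return re.search(pattern, line) is not None


if __name__ == "__main__":
    raw = "使用文本索引查询视图"
    print(looks_unsegmented(raw))  # the raw line is one long run
    print(segment_chars(raw))
```

Whatever segmenter is used, the same one has to be applied to the training
corpus, the tuning and test sets, and the sentences to be translated.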

Thank you.

On 18 August 2011 at 1:19 PM, Kevin Gimpel <[email protected]> wrote:

> I've found the Stanford Chinese Segmenter (
> http://nlp.stanford.edu/software/segmenter.shtml) to work well.
>
> See the following paper for information on this segmenter and some
> perspective on the problem:
> Pi-Chuan Chang, Michel Galley and Chris Manning. "Optimizing Chinese Word
> Segmentation for Machine Translation Performance." in ACL Third Workshop on
> Statistical Machine Translation, 2008.
> http://nlp.stanford.edu/pubs/acl-wmt08-cws.pdf
>
> Kevin
>
>
> On Wed, Aug 17, 2011 at 10:25 PM, Tom Hoar <
> [email protected]> wrote:
>
>> I'm familiar with two methods to segment Chinese. One method simply
>> inserts a space between each character. The results are predictable, but
>> translations are generally not as high quality as possible.
>>
>> The second method uses a program that identifies words as sequences of
>> multiple characters (typically 1, 2 or 3) and inserts a space between them.
>> I haven't worked with Chinese for a while, so I'm not sure of the latest
>> advancements in Chinese word segmentation. LDC publishes a perl script,
>> http://projects.ldc.upenn.edu/Chinese/,
>> http://www.ldc.upenn.edu/Projects/Chinese/ldc-cn-seg.1.2.tgz. I remember
>> seeing a C++ version, but can't find it now. There's also this one on Google
>> code: http://code.google.com/p/zhseg/
>>
>> Maybe someone on moses-support knows of other Chinese tools.
>>
>> Regards,
>> Tom
>>
>>
>>
>> On Thu, 18 Aug 2011 09:16:48 +0800, 蒋乾 <[email protected]> wrote:
>>
>> Hi,
>>
>> Thank you for your suggestions.
>>
>> I have done some tests. They showed that both English-to-Chinese and
>> Chinese-to-English training would fail if I did not take any measures.
>>
>> Suzy and Tom gave me useful advice: do something like segmentation. The
>> further question is, how do I segment?
>>
>> Could anybody who has experience training on a corpus, either from
>> English to Chinese or from Chinese to English, give me some ideas?
>>
>> Thank you very much.
>>
>> Regards,
>> James
>>
>> 2011/8/17 Tom Hoar <[email protected]>
>>
>>>  I agree with Suzy. Also, if your translation requests are not
>>>  segmented, it's possible that the training corpus was also not
>>>  segmented. Verify that your training corpus, develop and test sets were
>>>  all segmented when you trained/tuned your translation model. If not,
>>>  you'll need to start from the beginning.
>>>
>>>  Tom
>>>
>>>
>>>  On Wed, 17 Aug 2011 19:28:17 +1000, Suzy Howlett <[email protected]>
>>>  wrote:
>>> > Hi James,
>>> >
>>> > It looks like the text has not been segmented into words, so it
>>> > thinks
>>> > every sentence is a single word. Unless the sentences you are trying
>>> > to
>>> > translate are identical to some sentences in the training corpus, it
>>> > will think every test sentence is an unknown word it's never seen
>>> > before. You'll need to use some kind of word segmentation.
>>> > Unfortunately
>>> > I don't know anything about that area, so I have no useful
>>> > suggestions.
>>> >
>>> > Best,
>>> > Suzy
>>> >
>>> > On 17/08/11 7:13 PM, 蒋乾 wrote:
>>> >> Hi all,
>>> >>
>>> >> When I used MT to translate from Chinese to English, I met an
>>> >> unexpected problem. Could you please tell me the reason if you have
>>> >> any idea about it?
>>> >>
>>> >> I trained on a large parallel corpus of about 2,600,000 lines on a
>>> >> computer with 5GB RAM.
>>> >> After that, I tried translating a small Chinese file of about 80 lines
>>> >> into English. Unexpectedly, it didn't work.
>>> >> It did not do any translation at all. The target file I got was the
>>> >> same as the source file.
>>> >>
>>> >> One sample of the information shown on the screen during MT's
>>> >> translation is as follows:
>>> >>
>>> >>     "
>>> >>     Translating: 使用文本索引查询视图
>>> >>     Collecting options took 0.000 seconds
>>> >>     Search took 0.000 seconds
>>> >>     BEST TRANSLATION: 使用文本索引查询视图|UNK|UNK|UNK [1]
>>> >>     [total=-99.978] <<0.000, -1.000, -100.000, 0.000, 0.000, 0.000,
>>> >>     0.000, 0.000, 0.000, -7.346, 0.000, 0.000, 0.000, 0.000, 0.000>>
>>> >>     Translation took 0.000 seconds
>>> >>     Finished translating
>>> >>     Translating: 使用文本索引查询视图关于
>>> >>     Collecting options took 0.000 seconds
>>> >>     Search took 0.000 seconds
>>> >>     BEST TRANSLATION: 使用文本索引查询视图关于|UNK|UNK|UNK [1]
>>> >>     [total=-99.978] <<0.000, -1.000, -100.000, 0.000, 0.000, 0.000,
>>> >>     0.000, 0.000, 0.000, -7.346, 0.000, 0.000, 0.000, 0.000, 0.000>>
>>> >>     Translation took 0.000 seconds
>>> >>     Finished translating
>>> >>     "
>>> >>
>>> >> I would appreciate it very much if you could tell me why this
>>> >> happens and how to solve it.
>>> >>
>>> >> Thank you very much.
>>> >>
>>> >> Regards,
>>> >> James
>>> >>
>>> >>
>>> >> _______________________________________________
>>> >> Moses-support mailing list
>>> >> [email protected]
>>> >> http://mailman.mit.edu/mailman/listinfo/moses-support