I'm familiar with two methods to segment Chinese. The first simply
inserts a space between every pair of adjacent characters. The results
are predictable, but translation quality is generally lower than it
could be.
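
As a rough illustration of this character-by-character method, here is a minimal Python sketch. The function name and the choice to treat only the basic CJK Unified Ideographs block (U+4E00 to U+9FFF) as "Chinese characters" are my assumptions for the example, not part of any existing tool; embedded Latin words and numbers are left intact.

```python
import re

# Assumption: only the basic CJK Unified Ideographs block is treated as Chinese.
CJK_CHAR = re.compile(r"([\u4e00-\u9fff])")


def segment_by_character(line: str) -> str:
    """Put a space around every Chinese character, then normalize whitespace."""
    spaced = CJK_CHAR.sub(r" \1 ", line)
    # split() with no arguments collapses runs of whitespace and trims the ends.
    return " ".join(spaced.split())


if __name__ == "__main__":
    print(segment_by_character("使用文本索引查询视图"))
```

The same function would have to be applied to the training, tuning and test data alike, so that the decoder sees input in the form the model was trained on.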

The second method uses a program that identifies words as sequences of
one or more characters (typically 1, 2 or 3) and inserts a space between
the words. I haven't worked with Chinese for a while, so I'm not sure of
the latest advancements in Chinese word segmentation. LDC publishes a
Perl script:
http://projects.ldc.upenn.edu/Chinese/ [1]
http://www.ldc.upenn.edu/Projects/Chinese/ldc-cn-seg.1.2.tgz
I remember seeing a C++ version, but can't find it now. There's also
this one on Google Code: http://code.google.com/p/zhseg/ [2]
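
To make the idea of word-level segmentation concrete, here is a toy Python sketch of forward maximum matching against a word list. This is not how the LDC script or zhseg work internally (I have not checked their algorithms); the function name, the tiny lexicon and the 3-character limit are assumptions made up for the example.

```python
def segment_max_match(line, lexicon, max_len=3):
    """Greedy forward maximum matching: at each position take the longest
    lexicon entry (up to max_len characters), else a single character."""
    words = []
    # Existing whitespace is respected: each chunk is segmented on its own.
    for chunk in line.split():
        i = 0
        while i < len(chunk):
            for size in range(min(max_len, len(chunk) - i), 0, -1):
                candidate = chunk[i:i + size]
                # A single character is always accepted as a fallback.
                if size == 1 or candidate in lexicon:
                    words.append(candidate)
                    i += size
                    break
    return " ".join(words)


# Hypothetical toy lexicon; a real one would hold many thousands of entries.
TOY_LEXICON = {"使用", "文本", "索引", "查询", "视图"}

if __name__ == "__main__":
    print(segment_max_match("使用文本索引查询视图", TOY_LEXICON))
```

With the toy lexicon above, the sample sentence from James's log comes out as five two-character words. Characters not covered by the lexicon fall back to single-character tokens, which is the behaviour of the first method.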

Maybe someone on moses-support knows of other Chinese tools.

Regards,
Tom 

On Thu, 18 Aug 2011 09:16:48 +0800, 蒋乾 wrote:
Hi, 

Thank you for your suggestions.

I have done some tests. They showed that both English-to-Chinese and
Chinese-to-English training would fail if I did not take any measures.

Suzy and Tom gave me the useful advice to do something like
segmentation. The further question is: how do I do the segmentation?

Could anybody who has experience training on a corpus, either from
English to Chinese or from Chinese to English, give me some ideas?

Thank you very much.


Regards, 
James

2011/8/17 Tom Hoar 
 I agree with Suzy. Also, if your translation requests are not
 segmented, it's possible that the training corpus was also not
 segmented. Verify that your training corpus, development and test sets
 were all segmented when you trained/tuned your translation model. If
 not, you'll need to start from the beginning.

 Tom
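
One rough way to run that check is to look for long unbroken runs of Chinese characters in each file. The sketch below is only a heuristic under my own assumptions: the function name is made up, "Chinese character" means the basic CJK Unified Ideographs block, and a run of 5 or more characters without a space is treated as a sign of unsegmented text.

```python
import re

# Assumption: 5+ consecutive CJK characters suggests the line is unsegmented.
LONG_CJK_RUN = re.compile(r"[\u4e00-\u9fff]{5,}")


def unsegmented_fraction(lines):
    """Return the share of non-empty lines containing a long CJK run."""
    non_empty = [ln for ln in lines if ln.strip()]
    if not non_empty:
        return 0.0
    flagged = sum(1 for ln in non_empty if LONG_CJK_RUN.search(ln))
    return flagged / len(non_empty)


if __name__ == "__main__":
    sample = ["使用文本索引查询视图", "使用 文本 索引 查询 视图"]
    print(unsegmented_fraction(sample))
```

A value close to 1.0 on the training corpus would suggest it was never segmented; a value close to 0.0 on the corpus but high on the test input would point to a mismatch between the two.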

 On Wed, 17 Aug 2011 19:28:17 +1000, Suzy Howlett wrote:
> Hi James,
>
> It looks like the text has not been segmented into words, so it thinks
> every sentence is a single word. Unless the sentences you are trying to
> translate are identical to some sentences in the training corpus, it
> will think every test sentence is an unknown word it's never seen
> before. You'll need to use some kind of word segmentation.
> Unfortunately I don't know anything about that area, so I have no
> useful suggestions.
>
> Best,
> Suzy
>
> On 17/08/11 7:13 PM, 蒋乾 wrote:
>> Hi all,
>>
>> When I used MT to translate from Chinese to English, I met an
>> unexpected problem. Could you please tell me the reason if you have
>> any idea about it?
>>
>> I trained on a large parallel corpus of about 2,600,000 lines on a
>> computer with 5GB RAM.
>> After that, I tried translating a small Chinese file of about 80 lines
>> into English. Unexpectedly, it didn't work.
>> It did not do any translation work at all. The target file I got was
>> the same as the source file.
>>
>> One sample of the information shown on the screen during MT's
>> translation is as follows:
>>
>> "
>> Translating: 使用文本索引查询视图
>> Collecting options took 0.000 seconds
>> Search took 0.000 seconds
>> BEST TRANSLATION: 使用文本索引查询视图|UNK|UNK|UNK [1] [total=-99.978] <<0.000, 0.000, 0.000, -7.346, 0.000, 0.000, 0.000, 0.000, 0.000>>
>> Translation took 0.000 seconds
>> Finished translating
>> Translating: 使用文本索引查询视图关于
>> Collecting options took 0.000 seconds
>> Search took 0.000 seconds
>> BEST TRANSLATION: 使用文本索引查询视图关于|UNK|UNK|UNK [1] [total=-99.978] <<0.000, 0.000, 0.000, -7.346, 0.000, 0.000, 0.000, 0.000, 0.000>>
>> Translation took 0.000 seconds
>> Finished translating
>> "
>>
>> I would very much appreciate it if you could tell me why this happens
>> and how to solve it.
>>
>> Thank you very much.
>>
>> Regards,
>> James
>>
>>
>> _______________________________________________
>> Moses-support mailing list
>> [email protected] [5]
>> http://mailman.mit.edu/mailman/listinfo/moses-support [6]




Links:
------
[1] http://projects.ldc.upenn.edu/Chinese/
[2] http://code.google.com/p/zhseg/
[3] mailto:[email protected]
[4] mailto:[email protected]
[5] mailto:[email protected]
[6] http://mailman.mit.edu/mailman/listinfo/moses-support
[7] mailto:[email protected]
[8] http://mailman.mit.edu/mailman/listinfo/moses-support