Thank you for your replies. The segmenters you recommended are very useful.
In the end I chose the Stanford Chinese Segmenter (http://nlp.stanford.edu/software/segmenter.shtml), which Kevin recommended. It is free and can be used directly on Linux, and its segmentation quality has been good in my experience. The tool Tianliang recommended can be used on both Windows and Linux, but its demo can only process about 1,000 sentences at a time, although its support team told me it could do more through its API.

I have solved the problem I raised. The cause was the one Suzy and Tom identified. Thank you.

On 18 August 2011 at 1:19 PM, Kevin Gimpel <[email protected]> wrote:
> I've found the Stanford Chinese Segmenter
> (http://nlp.stanford.edu/software/segmenter.shtml) to work well.
>
> See the following paper for information on this segmenter and some
> perspective on the problem:
> Pi-Chuan Chang, Michel Galley and Chris Manning. "Optimizing Chinese Word
> Segmentation for Machine Translation Performance." in ACL Third Workshop on
> Statistical Machine Translation, 2008.
> http://nlp.stanford.edu/pubs/acl-wmt08-cws.pdf
>
> Kevin
>
>
> On Wed, Aug 17, 2011 at 10:25 PM, Tom Hoar <[email protected]> wrote:
>
>> I'm familiar with two methods to segment Chinese. One method simply
>> inserts a space between each character. The results are predictable, but
>> translations are generally not as high quality as possible.
>>
>> The second method uses a program that identifies words as sequences of
>> multiple characters (typically 1, 2 or 3) and inserts a space between them.
>> I haven't worked with Chinese for a while, so I'm not sure of the latest
>> advancements in Chinese word segmentation. LDC publishes a perl script,
>> http://projects.ldc.upenn.edu/Chinese/,
>> http://www.ldc.upenn.edu/Projects/Chinese/ldc-cn-seg.1.2.tgz. I remember
>> seeing a C++ version, but can't find it now. There's also this one on Google
>> Code: http://code.google.com/p/zhseg/
>>
>> Maybe someone on moses-support knows of other Chinese tools.
>>
>> Regards,
>> Tom
>>
>>
>>
>> On Thu, 18 Aug 2011 09:16:48 +0800, 蒋乾 <[email protected]> wrote:
>>
>> Hi,
>>
>> Thank you for your suggestions.
>>
>> I have done some tests. They showed that both English-to-Chinese and
>> Chinese-to-English training would fail if I did not take any measures.
>>
>> Suzy and Tom gave me useful advice: do something like segmentation. The
>> further question is, how do I segment?
>>
>> Could anybody who has experience training on a corpus, either from
>> English to Chinese or from Chinese to English, give me some ideas?
>>
>> Thank you very much.
>>
>> Regards,
>> James
>>
>> 2011/8/17 Tom Hoar <[email protected]>
>>
>>> I agree with Suzy. Also, if your translation requests are not
>>> segmented, it's possible that the training corpus was also not
>>> segmented. Verify that your training corpus, development and test sets
>>> were all segmented when you trained/tuned your translation model. If
>>> not, you'll need to start from the beginning.
>>>
>>> Tom
>>>
>>>
>>> On Wed, 17 Aug 2011 19:28:17 +1000, Suzy Howlett <[email protected]>
>>> wrote:
>>> > Hi James,
>>> >
>>> > It looks like the text has not been segmented into words, so it
>>> > thinks every sentence is a single word. Unless the sentences you are
>>> > trying to translate are identical to some sentences in the training
>>> > corpus, it will think every test sentence is an unknown word it's
>>> > never seen before. You'll need to use some kind of word segmentation.
>>> > Unfortunately I don't know anything about that area, so I have no
>>> > useful suggestions.
>>> >
>>> > Best,
>>> > Suzy
>>> >
>>> > On 17/08/11 7:13 PM, 蒋乾 wrote:
>>> >> Hi all,
>>> >>
>>> >> When I used MT to translate from Chinese to English, I met an
>>> >> unexpected problem. Could you please tell me the reason if you have
>>> >> any idea about it?
>>> >>
>>> >> I trained on a large parallel corpus of about 2,600,000 lines on a
>>> >> computer with 5 GB of RAM. After that, I tried translating a small
>>> >> Chinese file of about 80 lines into English. Unexpectedly, it didn't
>>> >> work: it did no translation at all, and the target file I got was the
>>> >> same as the source file.
>>> >>
>>> >> Here is a sample of the information shown on the screen during MT's
>>> >> translation:
>>> >>
>>> >> "
>>> >> Translating: 使用文本索引查询视图
>>> >> Collecting options took 0.000 seconds
>>> >> Search took 0.000 seconds
>>> >> BEST TRANSLATION: 使用文本索引查询视图|UNK|UNK|UNK [1]
>>> >> [total=-99.978] <<0.000, -1.000, -100.000, 0.000, 0.000, 0.000,
>>> >> 0.000, 0.000, 0.000, -7.346, 0.000, 0.000, 0.000, 0.000, 0.000>>
>>> >> Translation took 0.000 seconds
>>> >> Finished translating
>>> >> Translating: 使用文本索引查询视图关于
>>> >> Collecting options took 0.000 seconds
>>> >> Search took 0.000 seconds
>>> >> BEST TRANSLATION: 使用文本索引查询视图关于|UNK|UNK|UNK [1]
>>> >> [total=-99.978] <<0.000, -1.000, -100.000, 0.000, 0.000, 0.000,
>>> >> 0.000, 0.000, 0.000, -7.346, 0.000, 0.000, 0.000, 0.000, 0.000>>
>>> >> Translation took 0.000 seconds
>>> >> Finished translating
>>> >> "
>>> >>
>>> >> I would appreciate it if you could tell me why this happens and how
>>> >> to solve it.
>>> >>
>>> >> Thank you very much.
>>> >>
>>> >> Regards,
>>> >> James
>>> >>
>>> >>
>>> >> _______________________________________________
>>> >> Moses-support mailing list
>>> >> [email protected]
>>> >> http://mailman.mit.edu/mailman/listinfo/moses-support
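Tom's two segmentation methods, and Suzy's diagnosis, can be illustrated with a short Python sketch. `split_characters` implements the first method (a space between every character). `forward_max_match` is one common way to realize the second method: greedy forward maximum matching against a word list, taking the longest known word of up to 3 characters at each position. The function names, the toy dictionary and the 3-character limit are illustrative assumptions; this is not the LDC script, zhseg, or the Stanford segmenter, which uses a trained statistical model rather than a plain dictionary lookup. The first line of output also shows why James's input came back unchanged with UNK: whitespace tokenization sees the whole unsegmented sentence as a single unknown "word".

```python
from typing import AbstractSet


def split_characters(sentence: str) -> str:
    """Method 1: put a space between every non-whitespace character."""
    return " ".join(ch for ch in sentence if not ch.isspace())


def forward_max_match(sentence: str, dictionary: AbstractSet[str], max_len: int = 3) -> str:
    """Method 2 (one common variant): greedy forward maximum matching.

    At each position, take the longest dictionary word (up to max_len
    characters) starting there; fall back to a single character.
    The dictionary here is a hypothetical stand-in for a real word list.
    """
    if max_len < 1:
        raise ValueError("max_len must be at least 1")
    words = []
    i = 0
    while i < len(sentence):
        # Try the longest candidate first, shrinking down to one character.
        for length in range(min(max_len, len(sentence) - i), 0, -1):
            candidate = sentence[i:i + length]
            if length == 1 or candidate in dictionary:
                words.append(candidate)
                i += length
                break
    return " ".join(words)


if __name__ == "__main__":
    sentence = "使用文本索引查询视图"  # first test sentence from James's log
    # Hypothetical toy word list; real segmenters need far larger resources.
    toy_dictionary = {"使用", "文本", "索引", "查询", "视图"}
    print(sentence.split())                             # ['使用文本索引查询视图']
    print(split_characters(sentence))                   # 使 用 文 本 索 引 查 询 视 图
    print(forward_max_match(sentence, toy_dictionary))  # 使用 文本 索引 查询 视图
```

Greedy matching is simple and fast but cannot resolve ambiguous splits, which is one reason statistical segmenters such as the Stanford tool generally do better. Whichever method is chosen, it has to be applied consistently to the training, tuning and test data, as Tom points out.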
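Tom's advice to verify that the training corpus, development and test sets were all segmented can be partly automated. The sketch below flags a line when it contains a run of CJK ideographs (only the basic U+4E00 to U+9FFF block) longer than a threshold with no space inside it. The threshold of 6, the function names and the script name are hypothetical. A correctly segmented line containing a long word, such as a transliterated name, would also be flagged, so the ratio is a rough signal, not a verdict.

```python
import re
import sys
from typing import Iterable

# Hypothetical threshold: a CJK run longer than this with no space inside
# is treated as evidence that the line was not segmented.
MAX_CJK_RUN = 6

# Basic CJK Unified Ideographs block only (extensions are not covered).
CJK_RUN = re.compile(r"[\u4e00-\u9fff]+")


def looks_unsegmented(line: str, max_run: int = MAX_CJK_RUN) -> bool:
    """Return True if any space-free run of CJK characters exceeds max_run."""
    return any(len(run) > max_run for run in CJK_RUN.findall(line))


def unsegmented_ratio(lines: Iterable[str]) -> float:
    """Fraction of non-empty lines that look unsegmented."""
    non_empty = [line for line in lines if line.strip()]
    if not non_empty:
        return 0.0
    flagged = sum(looks_unsegmented(line) for line in non_empty)
    return flagged / len(non_empty)


if __name__ == "__main__":
    # Usage (hypothetical file names): python check_seg.py train.zh dev.zh test.zh
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8") as f:
            ratio = unsegmented_ratio(f)
        print(f"{path}: {ratio:.1%} of lines look unsegmented")
```

A high ratio for any of the files suggests that file was never run through the segmenter used for the others; per Tom's advice, the model then needs retraining or retuning on consistently segmented data.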
