Hi,

(Sorry this thread is a little old; I wasn't paying attention to it at the time.)

Thanks for the links to Chinese segmenters. However, for both of them (as well as for some others that I found elsewhere), I couldn't find the reverse tool, a "desegmenter": something that converts the output of Moses back to human-readable Chinese (the equivalent of detokenizer.perl).

Do you know of any Chinese segmentation programs that come with one? Or perhaps independent desegmenters that can be used with any segmenter?

Raphael

-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Barry Haddow
Sent: 01 September 2010 10:01
To: [email protected]
Cc: Wenlong Yang
Subject: Re: [Moses-support] Train Moses Engine for EN to ZH_CN

Hi Wenlong

There are plenty of tools out there for segmenting Chinese. For example
LingPipe http://alias-i.com/lingpipe/demos/tutorial/chineseTokens/read-me.html
Stanford http://nlp.stanford.edu/software/segmenter.shtml
etc.

regards
Barry

On Wednesday 01 September 2010 04:09, Wenlong Yang wrote:
> Hi Francois,
>
> Thanks for your information.
> Can you help to let me know or send your scripts for the Chinese tokenizing
> and detokenizing for my information?
>
> It seems the default tokenizing script doesn't support the Chinese language
> code.
>
> Thanks so,
> Wenlong
>
>
>
> 2010/8/31 <[email protected]>
>
> > Send Moses-support mailing list submissions to
> >         [email protected]
> >
> > To subscribe or unsubscribe via the World Wide Web, visit
> >         http://mailman.mit.edu/mailman/listinfo/moses-support
> > or, via email, send a message with subject or body 'help' to
> >         [email protected]
> >
> > You can reach the person managing the list at
> >         [email protected]
> >
> > When replying, please edit your Subject line so it is more specific
> > than "Re: Contents of Moses-support digest..."
> >
> >
> > Today's Topics:
> >
> >    1. Re: No weight.ini in example data for EMS
> >       (Sonja PETROVIĆ LUNDBERG)
> >    2. Fwd: Re: Tree based models - Eng > Ger general question
> >       (Hieu Hoang)
> >    3. Re: No weight.ini in example data for EMS (Philipp Koehn)
> >    4. How can I know used translation rules? (Lee, Joo-Young)
> >    5. Re: No weight.ini in example data for EMS
> >       (Sonja PETROVIĆ LUNDBERG)
> >    6. Re: Train Moses Engine for EN to ZH_CN (Francois Masselot)
> >    7. Re: Train Moses Engine for EN to ZH_CN (Francois Masselot)
> >
> >
> > ----------------------------------------------------------------------
> >
> > Message: 1
> > Date: Tue, 31 Aug 2010 13:22:27 +0200
> > From: Sonja PETROVIĆ LUNDBERG <[email protected]>
> > Subject: Re: [Moses-support] No weight.ini in example data for EMS
> > To: Philipp Koehn <[email protected]>
> > Cc: [email protected]
> > Message-ID:
> >
> > <[email protected]<AANLkTinsWh
> >jkfz5fre52oj%[email protected]>
> >
> > Content-Type: text/plain; charset=UTF-8
> >
> > weight.ini was missing in that directory, but I've found it in
> > another, almost identical, directory (with Moses decoder stuff). I
> > have no idea why there were two different EMS directories, and why
> > there was no weight.ini in the first one.
> >
> > Now I experience another problem with the same command "perl
> > experiment.perl -config config.toy":
> >
> > DEFINE STEPS (run with -exec if everything ok)
> > Warning: locale not supported by Xlib, locale set to C
> > gv: Cannot open file steps/0/graph.0.ps (Inappropriate file type or
> > format)
> >
> > The locale warning probably appears because I have OS X, but why the
> > GhostView problem?
> >
> > Thank you,
> > Sonja
> >
> > 2010/8/31 Philipp Koehn <[email protected]>:
> > > Hi,
> > >
> > > the directory
> > > /Users/so/tools/moses-scripts/scripts-20100806-1525/ems/example/data
> > > should contain:
> > >
> > > nc-5k.en
> > > nc-5k.fr
> > > test-ref.en.sgm
> > > test-src.fr.sgm
> > > weight.ini
> > >
> > > At least that is what is in the SVN directory
> > > moses/scripts/ems/example/data.
> > >
> > > -phi
> > >
> > > 2010/8/31 Sonja PETROVIĆ LUNDBERG <[email protected]>:
> > >> Hi!
> > >>
> > >> I am trying to learn how to use EMS on my computer, but already in the
> > >> testing phase, using config.toy that comes with the installation of
> > >> Moses, I experience this problem:
> > >>
> > >> ikso-ho:test so$ perl experiment.perl -config config.toy
> > >> STARTING UP AS PROCESS 41208 ON ikso-ho.lan AT Tue Aug 31 11:05:48 CEST 2010
> > >> LOAD CONFIG...
> > >> find: /Users/so/tools/moses-scripts/scripts-20100806-1525/ems/example/data/weight.ini*:
> > >> No such file or directory
> > >> TUNING:weight-config: file /Users/so/tools/moses-scripts/scripts-20100806-1525/ems/example/data/weight.ini
> > >> does not exist!
> > >> Died at experiment.perl line 355.
> > >>
> > >> Is weight.ini supposed to be there, or should it be created during the
> > >> configuration process?
> > >>
> > >> Regards,
> > >> Sonja
> > >> _______________________________________________
> > >> Moses-support mailing list
> > >> [email protected]
> > >> http://mailman.mit.edu/mailman/listinfo/moses-support
> >
> > ------------------------------
> >
> > Message: 2
> > Date: Tue, 31 Aug 2010 13:01:16 +0100
> > From: Hieu Hoang <[email protected]>
> > Subject: [Moses-support] Fwd: Re: Tree based models - Eng > Ger
> >         general question
> > To: [email protected]
> > Message-ID: <[email protected]>
> > Content-Type: text/plain; charset="iso-8859-1"
> >
> >
> > hi arda
> >
> > I think an email by Chris Dyer sums up the issue that it's pretty hard
> > to beat the phrase-based BLEU for many language pairs.
> > http://www.mail-archive.com/[email protected]/msg01995.html
> > here's Edinburgh's attempt from this years WMT10:
> > http://aclweb.org/anthology-new/W/W10/W10-1715.pdf
> >
> > The straightforward way of adding syntax severely reduces BLEU, you have
> > to add something extra to get any gains. Off the top of my head, the
> > main ways that i've seen so far is
> >    1. Add alternative parses, eg. forest decoding
> >    2. Mix up the parse tree, eg. SAMT
> >    3. Soft constrain instead of hard constraints, eg
> >       http://www.isi.edu/~chiang/papers/acl2010-chiang.pdf
> >    4. Occasionally ignoring syntax, eg.
> >       http://aclweb.org/anthology-new/W/W10/W10-1761.pdf
> > There's loads of other ways & papers i haven't mentioned
> >
> > -------------- next part --------------
> > An HTML attachment was scrubbed...
> > URL:
> > http://mailman.mit.edu/mailman/private/moses-support/attachments/20100831/59e86bad/attachment-0001.htm
> >
> > ------------------------------
> >
> > Message: 3
> > Date: Tue, 31 Aug 2010 13:39:08 +0100
> > From: Philipp Koehn <[email protected]>
> > Subject: Re: [Moses-support] No weight.ini in example data for EMS
> > To: Sonja PETROVIĆ LUNDBERG <[email protected]>
> > Cc: [email protected]
> > Message-ID:
> >
> > <[email protected]<9v_7AfE5X8x
> >s9bitu%[email protected]>
> >
> > Content-Type: text/plain; charset=UTF-8
> >
> > Hi,
> >
> > ghostview seems to sometimes act funny and not display
> > a just recently written file. In this case, you need to manually
> > type
> >
> > % gv steps/0/graph.0.ps
> >
> > after running experiment.perl
> >
> > -phi
> >
> > 2010/8/31 Sonja PETROVIĆ LUNDBERG <[email protected]>:
> > > weight.ini was missing in that directory, but I've found it in
> > > another, almost identical, directory (with Moses decoder stuff). I
> > > have no idea why there were two different EMS directories, and why
> > > there was no weight.ini in the first one.
> > >
> > > Now I experience another problem with the same command "perl
> > > experiment.perl -config config.toy":
> > >
> > > DEFINE STEPS (run with -exec if everything ok)
> > > Warning: locale not supported by Xlib, locale set to C
> > > gv: Cannot open file steps/0/graph.0.ps (Inappropriate file type or
> > > format)
> > >
> > > The locale warning probably appears because I have OS X, but why the
> > > GhostView problem?
> > >
> > > Thank you,
> > > Sonja
> > >
> > > 2010/8/31 Philipp Koehn <[email protected]>:
> > >> Hi,
> > >>
> > >> the directory
> > >> /Users/so/tools/moses-scripts/scripts-20100806-1525/ems/example/data
> > >> should contain:
> > >>
> > >> nc-5k.en
> > >> nc-5k.fr
> > >> test-ref.en.sgm
> > >> test-src.fr.sgm
> > >> weight.ini
> > >>
> > >> At least that is what is in the SVN directory
> > >> moses/scripts/ems/example/data.
> > >>
> > >> -phi
> > >>
> > >> 2010/8/31 Sonja PETROVIĆ LUNDBERG <[email protected]>:
> > >>> Hi!
> > >>>
> > >>> I am trying to learn how to use EMS on my computer, but already in
> > >>> the testing phase, using config.toy that comes with the installation
> > >>> of Moses, I experience this problem:
> > >>>
> > >>> ikso-ho:test so$ perl experiment.perl -config config.toy
> > >>> STARTING UP AS PROCESS 41208 ON ikso-ho.lan AT Tue Aug 31 11:05:48 CEST 2010
> > >>> LOAD CONFIG...
> > >>> find: /Users/so/tools/moses-scripts/scripts-20100806-1525/ems/example/data/weight.ini*:
> > >>> No such file or directory
> > >>> TUNING:weight-config: file /Users/so/tools/moses-scripts/scripts-20100806-1525/ems/example/data/weight.ini
> > >>> does not exist!
> > >>> Died at experiment.perl line 355.
> > >>>
> > >>> Is weight.ini supposed to be there, or should it be created during
> > >>> the configuration process?
> > >>>
> > >>> Regards,
> > >>> Sonja
> > >>> _______________________________________________
> > >>> Moses-support mailing list
> > >>> [email protected]
> > >>> http://mailman.mit.edu/mailman/listinfo/moses-support
> >
> > ------------------------------
> >
> > Message: 4
> > Date: Tue, 31 Aug 2010 22:05:49 +0900
> > From: "Lee, Joo-Young" <[email protected]>
> > Subject: [Moses-support] How can I know used translation rules?
> > To: [email protected]
> > Message-ID:
> >
> > <[email protected]<Db8A6Lt1T3u
> >pc-wb6uviu2hwlw%[email protected]>
> >
> > Content-Type: text/plain; charset="iso-8859-1"
> >
> > Hi all,
> >
> > I use moses-chart and it works well.
> >
> > But, I want to know and get the translation rules which are used to
> > translate a given source sentence in decoding time.
> >
> > Simply said, I try to find a way to know which translation rules are
> > selected in each ChartCell of moses-chart.
> >
> > Is there any method or API?
> >
> > Best regards.
> >
> > Joo-Young Lee
> > -------------- next part --------------
> > An HTML attachment was scrubbed...
> > URL:
> > http://mailman.mit.edu/mailman/private/moses-support/attachments/20100831/35a5fb00/attachment-0001.htm
> >
> > ------------------------------
> >
> > Message: 5
> > Date: Tue, 31 Aug 2010 15:15:21 +0200
> > From: Sonja PETROVIĆ LUNDBERG <[email protected]>
> > Subject: Re: [Moses-support] No weight.ini in example data for EMS
> > To: Philipp Koehn <[email protected]>
> > Cc: [email protected]
> > Message-ID:
> >         <[email protected]>
> > Content-Type: text/plain; charset=UTF-8
> >
> > Thanks!
> >
> > Next problem happens in the first step:
> >
> > step TRAINING:prepare-data crashed
> > number of steps doable or running: 0
> >
> > I tried rm steps/1/TRAINING_prepare-data.1* and experiment.perl
> > -continue 1 -exec, but it crashed again, at the same place.
> >
> > Sonja
> >
> > 2010/8/31 Philipp Koehn <[email protected]>:
> > > Hi,
> > >
> > > ghostview seems to sometimes act funny and not display
> > > a just recently written file. In this case, you need to manually
> > > type
> > >
> > > % gv steps/0/graph.0.ps
> > >
> > > after running experiment.perl
> > >
> > > -phi
> > >
> > > 2010/8/31 Sonja PETROVIĆ LUNDBERG <[email protected]>:
> > >> weight.ini was missing in that directory, but I've found it in
> > >> another, almost identical, directory (with Moses decoder stuff). I
> > >> have no idea why there were two different EMS directories, and why
> > >> there was no weight.ini in the first one.
> > >>
> > >> Now I experience another problem with the same command "perl
> > >> experiment.perl -config config.toy":
> > >>
> > >> DEFINE STEPS (run with -exec if everything ok)
> > >> Warning: locale not supported by Xlib, locale set to C
> > >> gv: Cannot open file steps/0/graph.0.ps (Inappropriate file type or
> > >> format)
> > >>
> > >> The locale warning probably appears because I have OS X, but why the
> > >> GhostView problem?
> > >>
> > >> Thank you,
> > >> Sonja
> > >>
> > >> 2010/8/31 Philipp Koehn <[email protected]>:
> > >>> Hi,
> > >>>
> > >>> the directory
> > >>> /Users/so/tools/moses-scripts/scripts-20100806-1525/ems/example/data
> > >>> should contain:
> > >>>
> > >>> nc-5k.en
> > >>> nc-5k.fr
> > >>> test-ref.en.sgm
> > >>> test-src.fr.sgm
> > >>> weight.ini
> > >>>
> > >>> At least that is what is in the SVN directory
> > >>> moses/scripts/ems/example/data.
> > >>>
> > >>> -phi
> > >>>
> > >>> 2010/8/31 Sonja PETROVIĆ LUNDBERG <[email protected]>:
> > >>>> Hi!
> > >>>>
> > >>>> I am trying to learn how to use EMS on my computer, but already in
> > >>>> the testing phase, using config.toy that comes with the installation
> > >>>> of Moses, I experience this problem:
> > >>>>
> > >>>> ikso-ho:test so$ perl experiment.perl -config config.toy
> > >>>> STARTING UP AS PROCESS 41208 ON ikso-ho.lan AT Tue Aug 31 11:05:48 CEST 2010
> > >>>> LOAD CONFIG...
> > >>>> find: /Users/so/tools/moses-scripts/scripts-20100806-1525/ems/example/data/weight.ini*:
> > >>>> No such file or directory
> > >>>> TUNING:weight-config: file /Users/so/tools/moses-scripts/scripts-20100806-1525/ems/example/data/weight.ini
> > >>>> does not exist!
> > >>>> Died at experiment.perl line 355.
> > >>>>
> > >>>> Is weight.ini supposed to be there, or should it be created during
> > >>>> the configuration process?
> > >>>>
> > >>>> Regards,
> > >>>> Sonja
> > >>>> _______________________________________________
> > >>>> Moses-support mailing list
> > >>>> [email protected]
> > >>>> http://mailman.mit.edu/mailman/listinfo/moses-support
> >
> > ------------------------------
> >
> > Message: 6
> > Date: Tue, 31 Aug 2010 06:24:22 -0700
> > From: Francois Masselot <[email protected]>
> > Subject: Re: [Moses-support] Train Moses Engine for EN to ZH_CN
> > To: "[email protected]" <[email protected]>
> > Message-ID:
> >         <0A83ECC8B2B3F342A24B0483F48FEAD52B22A44A67@ADSK-NAMSG-01.MGDADSK.autodesk.com>
> >
> > Content-Type: text/plain; charset="iso-8859-1"
> >
> >
> > Dear Wenlong,
> >
> > The Moses toolkit is language independent, so there shouldn't be anything
> > special to do. The one thing to take care of is to tokenize properly the
> > Chinese training corpus. Moses takes as input sentences where words
> > (tokens) are space-separated, and usually in Chinese texts, words are not
> > separated by spaces. There's nothing else special: I created recently an
> > English-Chinese and Chinese-English Moses engines and training and
> > decoding work just fine.
> > For decoding, you just need to tokenize and detokenize accordingly, i.e.
> > tokenize Chinese source sentences, and remove spaces between Chinese
> > words when Chinese is the target language.
> >
> > Regards
> > François
> >
> >
> >
> >
> > ------------------------------
> >
> > Message: 7
> > Date: Tue, 31 Aug 2010 06:27:20 -0700
> > From: Francois Masselot <[email protected]>
> > Subject: Re: [Moses-support] Train Moses Engine for EN to ZH_CN
> > To: "[email protected]" <[email protected]>
> > Message-ID:
> >         <0A83ECC8B2B3F342A24B0483F48FEAD52B22A44A73@ADSK-NAMSG-01.MGDADSK.autodesk.com>
> >
> > Content-Type: text/plain; charset="iso-8859-1"
> >
> > Dear Wenlong,
> >
> > The Moses toolkit is language independent, so there shouldn't be anything
> > special to do. The one thing to take care of is to tokenize properly the
> > Chinese training corpus. Moses takes as input sentences where words
> > (tokens) are space-separated, and usually in Chinese texts, words are not
> > separated by spaces. There's nothing else special: I created recently an
> > English-Chinese and Chinese-English Moses engines and training and
> > decoding work just fine.
> > For decoding, you just need to tokenize and detokenize accordingly, i.e.
> > tokenize Chinese source sentences, and remove spaces between Chinese
> > words when Chinese is the target language.
> >
> > Regards
> > François
> >
> >
> >
> >
> >
> > ------------------------------
> >
> > _______________________________________________
> > Moses-support mailing list
> > [email protected]
> > http://mailman.mit.edu/mailman/listinfo/moses-support
> >
> >
> > End of Moses-support Digest, Vol 46, Issue 42
> > *********************************************

--
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.

_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
