Hi (Sorry this thread is a little old, I wasn't paying attention to this at
the time)

Thanks for the links to the Chinese segmenters. However, for both of them (and
for some others that I found elsewhere), I couldn't find the reverse tool, a
"desegmenter" that converts the output of Moses back into human-readable
Chinese (the equivalent of detokenizer.perl). Do you know of any Chinese
segmentation programs that come with one? Or perhaps independent desegmenters
that can be used with any segmenter?
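
For illustration, the approach Francois describes in the quoted digest below
(remove the spaces between Chinese words when Chinese is the target language)
can be sketched in a few lines of Python. This is only a rough sketch, not a
tool that ships with Moses or with either segmenter: the function name
desegment and the choice of Unicode ranges are assumptions made here. It
deletes whitespace only when it sits between two CJK characters, so spaces
around Latin words and digits are left alone.

```python
import re

# Assumption: "Chinese" characters are approximated by these Unicode ranges.
_CJK = (
    "\u3001-\u303f"   # CJK symbols and punctuation (ideographic space excluded)
    "\u3400-\u4dbf"   # CJK Unified Ideographs Extension A
    "\u4e00-\u9fff"   # CJK Unified Ideographs
    "\uff00-\uffef"   # half-width and full-width forms (full-width punctuation)
)

# Whitespace that has a CJK character directly on both sides.
_SPACE_BETWEEN_CJK = re.compile(rf"(?<=[{_CJK}])\s+(?=[{_CJK}])")


def desegment(line: str) -> str:
    """Remove the spaces a segmenter inserted between Chinese tokens.

    Hypothetical helper: spaces next to non-CJK tokens (e.g. English words
    or numbers embedded in the sentence) are kept as they are.
    """
    return _SPACE_BETWEEN_CJK.sub("", line.strip())
```

It could be applied to each line of the decoder output, e.g.
desegment("我 喜欢 机器 翻译 。"). Whether spaces between Chinese and Latin
text should also be dropped is a style decision that this sketch leaves open.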

Raphael


-----Original Message-----
From: [email protected] [mailto:[email protected]]
On Behalf Of Barry Haddow
Sent: 01 September 2010 10:01
To: [email protected]
Cc: Wenlong Yang
Subject: Re: [Moses-support] Train Moses Engine for EN to ZH_CN

Hi Wenlong

There are plenty of tools out there for segmenting Chinese. For example
LingPipe
http://alias-i.com/lingpipe/demos/tutorial/chineseTokens/read-me.html
Stanford
http://nlp.stanford.edu/software/segmenter.shtml
etc.

regards
Barry

On Wednesday 01 September 2010 04:09, Wenlong Yang wrote:
> Hi Francois,
>
> Thanks for your information.
> Can you let me know about, or send me, your scripts for Chinese tokenizing
> and detokenizing?
>
> It seems the default tokenizing script doesn't support the Chinese language
> code.
>
> Thanks,
> Wenlong
>
>
>
> 2010/8/31 <[email protected]>
>
> > Today's Topics:
> >
> >   1. Re: No weight.ini in example data for EMS
> >      (Sonja PETROVI? LUNDBERG)
> >   2. Fwd: Re: Tree based models - Eng > Ger general    question
> >      (Hieu Hoang)
> >   3. Re: No weight.ini in example data for EMS (Philipp Koehn)
> >   4. How can I know used translation rules? (Lee, Joo-Young)
> >   5. Re: No weight.ini in example data for EMS
> >      (Sonja PETROVI? LUNDBERG)
> >   6. Re: Train Moses Engine for EN to ZH_CN (Francois Masselot)
> >   7. Re: Train Moses Engine for EN to ZH_CN (Francois Masselot)
> >
> >
> > ----------------------------------------------------------------------
> >
> > Message: 1
> > Date: Tue, 31 Aug 2010 13:22:27 +0200
> > From: Sonja PETROVI? LUNDBERG <[email protected]>
> > Subject: Re: [Moses-support] No weight.ini in example data for EMS
> > To: Philipp Koehn <[email protected]>
> > Cc: [email protected]
> > Message-ID:
> >       
> >
<[email protected]<AANLkTinsWh
> >jkfz5fre52oj%[email protected]>
> >
> > Content-Type: text/plain; charset=UTF-8
> >
> > weight.ini was missing in that directory, but I've found it in
> > another, almost identical, directory (with Moses decoder stuff). I
> > have no idea why there were two different EMS directories, and why
> > there was no weight.ini in the first one.
> >
> > Now I experience another problem with the same command "perl
> > experiment.perl -config config.toy":
> >
> > DEFINE STEPS (run with -exec if everything ok)
> > Warning: locale not supported by Xlib, locale set to C
> > gv: Cannot open file steps/0/graph.0.ps (Inappropriate file type or
> > format)
> >
> > The locale warning probably appears because I have OS X, but why the
> > GhostView problem?
> >
> > Thank you,
> > Sonja
> >
> > 2010/8/31 Philipp Koehn <[email protected]>:
> > > Hi,
> > >
> > > the directory /Users/so/tools/moses-scripts/scripts-20100806-1525/ems/example/data
> > > should contain:
> > >
> > > nc-5k.en
> > > nc-5k.fr
> > > test-ref.en.sgm
> > > test-src.fr.sgm
> > > weight.ini
> > >
> > > At least that is what is in the SVN directory moses/scripts/ems/example/data.
> > >
> > > -phi
> > >
> > > 2010/8/31 Sonja PETROVI? LUNDBERG <[email protected]>:
> > >> Hi!
> > >>
> > >> I am trying to learn how to use EMS on my computer, but already in the
> > >> testing phase, using config.toy that comes with the installation of
> > >> Moses, I experience this problem:
> > >>
> > >> ikso-ho:test so$ perl experiment.perl -config config.toy
> > >> STARTING UP AS PROCESS 41208 ON ikso-ho.lan AT Tue Aug 31 11:05:48 CEST 2010
> > >> LOAD CONFIG...
> > >> find: /Users/so/tools/moses-scripts/scripts-20100806-1525/ems/example/data/weight.ini*: No such file or directory
> > >> TUNING:weight-config: file /Users/so/tools/moses-scripts/scripts-20100806-1525/ems/example/data/weight.ini does not exist!
> > >> Died at experiment.perl line 355.
> > >>
> > >> Is weight.ini supposed to be there, or should it be created during the
> > >> configuration process?
> > >>
> > >> Regards,
> > >> Sonja
> > >> _______________________________________________
> > >> Moses-support mailing list
> > >> [email protected]
> > >> http://mailman.mit.edu/mailman/listinfo/moses-support
> >
> > ------------------------------
> >
> > Message: 2
> > Date: Tue, 31 Aug 2010 13:01:16 +0100
> > From: Hieu Hoang <[email protected]>
> > Subject: [Moses-support] Fwd: Re: Tree based models - Eng > Ger
> >        general question
> > To: [email protected]
> > Message-ID: <[email protected]>
> > Content-Type: text/plain; charset="iso-8859-1"
> >
> >
> > hi arda
> >
> > I think an email by Chris Dyer sums up the issue that it's pretty hard
> > to beat the phrase-based BLEU for many language pairs.
> > http://www.mail-archive.com/[email protected]/msg01995.html
> > here's Edinburgh's attempt from this year's WMT10:
> > http://aclweb.org/anthology-new/W/W10/W10-1715.pdf
> >
> > The straightforward way of adding syntax severely reduces BLEU; you have
> > to add something extra to get any gains. Off the top of my head, the
> > main ways that I've seen so far are:
> >   1. Add alternative parses, e.g. forest decoding
> >   2. Mix up the parse tree, e.g. SAMT
> >   3. Soft constraints instead of hard constraints, e.g.
> > http://www.isi.edu/~chiang/papers/acl2010-chiang.pdf
> >   4. Occasionally ignoring syntax, e.g.
> > http://aclweb.org/anthology-new/W/W10/W10-1761.pdf
> > There are loads of other ways & papers I haven't mentioned.
> >
> > ------------------------------
> >
> > Message: 3
> > Date: Tue, 31 Aug 2010 13:39:08 +0100
> > From: Philipp Koehn <[email protected]>
> > Subject: Re: [Moses-support] No weight.ini in example data for EMS
> > To: Sonja PETROVI? LUNDBERG <[email protected]>
> > Cc: [email protected]
> > Message-ID:
> >       
> >
<[email protected]<9v_7AfE5X8x
> >s9bitu%[email protected]>
> >
> > Content-Type: text/plain; charset=UTF-8
> >
> > Hi,
> >
> > ghostview seems to sometimes act funny and not display
> > a just recently written file. In this case, you need to manually
> > type
> >
> >  % gv steps/0/graph.0.ps
> >
> > after running experiment.perl
> >
> > -phi
> >
> > 2010/8/31 Sonja PETROVI? LUNDBERG <[email protected]>:
> > > weight.ini was missing in that directory, but I've found it in
> > > another, almost identical, directory (with Moses decoder stuff). I
> > > have no idea why there were two different EMS directories, and why
> > > there was no weight.ini in the first one.
> > >
> > > Now I experience another problem with the same command "perl
> > > experiment.perl -config config.toy":
> > >
> > > DEFINE STEPS (run with -exec if everything ok)
> > > Warning: locale not supported by Xlib, locale set to C
> > > gv: Cannot open file steps/0/graph.0.ps (Inappropriate file type or
> > > format)
> > >
> > > The locale warning probably appears because I have OS X, but why the
> > > GhostView problem?
> > >
> > > Thank you,
> > > Sonja
> > >
> > > 2010/8/31 Philipp Koehn <[email protected]>:
> > >> Hi,
> > >>
> > >> the directory /Users/so/tools/moses-scripts/scripts-20100806-1525/ems/example/data
> > >> should contain:
> > >>
> > >> nc-5k.en
> > >> nc-5k.fr
> > >> test-ref.en.sgm
> > >> test-src.fr.sgm
> > >> weight.ini
> > >>
> > >> At least that is what is in the SVN directory moses/scripts/ems/example/data.
> > >>
> > >> -phi
> > >>
> > >> 2010/8/31 Sonja PETROVI? LUNDBERG <[email protected]>:
> > >>> Hi!
> > >>>
> > >>> I am trying to learn how to use EMS on my computer, but already in
> > >>> the testing phase, using config.toy that comes with the installation
> > >>> of Moses, I experience this problem:
> > >>>
> > >>> ikso-ho:test so$ perl experiment.perl -config config.toy
> > >>> STARTING UP AS PROCESS 41208 ON ikso-ho.lan AT Tue Aug 31 11:05:48 CEST 2010
> > >>> LOAD CONFIG...
> > >>> find: /Users/so/tools/moses-scripts/scripts-20100806-1525/ems/example/data/weight.ini*: No such file or directory
> > >>> TUNING:weight-config: file /Users/so/tools/moses-scripts/scripts-20100806-1525/ems/example/data/weight.ini does not exist!
> > >>> Died at experiment.perl line 355.
> > >>>
> > >>> Is weight.ini supposed to be there, or should it be created during
> > >>> the configuration process?
> > >>>
> > >>> Regards,
> > >>> Sonja
> >
> > ------------------------------
> >
> > Message: 4
> > Date: Tue, 31 Aug 2010 22:05:49 +0900
> > From: "Lee, Joo-Young" <[email protected]>
> > Subject: [Moses-support] How can I know used translation rules?
> > To: [email protected]
> > Message-ID:
> >       
> >
<[email protected]<Db8A6Lt1T3u
> >pc-wb6uviu2hwlw%[email protected]>
> >
> > Content-Type: text/plain; charset="iso-8859-1"
> >
> > Hi all,
> >
> > I use moses-chart and it works well.
> >
> > But, I want to know and get the translation rules which are used to
> > translate a given source sentence in decoding time.
> >
> > Simply said, I try to find a way to know which translation rules are
> > selected in each ChartCell of moses-chart.
> >
> > Is there any method or API?
> >
> > Best regards.
> >
> > Joo-Young Lee
> > ------------------------------
> >
> > Message: 5
> > Date: Tue, 31 Aug 2010 15:15:21 +0200
> > From: Sonja PETROVI? LUNDBERG <[email protected]>
> > Subject: Re: [Moses-support] No weight.ini in example data for EMS
> > To: Philipp Koehn <[email protected]>
> > Cc: [email protected]
> > Message-ID:
> >        <[email protected]>
> > Content-Type: text/plain; charset=UTF-8
> >
> > Thanks!
> >
> > Next problem happens in the first step:
> >
> > step TRAINING:prepare-data crashed
> > number of steps doable or running: 0
> >
> > I tried rm steps/1/TRAINING_prepare-data.1* and experiment.perl
> > -continue 1 -exec, but it crashed again, at the same place.
> >
> > Sonja
> >
> > 2010/8/31 Philipp Koehn <[email protected]>:
> > > Hi,
> > >
> > > ghostview seems to sometimes act funny and not display
> > > a just recently written file. In this case, you need to manually
> > > type
> > >
> > >  % gv steps/0/graph.0.ps
> > >
> > > after running experiment.perl
> > >
> > > -phi
> > >
> > > 2010/8/31 Sonja PETROVI? LUNDBERG <[email protected]>:
> > >> weight.ini was missing in that directory, but I've found it in
> > >> another, almost identical, directory (with Moses decoder stuff). I
> > >> have no idea why there were two different EMS directories, and why
> > >> there was no weight.ini in the first one.
> > >>
> > >> Now I experience another problem with the same command "perl
> > >> experiment.perl -config config.toy":
> > >>
> > >> DEFINE STEPS (run with -exec if everything ok)
> > >> Warning: locale not supported by Xlib, locale set to C
> > >> gv: Cannot open file steps/0/graph.0.ps (Inappropriate file type or
> > >> format)
> > >>
> > >> The locale warning probably appears because I have OS X, but why the
> > >> GhostView problem?
> > >>
> > >> Thank you,
> > >> Sonja
> > >>
> > >> 2010/8/31 Philipp Koehn <[email protected]>:
> > >>> Hi,
> > >>>
> > >>> the directory /Users/so/tools/moses-scripts/scripts-20100806-1525/ems/example/data
> > >>> should contain:
> > >>>
> > >>> nc-5k.en
> > >>> nc-5k.fr
> > >>> test-ref.en.sgm
> > >>> test-src.fr.sgm
> > >>> weight.ini
> > >>>
> > >>> At least that is what is in the SVN directory moses/scripts/ems/example/data.
> > >>>
> > >>> -phi
> > >>>
> > >>> 2010/8/31 Sonja PETROVI? LUNDBERG <[email protected]>:
> > >>>> Hi!
> > >>>>
> > >>>> I am trying to learn how to use EMS on my computer, but already in
> > >>>> the testing phase, using config.toy that comes with the installation
> > >>>> of Moses, I experience this problem:
> > >>>>
> > >>>> ikso-ho:test so$ perl experiment.perl -config config.toy
> > >>>> STARTING UP AS PROCESS 41208 ON ikso-ho.lan AT Tue Aug 31 11:05:48 CEST 2010
> > >>>> LOAD CONFIG...
> > >>>> find: /Users/so/tools/moses-scripts/scripts-20100806-1525/ems/example/data/weight.ini*: No such file or directory
> > >>>> TUNING:weight-config: file /Users/so/tools/moses-scripts/scripts-20100806-1525/ems/example/data/weight.ini does not exist!
> > >>>> Died at experiment.perl line 355.
> > >>>>
> > >>>> Is weight.ini supposed to be there, or should it be created during
> > >>>> the configuration process?
> > >>>>
> > >>>> Regards,
> > >>>> Sonja
> >
> > ------------------------------
> >
> > Message: 6
> > Date: Tue, 31 Aug 2010 06:24:22 -0700
> > From: Francois Masselot <[email protected]>
> > Subject: Re: [Moses-support] Train Moses Engine for EN to ZH_CN
> > To: "[email protected]" <[email protected]>
> > Message-ID:
> >        <0A83ECC8B2B3F342A24B0483F48FEAD52B22A44A67@ADSK-NAMSG-01.MGDADSK.autodesk.com>
> > Content-Type: text/plain; charset="iso-8859-1"
> >
> >
> > Dear Wenlong,
> >
> > The Moses toolkit is language independent, so there shouldn't be anything
> > special to do. The one thing to take care of is to properly tokenize the
> > Chinese training corpus. Moses takes as input sentences where words
> > (tokens) are space-separated, and in Chinese texts words are usually not
> > separated by spaces. There's nothing else special: I recently created
> > English-Chinese and Chinese-English Moses engines, and training and
> > decoding work just fine.
> > For decoding, you just need to tokenize and detokenize accordingly, i.e.
> > tokenize Chinese source sentences, and remove spaces between Chinese
> > words when Chinese is the target language.
> >
> > Regards
> > François
> >
> >
> >
> >
> > ------------------------------
> >
> > Message: 7
> > Date: Tue, 31 Aug 2010 06:27:20 -0700
> > From: Francois Masselot <[email protected]>
> > Subject: Re: [Moses-support] Train Moses Engine for EN to ZH_CN
> > To: "[email protected]" <[email protected]>
> > Message-ID:
> >        <0A83ECC8B2B3F342A24B0483F48FEAD52B22A44A73@ADSK-NAMSG-01.MGDADSK.autodesk.com>
> > Content-Type: text/plain; charset="iso-8859-1"
> >
> > Dear Wenlong,
> >
> > The Moses toolkit is language independent, so there shouldn't be anything
> > special to do. The one thing to take care of is to properly tokenize the
> > Chinese training corpus. Moses takes as input sentences where words
> > (tokens) are space-separated, and in Chinese texts words are usually not
> > separated by spaces. There's nothing else special: I recently created
> > English-Chinese and Chinese-English Moses engines, and training and
> > decoding work just fine.
> > For decoding, you just need to tokenize and detokenize accordingly, i.e.
> > tokenize Chinese source sentences, and remove spaces between Chinese
> > words when Chinese is the target language.
> >
> > Regards
> > François
> >
> >
> >
> >
> >
> > ------------------------------
> >
> > _______________________________________________
> > Moses-support mailing list
> > [email protected]
> > http://mailman.mit.edu/mailman/listinfo/moses-support
> >
> >
> > End of Moses-support Digest, Vol 46, Issue 42
> > *********************************************

-- 
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.

_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
