Re: [Moses-support] about Morph tagging

2010-10-22 Thread JiaHongwei
Thank you very much!
BTW, I’m currently studying Morphisto, a morphological analyzer for
German:
http://code.google.com/p/morphisto/
I may also use the relevant HFST tools as morphological analyzers for
other languages.
Best Regards
Henry
-----Original Message-----
From: Francis Tyers [mailto:fty...@prompsit.com]
Sent: 20 October 2010 18:13
To: JiaHongwei
Cc: moses-support@mit.edu
Subject: Re: [Moses-support] about Morph tagging

You could use the morphological analysers from the Apertium project.

http://wiki.apertium.org/wiki/Using_an_lttoolbox_dictionary
http://wiki.apertium.org/wiki/Lttoolbox
http://wiki.apertium.org/wiki/HFST

Fran

On Wed, 20 Oct 2010 at 17:58 +0800, JiaHongwei wrote:
 Hi,
 
 I need to train a Moses model with POS tags and morphological
 information for languages such as German, Spanish, French and
 Italian.
 
 With TreeTagger, I can get POS tags in the format 'form pos
 lemma'.
 
 I would like to process this output further into the format 'form
 pos lemma morph'.
 
 So the job is to take 'form pos lemma' as input and produce output
 in the format 'form pos lemma morph'.
 
 Could you recommend a method or a tool that does this
 automatically or as part of a pipeline?
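
 The step described above can be sketched roughly as follows. This is
 only an illustration under stated assumptions: the input is taken to be
 tab-separated 'form pos lemma' lines, as TreeTagger prints them, and
 analyze() is a hypothetical stand-in for a real analyzer (for example a
 wrapper around an lttoolbox or HFST lookup). The toy lexicon and its
 morphology labels are invented for the example. The last helper joins
 the columns with '|', the separator Moses uses for factored input.

```python
# Sketch: append a morphology column to TreeTagger-style 'form pos lemma' lines.
# analyze() is hypothetical; a real pipeline would call an actual analyzer here.

def analyze(form, pos, lemma):
    """Return a morphology string for the token, or '-' if it is unknown."""
    toy_lexicon = {  # invented entries, for illustration only
        ("ging", "VVFIN", "gehen"): "3.Sg.Past.Ind",
        ("Haus", "NN", "Haus"): "Neut.Nom.Sg",
    }
    return toy_lexicon.get((form, pos, lemma), "-")


def add_morph(lines):
    """Turn tab-separated 'form pos lemma' lines into 'form pos lemma morph'."""
    out = []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip empty lines
        form, pos, lemma = line.split("\t")
        out.append("\t".join([form, pos, lemma, analyze(form, pos, lemma)]))
    return out


def to_factored(rows):
    """Join the four columns of each token with '|' and the tokens with spaces."""
    return " ".join("|".join(row.split("\t")) for row in rows)
```

 Keeping the analyzer behind a single function means a different tool
 can be swapped in per language without touching the formatting code.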
 
 Thanks in advance!
 
  
 
 Best Regards
 
 Henry
 
 
 ___
 Moses-support mailing list
 Moses-support@mit.edu
 http://mailman.mit.edu/mailman/listinfo/moses-support






Re: Re: [Moses-support] about Morph tagging

2010-10-22 Thread Francis Tyers
Just so you know, you can compile SFST transducers with HFST, in case
you don't want to install many different tools :)

Fran

On Fri, 22 Oct 2010 at 15:49 +0800, JiaHongwei wrote:
 Thank you very much!
 BTW, I’m currently studying Morphisto, a morphological analyzer for
 German:
 http://code.google.com/p/morphisto/
 I may also use the relevant HFST tools as morphological analyzers for
 other languages.
 Best Regards
 Henry


Re: [Moses-support] Word alignment information in binary phrase table

2010-10-22 Thread Hieu Hoang

Thanks Christof,

I think a lot of people will find this feature very useful. I've checked
it in:

http://mosesdecoder.svn.sourceforge.net/viewvc/mosesdecoder?view=revision&revision=3637


On 22/10/2010 00:01, Christof Pintaske wrote:

 Hi,

train-model.perl with the parameter -phrase-word-alignment adds 
word-for-word alignment information to the phrase table. Unfortunately, 
this information gets lost when the textual phrase table is converted 
into a binary format with processPhraseTable. Using 
processPhraseTable -alignment-info was meant to store the alignment 
information in the binary table as well, but this functionality has been 
broken since the format of the word alignment information changed, and 
currently no word alignment information is stored in the binary phrase 
tables. Having to use the textual file instead limits the size of the 
phrase table to what fits in the server's memory.


The attached patch provides the missing changes. It stores new-style 
alignment information with the target candidates in the 
phrase-table.binphr.tgtdata.wa file and reads them out 
correspondingly (It doesn't split the alignment information into 
source and target alignment as in the old implementation/format. It 
keeps it in a format supported by 
TargetPhrase::SetAlignmentInfo(std::string)).
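
To make the distinction between the single alignment string and the old 
split representation concrete, here is a small sketch. It assumes, 
purely for illustration, that the new-style alignment field is a 
space-separated list of source-target index pairs such as "0-0 1-2"; 
that format and both function names are assumptions, not taken from the 
patch or from the Moses code.

```python
from collections import defaultdict


def parse_alignment(field):
    """Parse '0-0 1-2' style text into a sorted list of (source, target) pairs."""
    pairs = []
    for token in field.split():
        src, tgt = token.split("-")
        pairs.append((int(src), int(tgt)))
    return sorted(pairs)


def split_by_side(pairs):
    """Derive the per-side views (source->targets, target->sources) from the pairs."""
    src_to_tgt = defaultdict(list)
    tgt_to_src = defaultdict(list)
    for src, tgt in pairs:
        src_to_tgt[src].append(tgt)
        tgt_to_src[tgt].append(src)
    return dict(src_to_tgt), dict(tgt_to_src)
```

Since both per-side views can be derived from the pair list, storing 
only the single string loses no information.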


I tested the change with valgrind for both moses and 
processPhraseTable on a small Moses translation system, without any 
complaints. Both the translation and the alignment file produced with 
moses -use-alignment-info -print-alignment-info -T File are identical, 
regardless of whether the phrase table is textual or binary. The 
patch should not change the behavior for phrase tables without 
word alignment.


I hope you find the patch useful and that it can be committed to the 
repo. Of course, please let me know if any modifications are necessary 
or desirable.


best regards
Christof




Re: [Moses-support] KenLM distributed with Moses

2010-10-22 Thread support
Thanks Ken. Nice work. 

Is there a way to train the ARPA formatted LM with KenLM, or do we need to
train with another tool, like SRILM or convert IRSTLM to full ARPA format?

Thanks again,
Tom



On Mon, 18 Oct 2010 20:31:38 -0400, Kenneth Heafield mo...@kheafield.com
wrote:
 Hi Moses,
 
   Introducing kenlm in Moses trunk.  You no longer need to download a
 separate language model to use Moses; it's distributed with Moses and
 compiled in by default on UNIX.  This is threadsafe language model
 inference code that returns the same probabilities as SRI (up to
 floating point rounding).  It loads ARPA files in 2/3 the time SRI takes
 and uses less memory too.  Using kenlm is simple: in your [lmodel-file]
 section, change the first digit to 8.  For example,
 
 0 0 2 foo.arpa changes to 8 0 2 foo.arpa
 
   For even faster loading, use the binary format:
 
 kenlm/build_binary foo.arpa foo.binary
 
 then simply provide the binary filename in your moses.ini e.g.
 8 0 2 foo.binary; it auto detects binary files using magic bytes at
 the beginning.
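
 The moses.ini change described above could be scripted along these
 lines. This is a minimal sketch, not part of Moses: the function name
 is made up, and it assumes every non-comment line in the
 [lmodel-file] section is a whitespace-separated entry whose first
 field selects the language model implementation.

```python
def use_kenlm(ini_text):
    """Set the first field of every [lmodel-file] entry to 8 (KenLM)."""
    out = []
    in_lm_section = False
    for line in ini_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("["):
            # A new section starts; remember whether it is the LM section.
            in_lm_section = (stripped == "[lmodel-file]")
        elif in_lm_section and stripped and not stripped.startswith("#"):
            fields = stripped.split()
            fields[0] = "8"
            line = " ".join(fields)
        out.append(line)
    return "\n".join(out)
```

 Other sections are passed through unchanged, so the result can be
 written back as a new moses.ini for comparison with the original.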
 
   The code is ready for use and provides correct results.  Inference is
 slower than it should be due to inefficiencies in the Moses-side wrapper
 code (it does a vocab lookup for all 5 words every time).  I'm working
 on it and once this is done I'll post some benchmarks against SRI and
 IRST. The binary format is subject to change, but it contains a version
 number, so on the very rare occasions when it does change, new versions
 will tell you to rebuild your binary files.  Windows is currently not
 supported (it uses mmap), though I welcome contributions using #ifdef
 and CreateFileMapping.
 
   Have fun and let me know about your experiences with it.
 
 Ken


Re: [Moses-support] KenLM distributed with Moses

2010-10-22 Thread Kenneth Heafield
KenLM is inference-only.  It cannot create ARPA files.  So you'll need
to use your favorite toolkit to generate the ARPA.
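
Whichever toolkit produces the ARPA file, a quick sanity check before 
handing it to kenlm/build_binary is to read its header. The sketch below 
relies only on the standard ARPA layout (a \data\ marker followed by 
"ngram N=count" lines); the function name is made up for this example.

```python
import re


def arpa_ngram_counts(lines):
    """Read the header of an ARPA file and return {order: count}."""
    counts = {}
    in_header = False
    for line in lines:
        line = line.strip()
        if line == "\\data\\":
            in_header = True
        elif in_header:
            match = re.fullmatch(r"ngram (\d+)=(\d+)", line)
            if match:
                counts[int(match.group(1))] = int(match.group(2))
            elif line:
                break  # first non-empty, non-count line ends the header
    return counts
```

Passing an open file object works as well as a list of lines, since the 
function only iterates over its argument. The highest key in the result 
is the model order.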

On 10/22/10 07:52, supp...@precisiontranslationtools.com wrote:
 Thanks Ken. Nice work. 
 
 Is there a way to train the ARPA formatted LM with KenLM, or do we need to
 train with another tool, like SRILM or convert IRSTLM to full ARPA format?
 
 Thanks again,
 Tom
 
 
 


Re: [Moses-support] KenLM distributed with Moses

2010-10-22 Thread support
Thanks, Ken.

Tom

On Fri, 22 Oct 2010 10:15:21 -0400, Kenneth Heafield mo...@kheafield.com
wrote:
 KenLM is inference-only.  It cannot create ARPA files.  So you'll need
 to use your favorite toolkit to generate the ARPA.
 