Your training process is fine for a baseline. The only thing missing is 
tuning. The weights you'll find in moses.ini are not tuned for optimal 
results; tuning is usually done on a separate development corpus. Some 
information on this can be found at:

http://www.statmt.org/moses/?n=FactoredTraining.Tuning
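Here is a minimal sketch of setting aside a development corpus. It assumes you carve the dev set out of the same parallel data, and the function name split_parallel and the sizes below are hypothetical, not part of Moses. The sentence pairs are shuffled together, so alignment is preserved, and some are held out for tuning and some for testing, so neither set overlaps the training data:

```python
import random

def split_parallel(src_lines, tgt_lines, n_dev, n_test, seed=0):
    """Hold out sentence pairs for tuning (dev) and testing; the rest is training data."""
    if len(src_lines) != len(tgt_lines):
        raise ValueError("corpus sides differ in length; pairs would be misaligned")
    if n_dev + n_test > len(src_lines):
        raise ValueError("not enough sentence pairs for the requested dev/test sizes")
    pairs = list(zip(src_lines, tgt_lines))
    # Shuffle whole pairs, never one side alone, so line i still matches line i.
    random.Random(seed).shuffle(pairs)
    dev = pairs[:n_dev]
    test = pairs[n_dev:n_dev + n_test]
    train = pairs[n_dev + n_test:]
    return train, dev, test

fr = [f"phrase {i}" for i in range(10)]
en = [f"sentence {i}" for i in range(10)]
train_pairs, dev_pairs, test_pairs = split_parallel(fr, en, n_dev=2, n_test=2)
print(len(train_pairs), len(dev_pairs), len(test_pairs))  # 6 2 2
```

Each set would then be written back out as a .fr/.en file pair.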

Tuning is really important. It usually makes a big improvement in your 
translation results, so you should always do it. The values in moses.ini are 
weights for the different models (language model, translation model, and so 
on), and default or arbitrary values will not give you results as good as 
tuned ones.
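To illustrate why the weights matter: Moses scores each candidate translation with a log-linear model. The score is a weighted sum of feature scores, score(e) = sum_i lambda_i * h_i(e, f), where the features include the language-model and translation-model log-probabilities and the lambdas are the weights stored in moses.ini. The sketch below uses made-up feature values and two hand-picked weight settings (not real Moses output). It shows that changing only the weights changes which candidate wins:

```python
def loglinear_score(features, weights):
    """Weighted sum of feature scores: sum of weight[name] * feature[name]."""
    return sum(weights[name] * value for name, value in features.items())

def best_candidate(candidates, weights):
    """Return the candidate translation with the highest log-linear score."""
    return max(candidates, key=lambda c: loglinear_score(c[1], weights))[0]

# Hypothetical log-probabilities from a language model (lm) and translation model (tm).
candidates = [
    ("the house is small", {"lm": -4.0, "tm": -6.0}),
    ("small the house is", {"lm": -9.0, "tm": -3.0}),
]

weights_a = {"lm": 0.2, "tm": 1.0}  # hand-picked: LM barely counts
weights_b = {"lm": 1.0, "tm": 0.5}  # hand-picked: LM counts more

print(best_candidate(candidates, weights_a))  # small the house is
print(best_candidate(candidates, weights_b))  # the house is small
```

Tuning methods such as minimum error rate training (MERT) automate the search for weights that score well on the development corpus, instead of relying on guesses like these.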

The rest of your steps (language model, training and translation) are fine. 
It's a nice start.

 --
Carlos A. Henríquez Q.
+34-693-278-219
[EMAIL PROTECTED]
[EMAIL PROTECTED]



----- Original Message ----
From: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>
To: [email protected]
Sent: Tuesday, September 2, 2008 16:20:33
Subject: [Moses-support] is this a reasonable moses setup?

Dear Moses team and users,

I am using Moses to translate from an imaginary language "French" to
English, and was hoping I could get some comments on my current setup.

Does the following use of Moses sound reasonable to anybody?  I have
posted it below as a commented Makefile excerpt.  Note that it is
based on the tutorials:

  http://www.statmt.org/moses/?n=FactoredTraining.HomePage
  http://www.statmt.org/moses/?n=Moses.Tutorial

software
--------
- GIZA++ 1.0.2
   (compiled /without/ the -DBINARY_SEARCH_FOR_TTABLE flag)
- SRILM
   (standard)
- moses 2008-7-11
   (standard)

usage
-----
My corpus consists of two text files,
foo/train-corpus.en
foo/train-corpus.fr

Each line in each file contains one sentence in the respective language,
so that (for example) the sentence on line 3 of the English file
corresponds to the sentence on line 3 of the "French" file.
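One quick sanity check for this line-by-line correspondence is to read both files and refuse to pair them if their line counts differ. This is a sketch, and the helper names read_lines and read_parallel are hypothetical:

```python
import tempfile
from pathlib import Path

def read_lines(path):
    """Read one sentence per line.

    Iterating over a text file splits only on \\n, \\r\\n or \\r, not on the extra
    Unicode separators that str.splitlines() also honours.
    """
    with open(path, encoding="utf-8") as f:
        return [line.rstrip("\n") for line in f]

def read_parallel(src_path, tgt_path):
    """Pair line i of the source file with line i of the target file."""
    src, tgt = read_lines(src_path), read_lines(tgt_path)
    if len(src) != len(tgt):
        raise ValueError(f"{src_path} has {len(src)} lines but {tgt_path} has {len(tgt)}")
    return list(zip(src, tgt))

# Two tiny files standing in for foo/train-corpus.fr and foo/train-corpus.en.
with tempfile.TemporaryDirectory() as d:
    Path(d, "train-corpus.fr").write_text("un\ndeux\ntrois\n", encoding="utf-8")
    Path(d, "train-corpus.en").write_text("one\ntwo\nthree\n", encoding="utf-8")
    pairs = read_parallel(Path(d, "train-corpus.fr"), Path(d, "train-corpus.en"))
print(pairs[2])  # ('trois', 'three')
```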

> %/m-corpus.en %/m-corpus.fr : %/train-corpus.en %/train-corpus.fr
>         cd $(<D) ; $(MOSES_SCRIPTS)/training/clean-corpus-n.perl \
>           train-corpus en fr m-corpus 1 100

Rather than using my corpus directly, I first clean it up with the
clean-corpus-n.perl script, which produces the files foo/m-corpus.en and
foo/m-corpus.fr.
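The arguments 1 100 are the minimum and maximum sentence length in tokens. The following is a simplified approximation of that length filter, not the script itself, and the real script may apply further checks. It drops a sentence pair whenever either side falls outside the range, so the two output files stay aligned. The function name clean_by_length is hypothetical:

```python
def clean_by_length(pairs, min_len=1, max_len=100):
    """Keep only pairs whose source AND target each have min_len..max_len tokens."""
    kept = []
    for src, tgt in pairs:
        n_src, n_tgt = len(src.split()), len(tgt.split())
        # Drop the whole pair, never one side, to preserve line alignment.
        if min_len <= n_src <= max_len and min_len <= n_tgt <= max_len:
            kept.append((src, tgt))
    return kept

pairs = [
    ("le chat", "the cat"),
    ("", "an empty source line"),   # dropped: 0 source tokens
    ("oui " * 150, "yes " * 150),   # dropped: 150 tokens on each side
]
print(clean_by_length(pairs))  # [('le chat', 'the cat')]
```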

> %.lm : %
>         $(SRILM_BINDIR)/ngram-count -text $< -lm $@

From foo/m-corpus.en, I train a language model, foo/m-corpus.en.lm, using
SRILM's ngram-count with the options -text and -lm.  I assume these are
reasonable options to pass to SRILM.
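For reference, ngram-count without an -order option builds a trigram model, because SRILM's default order is 3. That matches the order 3 passed later in --lm 0:3:...:0. Its first step is counting n-grams up to that order, with each sentence padded by <s> and </s>. It then estimates smoothed, backed-off probabilities (Good-Turing discounting by default). The counting-only sketch below does not attempt that second step, and count_ngrams is a hypothetical name:

```python
from collections import Counter

def count_ngrams(sentences, order=3):
    """Count every 1..order-gram, padding each sentence with <s> and </s>."""
    counts = Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        for n in range(1, order + 1):
            for i in range(len(tokens) - n + 1):
                counts[tuple(tokens[i:i + n])] += 1
    return counts

counts = count_ngrams(["the house is small", "the house is big"])
print(counts[("the", "house")])         # 2
print(counts[("<s>", "the", "house")])  # 2
print(counts[("is", "small", "</s>")])  # 1
```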

> %/model/moses.ini: %/m-corpus.en.lm
>         cd $(<D); $(MOSES_SCRIPTS)/training/train-factored-phrase-model.perl\
>           --root-dir .\
>           --corpus $(basename $(basename $(<F)))\
>           --f fr --e en --lm 0:3:$(<F):0

Armed with an English language model, I use the script
  train-factored-phrase-model.perl
to train the translation model.  I am using an unfactored language model
for simplicity.

This produces foo/model/moses.ini, among other files in foo/model,
notably foo/model/phrase-table.0-0.gz.
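To make the Makefile expressions in the training rule concrete, here is a small sketch that mirrors what they compute. $(basename ...) strips one suffix, so applying it twice turns m-corpus.en.lm into the corpus stem m-corpus. The --lm argument packs four colon-separated fields. The field names below (factor, order, file, type, with type 0 selecting SRILM) are an assumption based on the Moses training documentation, and the helper functions are hypothetical:

```python
import os

def make_basename(name):
    """Mimic GNU make's $(basename ...) for a plain file name: drop the last suffix."""
    return os.path.splitext(name)[0]

def parse_lm_spec(spec):
    """Split an assumed '--lm factor:order:file:type' argument into its fields."""
    factor, order, rest = spec.split(":", 2)
    path, lm_type = rest.rsplit(":", 1)  # keep any ':' inside the file name
    return {"factor": int(factor), "order": int(order), "file": path, "type": int(lm_type)}

lm_file = "m-corpus.en.lm"                    # $(<F) in the training rule
print(make_basename(make_basename(lm_file)))  # m-corpus
print(parse_lm_spec(f"0:3:{lm_file}:0"))
# {'factor': 0, 'order': 3, 'file': 'm-corpus.en.lm', 'type': 0}
```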

> %/test.results: %/test-corpus.fr %/test-corpus.en %/model/moses.ini
>         cd $(<D); moses -f model/moses.ini < $(<F) > $(@F)

Finally, some translation.  I run Moses with the configuration file
foo/model/moses.ini on foo/test-corpus.fr, producing foo/test.results,
which does indeed look a bit like English.
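The test rule lists foo/test-corpus.en as a prerequisite but never uses it. One way to go beyond "looks a bit like English" is to score foo/test.results against it with BLEU. The sketch below is a plain corpus-level BLEU with a single reference, up to 4-grams, a brevity penalty, no smoothing, and whitespace tokenization. It is an illustration, not Moses's own evaluation script, and a standard scorer should be used for any numbers you report. The function names are hypothetical:

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Multiset of the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def corpus_bleu(hypotheses, references, max_n=4):
    """Corpus-level BLEU with one reference per sentence and no smoothing."""
    if len(hypotheses) != len(references):
        raise ValueError("need exactly one reference line per hypothesis line")
    matches, totals = [0] * max_n, [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        h, r = hyp.split(), ref.split()
        hyp_len += len(h)
        ref_len += len(r)
        for n in range(1, max_n + 1):
            h_counts, r_counts = ngrams(h, n), ngrams(r, n)
            # Clip each hypothesis n-gram count by its count in the reference.
            matches[n - 1] += sum(min(c, r_counts[g]) for g, c in h_counts.items())
            totals[n - 1] += sum(h_counts.values())
    if min(matches) == 0:
        return 0.0  # some n-gram precision is zero, so the geometric mean is zero
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    brevity_penalty = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return brevity_penalty * math.exp(log_precision)

print(corpus_bleu(["the house is small"], ["the house is small"]))  # 1.0
# On the Makefile's files (run from foo/, one sentence per line in each):
# hyps = open("test.results", encoding="utf-8").read().splitlines()
# refs = open("test-corpus.en", encoding="utf-8").read().splitlines()
# print(corpus_bleu(hyps, refs))
```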

Any thoughts?

Thanks!

-- 
Eric Kow <http://www.nltg.brighton.ac.uk/home/Eric.Kow>
PGP Key ID: 08AC04F9



      
_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
