Your training process is fine for a baseline. The only thing missing is tuning. The values you'll find in moses.ini are not tuned for optimal results; tuning them is usually done on a separate development corpus. Some information on this is at
http://www.statmt.org/moses/?n=FactoredTraining.Tuning

Tuning is really important. It makes a big improvement in your translation results, so you should always do it. The values in question are the weights of the different models, and wrong or random weights will not give you results as good as tuned ones.

The rest of your steps (language model, training and translation) are fine. It's a nice start.

--
Carlos A. Henríquez Q.
+34-693-278-219
[EMAIL PROTECTED]
[EMAIL PROTECTED]

----- Original message ----
From: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>
To: [email protected]
Sent: Tuesday, 2 September 2008 16:20:33
Subject: [Moses-support] is this a reasonable moses setup?

Dear Moses team and users,

I am using Moses to translate from an imaginary language "French" to English, and was hoping I could get some comments on my current setup. Does the following use of Moses sound reasonable to anybody? I have posted it below as a commented Makefile excerpt.

Note that it is based on the tutorials:
http://www.statmt.org/moses/?n=FactoredTraining.HomePage
http://www.statmt.org/moses/?n=Moses.Tutorial

software
--------
- GIZA++ 1.0.2 (compiled /without/ the -DBINARY_SEARCH_FOR_TTABLE flag)
- SRILM (standard)
- moses 2008-7-11 (standard)

usage
-----
My corpus consists of two text files,
  foo/train-corpus.en
  foo/train-corpus.fr

Each line in a file is one sentence in the respective language, with (for example) the sentence in line 3 of the English file corresponding to the sentence in line 3 of the "French" file.

> %/m-corpus.en %/m-corpus.fr : %/train-corpus.en %/train-corpus.fr
>     cd $(<D) ; $(MOSES_SCRIPTS)/training/clean-corpus-n.perl train-corpus en fr m-corpus 1 100

Before using my corpus directly, I clean it up with the clean-corpus script, which produces the files foo/m-corpus.en and foo/m-corpus.fr

> %.lm : %
>     $(SRILM_BINDIR)/ngram-count -text $< -lm $@

From foo/m-corpus.en, I train a language model, foo/m-corpus.en.lm, using SRILM's ngram-count with the options -text and -lm.
I assume these are reasonable options to pass to SRILM.

> %/model/moses.ini: %/m-corpus.en.lm
>     cd $(<D); $(MOSES_SCRIPTS)/training/train-factored-phrase-model.perl \
>         --root-dir . \
>         --corpus $(basename $(basename $(<F))) \
>         --f fr --e en --lm 0:3:$(<F):0

Armed with an English language model, I use the script train-factored-phrase-model.perl. I am using an unfactored language model for simplicity. This produces foo/model/moses.ini, among other files in foo/model, notably foo/model/phrase-table.0-0.gz.

> %/test.results: %/test-corpus.fr %/test-corpus.en %/model/moses.ini
>     cd $(<D); moses -f model/moses.ini < $(<F) > $(@F)

Finally, some translation. I call Moses on the file foo/model/moses.ini and I produce foo/test.results which looks a bit like English indeed.

Any thoughts?

Thanks!

--
Eric Kow <http://www.nltg.brighton.ac.uk/home/Eric.Kow>
PGP Key ID: 08AC04F9
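
A note on the corpus-cleaning step in the message above: the trailing arguments "1 100" passed to clean-corpus-n.perl are a minimum and maximum sentence length. The sketch below is only a rough Python approximation of that length filter, assuming whitespace tokenization and assuming a pair is dropped when either side falls outside the range. The real script does more than this, and the names keep_pair and clean are made up for illustration.

```python
def keep_pair(src, tgt, min_len=1, max_len=100):
    """True if both sentences have between min_len and max_len tokens.

    Assumption: tokens are whitespace-separated, as in a tokenized corpus.
    """
    n_src, n_tgt = len(src.split()), len(tgt.split())
    return min_len <= n_src <= max_len and min_len <= n_tgt <= max_len


def clean(src_lines, tgt_lines, min_len=1, max_len=100):
    """Filter two line-aligned lists, keeping them aligned with each other."""
    if len(src_lines) != len(tgt_lines):
        raise ValueError("parallel files must have the same number of lines")
    kept = [(s, t) for s, t in zip(src_lines, tgt_lines)
            if keep_pair(s, t, min_len, max_len)]
    return [s for s, _ in kept], [t for _, t in kept]
```

Because a pair is removed from both sides at once, line N of the cleaned English file still corresponds to line N of the cleaned "French" file, which is what the later training step relies on.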
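
On the tuning point in the reply at the top of this thread: the reason the weights matter is that the decoder ranks candidate translations by a weighted sum of model scores, so a different set of weights can pick a different winner. The following is a minimal Python sketch of that idea only. It is not Moses code, and the feature names, scores and weights are all invented for illustration.

```python
# Illustrative only: feature names, values and weights are made up,
# and this does not reflect how Moses is implemented internally.

def score(features, weights):
    """Weighted sum of log-domain model scores (a log-linear model)."""
    return sum(weights[name] * value for name, value in features.items())


def best(candidates, weights):
    """Return the candidate translation with the highest weighted score."""
    return max(candidates, key=lambda c: score(candidates[c], weights))


# Two hypothetical candidates with hypothetical model scores.
candidates = {
    "the house is small": {"lm": -4.0, "tm": -6.0, "word_penalty": -4.0},
    "house small": {"lm": -9.0, "tm": -2.0, "word_penalty": -2.0},
}

# Hypothetical weight sets: one arbitrary, one standing in for tuned values.
untuned = {"lm": 0.1, "tm": 1.0, "word_penalty": 1.0}
tuned = {"lm": 1.0, "tm": 0.5, "word_penalty": 0.1}

print(best(candidates, untuned))
print(best(candidates, tuned))
```

With the first weight set the language model is almost ignored and the shorter, less fluent candidate wins; with the second, the fluent one wins. Tuning searches for the weights that give the best output on a development corpus.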
