Dear Moses team and users,

I am using Moses to translate from an imaginary language "French" to English, and was hoping I could get some comments on my current setup.
Does the following use of Moses sound reasonable to anybody? I have posted it below as a commented Makefile excerpt. Note that it is based on the tutorials:

http://www.statmt.org/moses/?n=FactoredTraining.HomePage
http://www.statmt.org/moses/?n=Moses.Tutorial

software
--------
- GIZA++ 1.0.2 (compiled /without/ the -DBINARY_SEARCH_FOR_TTABLE flag)
- SRILM (standard)
- moses 2008-7-11 (standard)

usage
-----
My corpus consists of two text files, foo/train-corpus.en and foo/train-corpus.fr. Each line in a file is one sentence in the respective language, so that (for example) the sentence on line 3 of the English file corresponds to the sentence on line 3 of the "French" file.

> %/m-corpus.en %/m-corpus.fr : %/train-corpus.en %/train-corpus.fr
> 	cd $(<D) ; $(MOSES_SCRIPTS)/training/clean-corpus-n.perl train-corpus en fr m-corpus 1 100

Before using my corpus directly, I clean it up with the clean-corpus-n.perl script, which produces the files foo/m-corpus.en and foo/m-corpus.fr.

> %.lm : %
> 	$(SRILM_BINDIR)/ngram-count -text $< -lm $@

From foo/m-corpus.en, I train a language model (foo/m-corpus.en.lm) using SRILM's ngram-count with only the options -text and -lm. I assume these are reasonable options to pass to SRILM.

> %/model/moses.ini: %/m-corpus.en.lm
> 	cd $(<D); $(MOSES_SCRIPTS)/training/train-factored-phrase-model.perl\
> 	  --root-dir .\
> 	  --corpus $(basename $(basename $(<F)))\
> 	  --f fr --e en --lm 0:3:$(<F):0

Armed with an English language model, I use the script train-factored-phrase-model.perl. I am using an unfactored language model for simplicity. This produces foo/model/moses.ini, among other files in foo/model, notably foo/model/phrase-table.0-0.gz.

> %/test.results: %/test-corpus.fr %/test-corpus.en %/model/moses.ini
> 	cd $(<D); moses -f model/moses.ini < $(<F) > $(@F)

Finally, some translation. I call Moses with foo/model/moses.ini on foo/test-corpus.fr and produce foo/test.results, which does indeed look a bit like English.

Any thoughts? Thanks!
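P.S. Since everything above relies on the two corpus files being aligned line by line, a sanity check before the cleaning step might look like the following. This is only a minimal sketch: the function name check_parallel is hypothetical (it is not part of Moses, GIZA++ or SRILM), and it only compares line counts, so it cannot detect sentences that are misaligned within files of equal length.

```shell
# check_parallel STEM L1 L2
# Hypothetical helper, not part of Moses: succeeds only if STEM.L1 and
# STEM.L2 are both readable and contain the same number of lines.
check_parallel() {
    stem=$1; l1=$2; l2=$3
    if [ ! -r "$stem.$l1" ] || [ ! -r "$stem.$l2" ]; then
        echo "cannot read $stem.$l1 or $stem.$l2" >&2
        return 2
    fi
    # $(( ... )) strips the padding that some wc implementations emit.
    n1=$(( $(wc -l < "$stem.$l1") ))
    n2=$(( $(wc -l < "$stem.$l2") ))
    if [ "$n1" -ne "$n2" ]; then
        echo "line count mismatch: $stem.$l1=$n1 $stem.$l2=$n2" >&2
        return 1
    fi
    echo "$stem: $n1 sentence pairs"
}
```

With the layout described above, it would be called as `check_parallel foo/train-corpus en fr`, and again on foo/m-corpus after cleaning.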
-- Eric Kow <http://www.nltg.brighton.ac.uk/home/Eric.Kow> PGP Key ID: 08AC04F9
