Hello, I'm using Moses for Arabic SMT, experimenting with diacritized vs. undiacritized Arabic training corpora. (I am using MADA+TOKAN to perform automatic diacritization.) If anyone is specifically interested in Arabic, has tips on using Moses for Arabic (right now I'm just trying to get a baseline system running, so I haven't yet begun exploring which parameters to tweak from the defaults), or has any other insights, I'd be very pleased to talk about it off-list; please email me.
Now, I have a specific question and a specific problem, neither of which I've been able to resolve by searching the archives.

1. Two scripts are listed in scripts/released-files (which the scripts Makefile reads):

   training/train-factored-phrase-model.perl
   training/filter-and-binarize-model-given-input.pl

These scripts do not exist in the most recent SVN release, so 'make release' reports an error, since it obviously cannot install them. The tutorials refer sometimes to train-factored-phrase-model.perl and sometimes to train-model.perl; from reading the latter, it seems to do factored training. Is this simply an error that should be corrected in the online docs and in released-files, meaning I should only be using train-model.perl? Or is there a difference between the two scripts? And does the same apply to training/filter-and-binarize-model-given-input.pl vs. filter-model-given-input.pl?

2. I went through the entire tutorial using the French-English Europarl data sets and got it working. Now I'm going through the same process with my Arabic-English parallel corpora, and I've gotten as far as tuning. I've been trying to use train-model.perl, and it gets to this step:

   <my-moses-dir>/moses-cmd/src/moses -v 0 -config <my-model-dir>/moses.ini -inputtype 0 -w 0.000000 -lm 0.333333 -d 0.333333 -tm 0.100000 0.066667 0.100000 0.066667 0.000000 -n-best-list run1.best100.out 100 -i <my-arabic-input-file> > run1.out

It generates run1.best100.out and run1.out, but then aborts with this error message:

   Translation took 0.060 seconds
   Finished translating
   [ERROR] Malformed input at
   Expected input to have words composed of 1 factor(s) (form FAC1|FAC2|...) but instead received input with 2 factor(s).
   Aborted

So I gather I have a setting wrong somewhere, but I can't figure out where. I followed essentially the same steps with my Arabic-English corpora as in the tutorial, just substituting my own training data. I'm not trying to do factored training at this time. Any advice appreciated.
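Since the error message shows that Moses treats '|' as the factor separator (form FAC1|FAC2|...), one way to narrow this down is to look for tokens in the input file that contain a '|'. One possibility worth checking, not confirmed here: in Buckwalter transliteration, which MADA+TOKAN output may be using, '|' is the symbol for alef with madda (آ), so an ordinary Arabic word could look like a two-factor token to Moses. The Python sketch below is hypothetical (the script name check_factors.py and the function find_multifactor_tokens are made up for illustration), and it only approximates Moses's input parsing by counting '|' characters in each whitespace-separated token.

```python
import sys
from typing import Iterable, List, Tuple

# Assumed factor separator, taken from the error message's "FAC1|FAC2|..." form.
FACTOR_DELIMITER = "|"


def find_multifactor_tokens(lines: Iterable[str],
                            expected_factors: int = 1) -> List[Tuple[int, str]]:
    """Return (line_number, token) for every token whose factor count,
    estimated as the number of '|' characters plus one, differs from
    expected_factors. Line numbers start at 1."""
    problems = []
    for line_no, line in enumerate(lines, start=1):
        for token in line.split():
            n_factors = token.count(FACTOR_DELIMITER) + 1
            if n_factors != expected_factors:
                problems.append((line_no, token))
    return problems


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python check_factors.py <my-arabic-input-file>
    with open(sys.argv[1], encoding="utf-8") as f:
        for line_no, token in find_multifactor_tokens(f):
            print(f"line {line_no}: {token}")
```

Running it on <my-arabic-input-file>, and on the Arabic side of the training corpus, should list the tokens Moses would likely read as having more than one factor; if they turn out to be Buckwalter '|' characters, they would need to be mapped to some other symbol consistently in the training, tuning, and test data.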
Thanks!

_______________________________________________
Moses-support mailing list
http://mailman.mit.edu/mailman/listinfo/moses-support
