Hello,

I'm using Moses for SMT on Arabic, comparing diacritized and
undiacritized Arabic training corpora. (I use MADA+TOKAN for
automatic diacritization.) Right now I am just trying to get a
baseline system running, so I haven't yet explored which parameters I
need to change from the defaults. If anyone is specifically interested
in Arabic, has tips on using Moses for Arabic, or can offer any other
insights, I'd be very pleased to talk about it off-list; please email
me.

Now, I have a specific question and a specific problem, to which I
have not found a solution by searching the archives.

1. There are two scripts referenced in scripts/released-files (read by
the scripts Makefile):
   training/train-factored-phrase-model.perl
   training/filter-and-binarize-model-given-input.pl

These scripts do not exist in the most recent SVN release, so 'make
release' reports an error because it cannot install them.
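
To make the mismatch concrete, here is a minimal sketch of how the
missing entries could be listed. It assumes released-files holds one
relative path per line and that blank lines and '#' lines are not
entries; I have not verified the exact file format, and the function
name is my own.

```python
from pathlib import Path


def missing_released_files(list_path, scripts_dir):
    """Return the entries in list_path that do not exist under scripts_dir."""
    missing = []
    for raw in Path(list_path).read_text(encoding="utf-8").splitlines():
        entry = raw.strip()
        # Assumption: blank lines and '#' comments are not file entries.
        if not entry or entry.startswith("#"):
            continue
        if not (Path(scripts_dir) / entry).exists():
            missing.append(entry)
    return missing
```

Called as missing_released_files("scripts/released-files", "scripts")
from the top of the checkout, this should report the two scripts named
above if they are really the only entries without a file behind them.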

The tutorials refer sometimes to train-factored-phrase-model.perl and
sometimes to train-model.perl; reading the latter, it seems to do
factored training. Is the first name simply out of date (something
that should be fixed in the online docs and in released-files), so
that I should only be using train-model.perl? Or is there a difference
between the two scripts? And is the same true of
training/filter-and-binarize-model-given-input.pl vs.
filter-model-given-input.pl?

2. I went through the entire tutorial using the French-English
Europarl data sets, and got it working. Now I'm going through the same
process with my Arabic-English parallel corpora. I've gotten as far as
tuning. I've been trying to use train-model.perl, and it gets to this
part:

"<my-moses-dir>/moses-cmd/src/moses -v 0 -config
<my-model-dir>/moses.ini -inputtype 0 -w 0.000000 -lm 0.333333 -d
0.333333 -tm 0.100000 0.066667 0.100000 0.066667 0.000000
-n-best-list run1.best100.out 100 -i <my-arabic-input-file> > run1.out"

It generates run1.best100.out and run1.out, but then chokes with this
error message:

Translation took 0.060 seconds
Finished translating
[ERROR] Malformed input at
  Expected input to have words composed of 1 factor(s) (form FAC1|FAC2|...)
  but instead received input with 2 factor(s).
Aborted

So I gather somewhere I have a setting wrong, but I cannot figure out
where it is. I basically followed the exact same steps with my
Arabic-English corpora as in the tutorial, just substituting my own
training data. I'm not trying to do factored training at this time.
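
Since the error message describes factors as FAC1|FAC2|..., one thing
I can check is whether any token in my input contains a literal '|'
character, which would presumably be read as a factor separator. The
sketch below is only a rough approximation of that check: it counts
factors as the number of '|' characters plus one, which may not match
exactly how Moses splits a token, and the function names are my own.

```python
def count_factors(token, delimiter="|"):
    """Approximate the factor count of a token: delimiters plus one."""
    return token.count(delimiter) + 1


def find_multi_factor_tokens(lines, expected=1, delimiter="|"):
    """Return (line_number, token) pairs whose factor count differs from expected."""
    hits = []
    for line_no, line in enumerate(lines, start=1):
        # Tokens are assumed to be whitespace-separated, one sentence per line.
        for token in line.split():
            if count_factors(token, delimiter) != expected:
                hits.append((line_no, token))
    return hits
```

Running find_multi_factor_tokens over the lines of the Arabic input
file (opened with encoding="utf-8") would show which lines, if any,
carry a stray '|', for example from a transliteration scheme or from
tokenizer output. That is a guess about the cause, not something I
have confirmed.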

Any advice appreciated. Thanks!
_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
