Re: [Moses-support] A few MOSES questions (Arabic, missing scripts, Moses error)

Miles Osborne Fri, 07 May 2010 15:37:57 -0700

MADA can produce tokens that consist of a bar character (i.e. | ).

You need to rename them to something like BAR. Moses treats | as the
factor delimiter, so a token containing a bar is read as having two
factors, hence the message you are seeing.
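
A minimal sketch of the renaming step, in Python. The placeholder BAR
follows the suggestion above; the function names and the file names in
the usage note are hypothetical, and the sketch assumes
whitespace-tokenized UTF-8 text with one sentence per line. It replaces
every | it finds, including one embedded in a longer token, on the
assumption that any bar in the input would be read as a factor
delimiter.

```python
from typing import Iterable, Iterator

# Hypothetical placeholder, following the suggestion above; pick a string
# that does not already occur in your corpus.
PLACEHOLDER = "BAR"


def escape_bars(line: str, placeholder: str = PLACEHOLDER) -> str:
    """Replace every '|' in one line of tokenized text with the placeholder."""
    return line.replace("|", placeholder)


def escape_lines(lines: Iterable[str], placeholder: str = PLACEHOLDER) -> Iterator[str]:
    """Lazily apply escape_bars to each line of an iterable."""
    for line in lines:
        yield escape_bars(line, placeholder)


def escape_file(src_path: str, dst_path: str, placeholder: str = PLACEHOLDER) -> int:
    """Write a bar-free copy of src_path to dst_path.

    Returns the number of lines that were changed, which is a quick way
    to check whether the file contained any bar characters at all.
    """
    changed = 0
    with open(src_path, encoding="utf-8") as src, \
            open(dst_path, "w", encoding="utf-8") as dst:
        for line in src:
            fixed = escape_bars(line, placeholder)
            changed += fixed != line
            dst.write(fixed)
    return changed
```

For example, escape_file("corpus.tok.ar", "corpus.nobar.ar") would write
a cleaned copy and return how many lines contained a bar. The same
replacement would need to be applied to the training, tuning and test
input so that the placeholder is consistent across all of them.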


(I've been using MADA+TOKAN for Arabic, with the D2 setting.)

Miles

On 7 May 2010 23:26, David Edelstein <[email protected]> wrote:
> Hello,
>
> I'm using Moses to do some SMT on Arabic, experimenting with
> diacritized vs. undiacritized Arabic training corpora. (I am using
> MADA+TOKAN to perform automatic diacritization.) So, if anyone happens
> to be specifically interested in Arabic, has some tips on using Moses
> for Arabic (right now I am just trying to get a baseline system
> running, so I haven't even begun exploring which parameters I need to
> tweak from the defaults), or can give me any other insights, I'd be
> very pleased to talk to you about it off-list; please email me.
>
> Now, I have a specific question and a specific problem, to which I
> have not found a solution by searching the archives.
>
> 1. There are two scripts referenced in scripts/released-files (read by
> the scripts Makefile):
>   training/train-factored-phrase-model.perl
>   training/filter-and-binarize-model-given-input.pl
>
> These scripts do not exist in the most recent SVN release so 'make
> release' reports an error since obviously it cannot install them.
>
> The tutorials alternately reference train-factored-phrase-model.perl
> and train-model.perl; reading the latter, it seems to do factored
> training. Is this just an error (and something that should be updated
> in the online docs and released-files), and I should only be using
> train-model.perl? Or is there a difference between the two scripts?
> And is the same true of
> training/filter-and-binarize-model-given-input.pl vs.
> filter-model-given-input.pl?
>
> 2. I went through the entire tutorial using the French-English
> Europarl data sets, and got it working. Now I'm going through the same
> process with my Arabic-English parallel corpora. I've gotten as far as
> tuning. I've been trying to use train-model.perl, and it gets to this
> part:
>
> "<my-moses-dir>/moses-cmd/src/moses -v 0 -config
> <my-model-dir>/moses.ini -inputtype 0 -w 0.000000 -lm 0.333333 -d
> 0.333333 -tm 0.100000 0.066667 0.100000 0.066667 0.000000
> -n-best-list run1.best100.out 100 -i <my-arabic-input-file> > run1.out
>
> It generates run1.best100.out and run1.out, but then chokes with this
> error message:
>
> Translation took 0.060 seconds
> Finished translating
> [ERROR] Malformed input at
>  Expected input to have words composed of 1 factor(s) (form FAC1|FAC2|...)
>  but instead received input with 2 factor(s).
> Aborted
>
> So I gather somewhere I have a setting wrong, but I cannot figure out
> where it is. I basically followed the exact same steps with my
> Arabic-English corpora as in the tutorial, just substituting my own
> training data. I'm not trying to do factored training at this time.
>
> Any advice appreciated. Thanks!
> _______________________________________________
> Moses-support mailing list
> [email protected]
> http://mailman.mit.edu/mailman/listinfo/moses-support
>



-- 
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.

