Hi Raphael,

On Tuesday, May 04, 2010, at 01:01PM, "Raphael Payen" <[email protected]> wrote:
>...
>If I want to train
>my own model, I must provide a syntactically annotated parallel
>corpus. So, if I start from just a parallel corpus, I'll need to use
>for example first a POS tagger, then a Collins parser, then the
>wrapper script provided, and then call train-model.perl with
>--{source,target}-syntax ?

There are a couple of wrapper scripts, one for English that uses
the Collins parser and one for German that uses BitPar.  They
take tokenized input, call the parser, and convert its output to
Moses' XML format (producing a blank line if parsing fails).  If you
need to use another parser, you'll have to write your own wrapper
(please contribute a copy if you do).
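
If you do end up writing your own wrapper, the conversion step could
look roughly like the sketch below.  It assumes your parser emits
Penn-style bracketed trees, one per line; the function name
bracketed_to_moses_xml and the XML-escaping of labels and words are my
own assumptions, not something taken from the existing wrappers.

```python
import re
from xml.sax.saxutils import escape


def bracketed_to_moses_xml(parse):
    """Convert "(S (NP (DT the) (NN house)))" into <tree label="..."> markup.

    Returns an empty string (a blank line) if the input is not one
    well-bracketed tree, mirroring the "blank line on failure" behaviour.
    """
    tokens = re.findall(r"\(|\)|[^\s()]+", parse)
    out = []
    depth = 0
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok == "(":
            # A second tree at depth 0 means there is no single root.
            if depth == 0 and out:
                return ""
            # An opening bracket must be followed by a label.
            if i + 1 >= len(tokens) or tokens[i + 1] in ("(", ")"):
                return ""
            label = escape(tokens[i + 1], {'"': "&quot;"})
            out.append('<tree label="%s">' % label)
            depth += 1
            i += 2
        elif tok == ")":
            if depth == 0:
                return ""
            out.append("</tree>")
            depth -= 1
            i += 1
        else:
            # A word outside any constituent is treated as a failure.
            if depth == 0:
                return ""
            out.append(escape(tok))
            i += 1
    if depth != 0:
        return ""
    return " ".join(out)


if __name__ == "__main__":
    print(bracketed_to_moses_xml("(S (NP (DT the) (NN house)))"))
```

Anything the function cannot read as a single tree comes back as an
empty string, so the output stays line-aligned with the other side of
the parallel corpus.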


>I tried with a dummy corpus containing just this:
><tree label="PN"> das </tree> <tree label="V"> ist </tree> <tree
>label="NP"> <tree label="DET"> ein </tree> <tree label="ADJ"> kleines
></tree> <tree label="NN"> haus </tree> </tree>
>(and similar in english)

You'll need a top level constituent.  For example,

    <tree label="S"> <tree label="PN"> das </tree> ... </tree>

I'm not sure what'll happen otherwise, but probably not what
you want...
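
A quick way to check a corpus for this before training is sketched
below.  The helper names (top_level_count, ensure_single_root) are
hypothetical, and wrapping everything under a default "S" label is
only an assumption for illustration; a real parser should supply the
proper root.

```python
import xml.etree.ElementTree as ET


def top_level_count(line):
    """Return the number of top-level elements on an annotated line."""
    # Wrap in a dummy element so several sibling trees still parse as XML.
    root = ET.fromstring("<dummy>" + line + "</dummy>")
    return len(list(root))


def ensure_single_root(line, label="S"):
    """Wrap the line in one <tree> unless it already has a single root."""
    stripped = line.strip()
    if not stripped:
        return line  # blank line (failed parse) is left alone
    root = ET.fromstring("<dummy>" + stripped + "</dummy>")
    children = list(root)
    # Words sitting outside every constituent also call for a wrapper.
    stray = (root.text or "").strip() or any(
        (child.tail or "").strip() for child in children
    )
    if len(children) == 1 and not stray:
        return stripped
    return '<tree label="%s"> %s </tree>' % (label, stripped)


if __name__ == "__main__":
    dummy = (
        '<tree label="PN"> das </tree> <tree label="V"> ist </tree> '
        '<tree label="NP"> <tree label="DET"> ein </tree> '
        '<tree label="ADJ"> kleines </tree> <tree label="NN"> haus </tree> </tree>'
    )
    print(top_level_count(dummy))
    print(ensure_single_root(dummy))
```

Malformed markup raises xml.etree.ElementTree.ParseError, which is
probably what you want from a sanity check.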


>I called train-model.perl like this:
>train-model.perl --corpus testfile -f de -e en -lm
>0:3:europarl.srilm.gz --source-syntax --target-syntax
>and got this error:
>mkcls: StatVar.cpp:116: double StatVar::quantil(double): Assertion
>`index>=0&&index<n' failed
>Obviously there's something I'm doing wrong, but I don't know what.

Hmmm, not sure what's going on there.  The source-syntax and
target-syntax options should cause train-model.perl to create
a de-XMLed temporary file and supply that to mkcls.  Perhaps
it's a knock-on effect from the incomplete XML corpus(?)
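
To see what the plain-text side of your corpus ought to look like, you
can strip the markup yourself and inspect the result.  This is only a
rough sketch of the idea, not the code train-model.perl actually runs;
the function name strip_tree_markup is made up, and it does not
unescape XML entities.

```python
import re

# Matches any XML tag, e.g. <tree label="NP"> or </tree>.
TAG = re.compile(r"<[^>]*>")


def strip_tree_markup(line):
    """Remove all tags and normalise whitespace, leaving only the words."""
    return " ".join(TAG.sub(" ", line).split())


if __name__ == "__main__":
    annotated = (
        '<tree label="S"> <tree label="PN"> das </tree> '
        '<tree label="V"> ist </tree> </tree>'
    )
    print(strip_tree_markup(annotated))
```

If the stripped lines come out empty or garbled, that would point at
the corpus rather than at mkcls.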


>By the way, train-model.perl is only in branches/mt3_chart, not in trunk ?

The script was called train-factored-phrase-model.perl in trunk, 
but it's just been renamed so you should find train-model.perl in
the latest revision.  They should both work, but I'd recommend
giving the latest trunk revision a try nevertheless.


Phil
_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
