Hi Raphael,
On Tuesday, May 04, 2010, at 01:01PM, "Raphael Payen" <[email protected]> wrote:
>...
>If I want to train
>my own model, I must provide a syntactically annotated parallel
>corpus. So, if I start from just a parallel corpus, I'll need to use
>for example first a POS tagger, then a Collins parser, then the
>wrapper script provided, and then call train-model.perl with
>--{source,target}-syntax ?

There are a couple of wrapper scripts: one for English that uses the Collins parser and one for German that uses Bitpar. They take tokenized input and perform both steps, calling the parser and then converting its output to Moses' XML format (or producing a blank line if parsing fails). If you need to use another parser, you'll have to write your own wrapper (please contribute a copy if you do).

>I tried with a dummy corpus containing just this:
><tree label="PN"> das </tree> <tree label="V"> ist </tree> <tree
>label="NP"> <tree label="DET"> ein </tree> <tree label="ADJ"> kleines
></tree> <tree label="NN"> haus </tree> </tree>
>(and similar in english)

You'll need a top-level constituent. For example,

<tree label="S"> <tree label="PN"> das </tree> ... </tree>

I'm not sure what'll happen otherwise, but probably not what you want...

>I called train-model.perl like this:
>train-model.perl --corpus testfile -f de -e en -lm
>0:3:europarl.srilm.gz --source-syntax --target-syntax
>and got this error:
>mkcls: StatVar.cpp:116: double StatVar::quantil(double): Assertion
>`index>=0&&index<n' failed
>Obviously there's something I'm doing wrong, but I don't know what.

Hmmm, not sure what's going on there. The --source-syntax and --target-syntax options should cause train-model.perl to create a de-XMLed temporary file and supply that to mkcls. Perhaps it's a knock-on effect from the incomplete XML corpus(?)

>By the way, train-model.perl is only in branches/mt3_chart, not in trunk ?
The script was called train-factored-phrase-model.perl in trunk, but it has just been renamed, so you should find train-model.perl in the latest revision. They should both work, but I'd recommend giving the latest trunk revision a try nevertheless.

Phil

_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
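For anyone writing their own wrapper for another parser, as suggested in the reply, here is a minimal sketch of the conversion step only. It assumes (an assumption; the Collins and Bitpar output formats are not shown in this thread) that the parser prints one Penn Treebank-style bracketed tree per line, such as `(S (PN das) (V ist))`, and it emits the `<tree label="...">` markup used in the thread, or an empty string when a line cannot be parsed. The function names `bracketed_to_moses_xml` and `convert_lines` are hypothetical and not part of Moses. It does not handle an unlabeled outer bracket such as `( (S ...) )`, and escaping is done with Python's generic XML escaping rather than Moses' own escaping conventions.

```python
import re
from xml.sax.saxutils import escape

# Splits a bracketed parse into "(", ")" and label/word tokens.
TOKEN_RE = re.compile(r"\(|\)|[^\s()]+")


def bracketed_to_moses_xml(parse: str) -> str:
    """Convert one bracketed parse (hypothetical input format) into <tree> XML.

    Returns "" (i.e. a blank output line) if the parse is empty or malformed.
    """
    tokens = TOKEN_RE.findall(parse)
    out = []
    pos = 0

    def node() -> None:
        nonlocal pos
        if pos >= len(tokens) or tokens[pos] != "(":
            raise ValueError("expected '('")
        pos += 1
        if pos >= len(tokens) or tokens[pos] in ("(", ")"):
            raise ValueError("expected a constituent label")
        label = escape(tokens[pos], {'"': "&quot;"})
        pos += 1
        out.append(f'<tree label="{label}">')
        if pos < len(tokens) and tokens[pos] not in ("(", ")"):
            # Preterminal: the label is followed by exactly one word.
            out.append(escape(tokens[pos]))
            pos += 1
        else:
            # Phrase: recurse into each child constituent.
            while pos < len(tokens) and tokens[pos] == "(":
                node()
        if pos >= len(tokens) or tokens[pos] != ")":
            raise ValueError("expected ')'")
        pos += 1
        out.append("</tree>")

    try:
        node()
        if pos != len(tokens):
            raise ValueError("trailing tokens after the top-level tree")
    except (ValueError, RecursionError):
        return ""
    return " ".join(out)


def convert_lines(lines):
    """Convert parser output sentence by sentence (one tree per line)."""
    return [bracketed_to_moses_xml(line.strip()) for line in lines]
```

Because a failed line becomes an empty string rather than being dropped, the output stays line-aligned with the other side of the parallel corpus.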
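The reply points out that each corpus line needs a single top-level constituent, which the dummy corpus in the question lacked. A quick way to catch this before training is to check every line; the sketch below does that for the `<tree label="...">` markup shown in this thread only. The helper names and the default label `"S"` (taken from the example in the reply) are hypothetical, not Moses functionality.

```python
import re

# Tokens of the markup used in this thread: open tags, close tags, words.
TAG_RE = re.compile(r'<tree\s+label="[^"]*"\s*>|</tree>|[^\s<>]+')


def has_single_top_level_tree(line: str) -> bool:
    """True if the whole line is wrapped in exactly one <tree> element."""
    tokens = TAG_RE.findall(line)
    if not tokens:
        return False
    depth = 0
    for i, tok in enumerate(tokens):
        if tok.startswith("<tree"):
            depth += 1
        elif tok == "</tree>":
            depth -= 1
            if depth < 0:
                return False  # close tag without a matching open tag
            if depth == 0 and i != len(tokens) - 1:
                return False  # another top-level item follows
        elif depth == 0:
            return False  # a bare word outside any constituent
    return depth == 0


def wrap_in_top_level(line: str, label: str = "S") -> str:
    """Hypothetical fix-up: wrap a line's constituents in one root node."""
    return f'<tree label="{label}"> {line.strip()} </tree>'
```

Wrapping mechanically only makes the markup well-formed; whether an `S` root is linguistically appropriate for a given sentence is a separate question.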
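The reply says --source-syntax and --target-syntax should make train-model.perl hand mkcls a de-XMLed copy of the corpus. When debugging an mkcls failure, it can help to look at roughly what plain text remains once the markup is removed. The sketch below is a hypothetical stand-in, not the code train-model.perl actually uses: it only recognises the `<tree label="...">` form from this thread and leaves any XML entities in the words untouched.

```python
import re

# Open and close tags in the markup used in this thread.
TREE_TAG_RE = re.compile(r'<tree\s+label="[^"]*"\s*>|</tree>')


def strip_tree_markup(line: str) -> str:
    """Drop <tree> tags and collapse whitespace, leaving the plain tokens."""
    return " ".join(TREE_TAG_RE.sub(" ", line).split())
```

For example, `strip_tree_markup('<tree label="S"> <tree label="PN"> das </tree> </tree>')` gives `"das"`. Running it over a test corpus shows how little text a one-sentence dummy corpus actually leaves for word clustering.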
