Dear Moses,

        Where does this data come from?

http://www.statmt.org/wmt13/training-monolingual-europarl-v7.tgz

Specifically, if I want non-WMT languages, I can download
Europarl from http://www.statmt.org/europarl/ .

        There are some tools, such as a Perl script that strips XML, but it also
strips out the <P> tags that are meant to be preserved for
split-sentences.perl.  And I don't think split-sentences.perl was
designed to run before the XML is stripped, but I could be wrong.

        Does one write a custom program that strips all XML tags except
<P> and then pipe its output to split-sentences.perl?
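For illustration, such a filter could be as small as the Python sketch below. It is not an existing Moses or Europarl tool. It assumes the markup appears as simple single-line tags (for example <CHAPTER ID="1">, <SPEAKER ...> and <P>), assumes a regex is adequate for that markup, and uses a hypothetical script name, strip_keep_p.py. It removes every tag except <P>, which it normalizes to upper case, and skips lines left empty.

```python
import re
import sys

# Matches any single-line markup tag, e.g. <CHAPTER ID="1"> or <SPEAKER ID="2" NAME="...">.
TAG_RE = re.compile(r"<[^>]*>")
# Matches a bare paragraph marker such as <P> or <p> (assumed to carry no attributes).
P_RE = re.compile(r"^<\s*p\s*>$", re.IGNORECASE)


def strip_tags_keep_p(line):
    """Remove every tag in `line` except bare <P> paragraph markers."""
    def replace(match):
        tag = match.group(0)
        # Keep <P> so split-sentences.perl still sees paragraph boundaries.
        return "<P>" if P_RE.match(tag) else ""
    return TAG_RE.sub(replace, line)


def main():
    for raw in sys.stdin:
        cleaned = strip_tags_keep_p(raw.rstrip("\n")).strip()
        # Skip lines left empty, e.g. lines that held only removed markup.
        if cleaned:
            print(cleaned)


if __name__ == "__main__":
    main()
```

It would sit in a pipeline such as `python strip_keep_p.py < input.txt | perl split-sentences.perl -l en > output.txt`, where the file names are placeholders.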

Kenneth
_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
