Dear Moses,
Where does this data come from?
http://www.statmt.org/wmt13/training-monolingual-europarl-v7.tgz
Specifically, if I want non-WMT languages, I can download
Europarl from http://www.statmt.org/europarl/ .
There are some tools, like a Perl script to strip XML, but it also
strips out the <P> tags, which are meant to be preserved for
split-sentences.perl. And I don't think split-sentences.perl was
designed to run before stripping XML, but I could be wrong.
Does one write a custom XML-stripping program that removes all tags
except <P>, then pass its output to split-sentences.perl?
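
To make the question concrete, here is a rough Python sketch of the kind
of custom filter I mean. It is only a guess at what such a step could
look like, not something taken from the Europarl or Moses tools. The
names strip_tags_except_p and filter_lines are hypothetical, and it
assumes the tags to keep are bare <P> (no attributes) and that no tag
contains a literal "<" or ">" inside an attribute value.

```python
import re

# Hypothetical helper, not part of the Europarl or Moses tools.
# Matches any <...> tag unless it is exactly <P>, </P> or <P/>
# (case-insensitive). Assumes no "<" or ">" inside attribute values.
_NON_P_TAG = re.compile(r"<(?!/?P\s*/?>)[^<>]*>", re.IGNORECASE)


def strip_tags_except_p(line):
    """Remove every XML-style tag from `line` except bare <P> tags."""
    return _NON_P_TAG.sub("", line)


def filter_lines(lines):
    """Yield cleaned lines, dropping lines that held only removed tags."""
    for line in lines:
        text = line.rstrip("\n")
        stripped = strip_tags_except_p(text)
        # Keep lines that still have content, and lines that were
        # already blank in the input; skip lines emptied by stripping.
        if stripped.strip() or not text.strip():
            yield stripped + "\n"
```

Used as a filter, something like sys.stdout.writelines(filter_lines(sys.stdin))
would produce text whose output could then be piped to
split-sentences.perl, if that is indeed the intended order.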
Kenneth
_______________________________________________
Moses-support mailing list
http://mailman.mit.edu/mailman/listinfo/moses-support