I invite the Moses community to review the new release of Corpus
Filtergraph, version 3.1.131. The release is a significant update: it
includes 29 plugins, four graph examples, and example data for running the
scripts. This version obsoletes the previous plugins.
Plugins:
1 reader class plugin to read input files
1 writer class plugin to write output files
22 language-agnostic filter class plugins
2 language-specific filter class plugins (for Japanese)
3 aligner class plugins to process source and target files in parallel
Graphs:
$./lib/corpus.lm          Shows how to process monolingual language
                          model data
$./lib/corpus.europarl5   Shows how to clean and filter a parallel
                          corpus from sentence-aligned data
$./lib/build.europarl5    Shows how to build Moses training data. Sets
                          include 10 files: 2 parallel corpus files,
                          2 MERT tuning files, 2 eval text files, and
                          4 mteval .xml files for bi-directional
                          testing
                          (Note: requires the Moses decoder toolkit
                          installed on the host machine because one
                          plugin uses tokenizer.perl. Does not run on
                          MS Windows.)
https://sourceforge.net/projects/corpfiltergraph/
Best regards,
Tom