Hi Tom,
I've been using this one for the Edinburgh WMT submissions (EN-DE
syntax-based) for the last 3 years:
https://github.com/rsennrich/wmt2014-scripts/blob/master/hybrid_compound_splitter.py
It implements the hybrid (frequency-based and FST-based) algorithm of
Fritzinger & Fraser (2010): "How to Avoid Burning Ducks: Combining
Linguistic Analysis and Corpus Statistics for German Compound Processing".
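For a rough idea of the frequency-based half of such a hybrid approach, here
is a minimal sketch. It is not the code from the linked script. The names
(FREQ, FILLERS, candidate_splits, best_split) and the toy counts are
assumptions made up for illustration, and the brute-force candidate
enumeration only stands in for the FST analysis, which in the hybrid method
restricts candidates to linguistically valid splits. Candidates are scored
by the geometric mean of their parts' corpus frequencies, and the unsplit
word competes as a candidate too.

```python
from math import prod

# Toy corpus counts, invented for illustration only (not real data).
FREQ = {
    "haus": 500, "tür": 200, "haustür": 5,
    "arbeit": 300, "amt": 100, "arbeitsamt": 2,
    "frei": 50, "tag": 10, "freitag": 1000,
}

# Simplified linking elements; a real system would get these from a
# morphological analyser rather than a hard-coded list.
FILLERS = ("", "s", "es")


def candidate_splits(word, freq, min_len=3):
    """Yield candidate segmentations of `word` as lists of parts.

    The unsplit word is always the first candidate. Non-final parts must
    be known words (optionally followed by a linking element).
    """
    yield [word]
    for i in range(min_len, len(word) - min_len + 1):
        head, tail = word[:i], word[i:]
        for filler in FILLERS:
            if filler and not head.endswith(filler):
                continue
            stem = head[: len(head) - len(filler)] if filler else head
            if len(stem) < min_len or stem not in freq:
                continue
            for rest in candidate_splits(tail, freq, min_len):
                yield [stem] + rest


def score(parts, freq):
    """Geometric mean of the parts' frequencies (0 if any part is unseen)."""
    return prod(freq.get(p, 0) for p in parts) ** (1.0 / len(parts))


def best_split(word, freq=FREQ):
    """Return the highest-scoring segmentation; ties keep the earlier one."""
    word = word.lower()
    return max(candidate_splits(word, freq), key=lambda p: score(p, freq))


if __name__ == "__main__":
    for w in ("Haustür", "Arbeitsamt", "Freitag"):
        print(w, "->", " + ".join(best_split(w)))
```

With these toy counts, a frequent word like "Freitag" stays whole because
its own count beats the geometric mean of "frei" and "tag", while "Haustür"
and "Arbeitsamt" are split.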
best wishes,
Rico
On 24.08.2016 17:10, Tom Hoar wrote:
> Does anyone recommend a German compound splitter? I know it's been
> discussed here before. Thanks.
_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support