Thank you, Rico! Looks promising.
I also found this one on PyPI, the Python Package Index:
https://pypi.python.org/pypi/SoMaJo/1.1.2
Does anyone have any experience with it?
Tom
On 8/25/2016 11:01 PM, [email protected] wrote:
Date: Wed, 24 Aug 2016 17:23:22 +0100
From: Rico Sennrich <[email protected]>
Subject: Re: [Moses-support] German compound splitter
To: [email protected]
Hi Tom,
I've been using this one for the Edinburgh WMT submissions (EN-DE,
syntax-based) for the last 3 years:
https://github.com/rsennrich/wmt2014-scripts/blob/master/hybrid_compound_splitter.py
It implements the hybrid (frequency-based and FST-based) algorithm by
Fritzinger & Fraser 2010: "How to Avoid Burning Ducks: Combining
Linguistic Analysis and Corpus Statistics for German Compound Processing"
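To give a rough idea of the frequency-based half, here is a minimal sketch.
It is not the code from hybrid_compound_splitter.py: the frequency table is
made up, all function names are hypothetical, and candidate splits are
enumerated by brute force, whereas the hybrid approach takes its candidates
from an FST-based morphological analysis. The scoring (geometric mean of the
part frequencies) is the usual corpus-statistics heuristic, assumed here for
illustration.

```python
from math import prod

# Toy corpus frequencies, invented for illustration only.
FREQ = {
    "akten": 120,
    "akte": 90,
    "koffer": 200,
    "aktenkoffer": 3,
    "haus": 500,
    "tür": 300,
}

FILLERS = ("", "s", "es")  # linking elements that may be dropped from the first part
MIN_PART_LEN = 3           # assumption: ignore parts shorter than this


def candidate_splits(word):
    """Yield the unsplit word, then every two-part split, optionally
    dropping a linking element from the end of the first part."""
    yield (word,)
    for i in range(MIN_PART_LEN, len(word) - MIN_PART_LEN + 1):
        head, tail = word[:i], word[i:]
        for filler in FILLERS:
            if filler and not head.endswith(filler):
                continue
            stem = head[: len(head) - len(filler)]
            if len(stem) >= MIN_PART_LEN:
                yield (stem, tail)


def score(parts, freq):
    """Geometric mean of the part frequencies (0 if any part is unseen)."""
    counts = [freq.get(p, 0) for p in parts]
    return prod(counts) ** (1.0 / len(counts))


def split_compound(word, freq=FREQ):
    """Return the best-scoring candidate; ties keep the unsplit word,
    because it is yielded first."""
    word = word.lower()
    return max(candidate_splits(word), key=lambda parts: score(parts, freq))


if __name__ == "__main__":
    for w in ("Aktenkoffer", "Haustür", "Koffer"):
        print(w, "->", " + ".join(split_compound(w)))
```

A real setup would replace candidate_splits with the analyses proposed by
the FST and would estimate FREQ from the training corpus.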
best wishes,
Rico