Robert Muir created LUCENE-10098:
------------------------------------
Summary: Add note/link to GermanAnalyzer for decompounding nouns
Key: LUCENE-10098
URL: https://issues.apache.org/jira/browse/LUCENE-10098
Project: Lucene - Core
Issue Type: Task
Reporter: Robert Muir
The GermanAnalyzer doesn't split compound nouns.
Doing this requires some auxiliary data files with strange licenses. But
[~uschindler] has documented and packaged everything up to make this easy:
https://github.com/uschindler/german-decompounder
We added a Lucene API example (using CustomAnalyzer) to the README:
https://github.com/uschindler/german-decompounder/pull/6
So I think it would be nice to link to this from the javadocs. It makes it
really easy to download the data files and configure an appropriate analyzer, if
you are OK with the LaTeX and LGPL licenses for the data files (which many folks
might be).
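
For readers unfamiliar with the term, here is a rough plain-Java sketch of what
"decompounding" means. It is only a toy under stated assumptions: the class
name DecompoundSketch, the split method, and the three-word dictionary are all
made up for illustration, and this is not how Lucene or the german-decompounder
setup does it (that combines hyphenation patterns with a dictionary; see the
README linked above for the actual configuration).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only; not part of Lucene or german-decompounder.
public class DecompoundSketch {

    /**
     * Splits a lowercase word into two or more dictionary words.
     * Returns the word unchanged if no complete split exists.
     */
    static List<String> split(String word, Set<String> dictionary, int minSubwordSize) {
        List<String> parts = new ArrayList<>();
        if (splitFrom(word, 0, dictionary, minSubwordSize, parts) && parts.size() > 1) {
            return parts;
        }
        return List.of(word);
    }

    // Backtracking search that tries the longest candidate at each position first.
    private static boolean splitFrom(String word, int start, Set<String> dictionary,
                                     int minSubwordSize, List<String> parts) {
        if (start == word.length()) {
            return true; // consumed the whole word
        }
        for (int end = word.length(); end - start >= minSubwordSize; end--) {
            String candidate = word.substring(start, end);
            if (dictionary.contains(candidate)) {
                parts.add(candidate);
                if (splitFrom(word, end, dictionary, minSubwordSize, parts)) {
                    return true;
                }
                parts.remove(parts.size() - 1); // dead end, undo and try shorter
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Tiny assumed dictionary; a real one has many thousands of entries.
        Set<String> dictionary = Set.of("donau", "dampf", "schiff");
        System.out.println(split("donaudampfschiff", dictionary, 4)); // [donau, dampf, schiff]
        System.out.println(split("haus", dictionary, 4));             // [haus]
    }
}
```

Without a step like this, a search for "schiff" would not match a document
that only contains "donaudampfschiff", which is the gap the linked project
fills.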