[ https://issues.apache.org/jira/browse/LUCENE-10098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Robert Muir resolved LUCENE-10098.
----------------------------------
Fix Version/s: 8.11, main (9.0)
Resolution: Fixed
> Add note/link to GermanAnalyzer for decompounding nouns
> -------------------------------------------------------
>
> Key: LUCENE-10098
> URL: https://issues.apache.org/jira/browse/LUCENE-10098
> Project: Lucene - Core
> Issue Type: Task
> Reporter: Robert Muir
> Priority: Major
> Fix For: main (9.0), 8.11
>
> Time Spent: 1h
> Remaining Estimate: 0h
>
> The GermanAnalyzer doesn't split compound nouns.
> Doing this requires some auxiliary data files with strange licenses. But
> [~uschindler] has documented and packaged everything up to make this easy:
> https://github.com/uschindler/german-decompounder
> We added a Lucene API example (using CustomAnalyzer) to the README:
> https://github.com/uschindler/german-decompounder/pull/6
> So I think it would be nice to link to this from the javadocs. It makes it
> really easy to download the data files and configure an appropriate analyzer,
> if you are OK with the LaTeX and LGPL licenses for the data files (which many
> folks might be).