[ https://issues.apache.org/jira/browse/LUCENE-10098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17413758#comment-17413758 ]
ASF subversion and git services commented on LUCENE-10098:
----------------------------------------------------------
Commit 87c5f591b86e0672dd9b6abb4ea42598851f3b99 in lucene-solr's branch
refs/heads/branch_8x from Robert Muir
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=87c5f59 ]
LUCENE-10098: sync 8.11 CHANGES.txt with main
> Add note/link to GermanAnalyzer for decompounding nouns
> -------------------------------------------------------
>
> Key: LUCENE-10098
> URL: https://issues.apache.org/jira/browse/LUCENE-10098
> Project: Lucene - Core
> Issue Type: Task
> Reporter: Robert Muir
> Priority: Major
> Time Spent: 1h
> Remaining Estimate: 0h
>
> The GermanAnalyzer doesn't split compound nouns.
> Doing this requires some auxiliary data files with strange licenses. But
> [~uschindler] has documented and packaged everything up to make this easy:
> https://github.com/uschindler/german-decompounder
> We added a Lucene API example (using CustomAnalyzer) to the README:
> https://github.com/uschindler/german-decompounder/pull/6
> So I think it would be nice to link to this from the javadocs. It makes it
> really easy to download the data files and configure an appropriate analyzer,
> if you are OK with the LaTeX and LGPL licenses for the data files (which many
> folks might be).
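
For readers unfamiliar with the term, the following is a toy sketch of what splitting compound nouns means. It is not Lucene's algorithm and not the code from the german-decompounder README. It assumes a tiny hard-coded dictionary and a minimum subword length, both hypothetical, and uses plain backtracking that prefers the longest match at each position.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class ToyDecompounder {
    // Hypothetical mini-dictionary for illustration only; a real setup
    // would load the data files mentioned in the issue description.
    private static final Set<String> DICTIONARY =
            Set.of("donau", "dampf", "schiff", "fahrt");

    // Assumed lower bound on subword length, to avoid spurious tiny splits.
    private static final int MIN_SUBWORD_SIZE = 4;

    /**
     * Returns the dictionary subwords that together cover the whole word,
     * or the lowercased word itself if no complete split exists.
     */
    static List<String> decompound(String word) {
        String lower = word.toLowerCase(Locale.ROOT);
        List<String> parts = new ArrayList<>();
        return split(lower, 0, parts) ? parts : List.of(lower);
    }

    // Backtracking search: at each position try the longest candidate first.
    private static boolean split(String word, int start, List<String> parts) {
        if (start == word.length()) {
            return !parts.isEmpty();
        }
        for (int end = word.length(); end - start >= MIN_SUBWORD_SIZE; end--) {
            String candidate = word.substring(start, end);
            if (DICTIONARY.contains(candidate)) {
                parts.add(candidate);
                if (split(word, end, parts)) {
                    return true;
                }
                parts.remove(parts.size() - 1); // undo and try a shorter match
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(decompound("Donaudampfschiff"));
    }
}
```

An analyzer that skips this step indexes "Donaudampfschiff" as a single term, so a query for "Schiff" would not match it. That gap is what the linked decompounder setup is meant to close.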
--
This message was sent by Atlassian Jira
(v8.3.4#803005)