[ https://issues.apache.org/jira/browse/LUCENE-2298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12848054#action_12848054 ]
Dawid Weiss commented on LUCENE-2298:
-------------------------------------

Staszek suggested that it might be convenient if this patch detected whether another Polish stemming library is present on the classpath and, if so, used it. The library in mind is "morfologik-stemming", here:
http://sourceforge.net/projects/morfologik/

The code of this library is BSD-licensed and consists mainly of traversal of finite-state automata (FSA). The stemmer is dictionary-based, so it is nearly 100% accurate for words in the dictionary (ambiguous forms being the exception) and 0% accurate for non-dictionary words (it returns null).

The problem with Morfologik is that its dictionary data is LGPL-licensed, so it would have to be a separate download.

This is just a suggestion for discussion. I guess this functionality is limited to a very narrow audience anyway.

> Polish Analyzer
> ---------------
>
>                 Key: LUCENE-2298
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2298
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: contrib/analyzers
>    Affects Versions: 3.1
>            Reporter: Robert Muir
>            Assignee: Robert Muir
>             Fix For: 3.1
>
>         Attachments: LUCENE-2298.patch, stemmer_20000.7z
>
>
> Andrzej Bialecki has written a Polish stemmer and provided stemming tables
> for it under the Apache License.
> You can read more about it here: http://www.getopt.org/stempel/
> In reality, the stemmer is general-purpose code, and we could perhaps use it
> for more languages too.
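
For discussion, the classpath detection suggested in the comment could look roughly like the sketch below. It is not taken from the attached patch. The class name `morfologik.stemming.PolishStemmer` is an assumption about the optional library's entry point, and `StemmerDetection`, `isOnClasspath`, `chooseStemmer`, and the labels "dictionary" and "stempel" are hypothetical names used only for illustration.

```java
// Minimal sketch: probe for an optional stemming library without a
// compile-time dependency on it, and fall back to the bundled stemmer.
final class StemmerDetection {

    // Assumed entry-point class of the optional library; the real name may differ.
    static final String OPTIONAL_STEMMER_CLASS = "morfologik.stemming.PolishStemmer";

    // Returns true if the named class can be located by this class's loader.
    // The class is looked up but not initialized (second argument is false).
    static boolean isOnClasspath(String className) {
        try {
            Class.forName(className, false, StemmerDetection.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    // Picks a stemmer label: the dictionary-based one when the optional
    // library is present, otherwise the bundled algorithmic one.
    static String chooseStemmer(String optionalClassName) {
        return isOnClasspath(optionalClassName) ? "dictionary" : "stempel";
    }

    public static void main(String[] args) {
        System.out.println("Selected stemmer: " + chooseStemmer(OPTIONAL_STEMMER_CLASS));
    }
}
```

Probing by class name keeps the LGPL-licensed dictionary data a separate download, since nothing in the analyzer would need the optional jar at compile time. An actual integration would still have to call the library reflectively or through a small adapter.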