[ 
https://issues.apache.org/jira/browse/LUCENE-4311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13438471#comment-13438471
 ] 

Chris Male commented on LUCENE-4311:
------------------------------------

I was able to reproduce this behaviour and get an idea of what's causing it. 
The cause seems to be the recursive rule evaluation: 'prahnout' is being 
identified as a stem through the recursive application of several rules 
when the cross-product flag is enabled.

Originally the recursion depth was unlimited, but this led to infinite loops in 
some languages.  Consequently we limited it to 2.  However, I'm not sure we're 
doing it right.  Although some of the papers don't specify a limit on the 
recursion, [this|http://www.ldc.upenn.edu/Catalog/docs/LDC2008T01/acta04.pdf] 
seems to suggest it should only be two-fold, meaning the limit should be 1.  
With the limit changed to 1, I no longer get 'prahnout' as a suggestion.
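
To make the effect of the limit concrete, here is a minimal sketch of 
depth-capped recursive suffix stripping.  It is not the actual Lucene stemmer 
code: ToyStemmer, Rule, the rules and the dictionary entries are all made up 
for illustration and do not come from cs_CZ.aff or cs_CZ.dic.  The counting 
convention is also an assumption: the first rule application runs at depth 0 
and recursion continues while the depth is below the cap, so a cap of 1 
allows two chained applications and a cap of 2 allows three.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Toy suffix-stripping stemmer with a configurable recursion cap (illustrative only). */
final class ToyStemmer {

    /** Hypothetical rule: strip {@code suffix} from the end of a word, then append {@code restore}. */
    static final class Rule {
        final String suffix;
        final String restore;

        Rule(String suffix, String restore) {
            this.suffix = suffix;
            this.restore = restore;
        }
    }

    private final Set<String> dictionary;
    private final List<Rule> rules;
    private final int recursionCap;

    ToyStemmer(Set<String> dictionary, List<Rule> rules, int recursionCap) {
        this.dictionary = dictionary;
        this.rules = rules;
        this.recursionCap = recursionCap;
    }

    /** Returns every dictionary entry reachable from {@code word} within the cap. */
    Set<String> stem(String word) {
        Set<String> stems = new LinkedHashSet<>();
        if (dictionary.contains(word)) {
            stems.add(word); // the word itself is a valid stem
        }
        strip(word, 0, stems);
        return stems;
    }

    private void strip(String word, int depth, Set<String> stems) {
        for (Rule rule : rules) {
            // Skip rules that do not match or that would consume the whole word.
            if (!word.endsWith(rule.suffix) || word.length() <= rule.suffix.length()) {
                continue;
            }
            String candidate =
                word.substring(0, word.length() - rule.suffix.length()) + rule.restore;
            if (dictionary.contains(candidate)) {
                stems.add(candidate);
            }
            // Only chain another rule application while below the cap.
            if (depth < recursionCap) {
                strip(candidate, depth + 1, stems);
            }
        }
    }
}
```

With the made-up dictionary {"fooxyz", "foo"} and three rules that strip "z", 
"y" and "x" in turn, stem("fooxyz") yields both "fooxyz" and "foo" under a cap 
of 2, but only "fooxyz" under a cap of 1, because "foo" needs three chained 
rule applications.  That is the same shape as the extra 'prahnout' result.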

I'm going to think about how best to patch this.  Options are to either change 
the value directly or to leave it as is but provide configuration control over 
the maximum recursive depth desired.
                
> HunspellStemFilter returns another values than Hunspell in console / command 
> line with same dictionaries.
> ---------------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-4311
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4311
>             Project: Lucene - Core
>          Issue Type: Bug
>          Components: core/other
>    Affects Versions: 3.5, 4.0-ALPHA, 3.6.1
>         Environment: Apache Solr 3.5 - 4.0, Apache Tomcat 7.0
>            Reporter: Jan Rieger
>         Attachments: cs_CZ.aff, cs_CZ.dic
>
>
> When I use HunspellStemFilter to stem Czech text, it returns bad results.
> For example, the word "praha" returns "praha" and "prahnout", which is not 
> correct.
> When I try the same in my console (Hunspell command line) with exactly the 
> same dictionaries, it returns only "praha", which is correct.
> Can somebody help me?
