Lukas Vlcek created LUCENE-5484:
-----------------------------------

             Summary: Distinct control of recursion levels for prefix and suffix in Hunspell.
                 Key: LUCENE-5484
                 URL: https://issues.apache.org/jira/browse/LUCENE-5484
             Project: Lucene - Core
          Issue Type: Improvement
          Components: modules/analysis
            Reporter: Lukas Vlcek
            Priority: Minor


Currently, there is an option to set the recursionCap value to control the depth
of recursion in the Hunspell token filter. This recursion makes it possible to
apply an allowed affix rule to an input token and then pass the output token(s)
back in as input tokens recursively.
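To make the recursion concrete, here is a toy sketch in Java (Java 16+ for the record). It is not Lucene's Hunspell stemmer: the Rule record, the strip method and the two affixes "un" and "s" are hypothetical, and real Hunspell affix rules also carry strip characters, conditions and a dictionary lookup, all ignored here. It assumes recursionCap = n permits n + 1 rule applications in total, which is how the example further down reads.

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

public class RecursiveAffixStripDemo {
    // Hypothetical, simplified affix rule: remove `affix` from the start (prefix)
    // or the end (suffix) of a token. Strip characters and conditions are ignored.
    public record Rule(boolean prefix, String affix) {}

    // Returns every token reachable by applying rules recursively.
    // Assumption: recursionCap = n allows at most n + 1 rule applications in total.
    public static Set<String> strip(String token, List<Rule> rules, int recursionCap) {
        Set<String> out = new TreeSet<>();
        strip(token, rules, recursionCap + 1, out);
        return out;
    }

    private static void strip(String token, List<Rule> rules, int remaining, Set<String> out) {
        if (remaining == 0) {
            return; // budget exhausted: stop recursing
        }
        for (Rule r : rules) {
            int n = r.affix().length();
            if (token.length() <= n) {
                continue; // never strip the whole token
            }
            String next = null;
            if (r.prefix() && token.startsWith(r.affix())) {
                next = token.substring(n);
            } else if (!r.prefix() && token.endsWith(r.affix())) {
                next = token.substring(0, token.length() - n);
            }
            if (next != null) {
                out.add(next);
                // The output token is fed back in as a new input token.
                strip(next, rules, remaining - 1, out);
            }
        }
    }

    public static void main(String[] args) {
        List<Rule> rules = List.of(new Rule(true, "un"), new Rule(false, "s"));
        System.out.println(strip("unhelps", rules, 0)); // [helps, unhelp]
        System.out.println(strip("unhelps", rules, 1)); // [help, helps, unhelp]
    }
}
```

With a cap of 0 only one rule is applied, so "unhelps" yields "helps" and "unhelp" but never "help"; raising the cap to 1 lets either rule order reach "help", without any way to say which rule type used the extra step.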

However, recursionCap does not distinguish between the number of prefix rules
and the number of suffix rules applied; it only counts the total. For example,
if recursionCap is set to 1, it actually permits all of the following options:
- 2 prefix rules, 0 suffix rules
- 1 prefix rule, 1 suffix rule
- 0 prefix rules, 2 suffix rules

In some cases it is necessary to distinguish between prefix rules and suffix
rules and to have finer control over how many times each is applied. The
requested feature should allow setting the recursion level separately for
prefix and suffix rules.

A specific example is the Czech dictionary, which gives the best results when
suffix rules are applied only once, hence recursionCap = 0. But if a prefix rule
is applied to an input token, no suffix rule can then be applied, so the filter
produces a token that is not in root form. On the other hand, setting
recursionCap = 1 produces so many irrelevant tokens that the Hunspell token
filter becomes useless. A good solution to this problem would be to tell the
Hunspell token filter to apply at most 1 prefix rule and at most 1 suffix rule
(that is, never allow 0 prefix rules and 2 suffix rules).
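The difference between the current total cap and the requested per-type caps can be sketched as two predicates over the number of prefix and suffix rules applied. This is a hypothetical Java sketch: prefixCap and suffixCap are illustrative names, not existing Hunspell filter options, and it keeps the assumption that recursionCap = n allows n + 1 rules in total.

```java
public class AffixCapPolicies {
    // Current behaviour described in the issue: only the total number of rules counts.
    // Assumption: recursionCap = n allows up to n + 1 rule applications in total.
    public static boolean allowedByTotalCap(int prefixes, int suffixes, int recursionCap) {
        return prefixes + suffixes <= recursionCap + 1;
    }

    // Requested behaviour: independent upper bounds per rule type.
    // prefixCap and suffixCap are hypothetical names, not existing filter options.
    public static boolean allowedBySeparateCaps(int prefixes, int suffixes, int prefixCap, int suffixCap) {
        return prefixes <= prefixCap && suffixes <= suffixCap;
    }

    public static void main(String[] args) {
        int[][] combos = {{1, 0}, {0, 1}, {1, 1}, {2, 0}, {0, 2}};
        for (int[] c : combos) {
            System.out.printf(
                "%d prefix + %d suffix: recursionCap=0 %b, recursionCap=1 %b, caps(1,1) %b%n",
                c[0], c[1],
                allowedByTotalCap(c[0], c[1], 0),
                allowedByTotalCap(c[0], c[1], 1),
                allowedBySeparateCaps(c[0], c[1], 1, 1));
        }
    }
}
```

With caps of 1 and 1, the 1 prefix + 1 suffix case needed for Czech is accepted while 0 prefix + 2 suffix is rejected. Both cases use two rules, so no single recursionCap value can accept one and reject the other.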

In general, this probably depends heavily on how a particular dictionary and its
affix rules are constructed, so it might be considered not a general-purpose
option but rather an expert feature.

(There was some relevant discussion in LUCENE-5468.)



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
