Robert Muir created LUCENE-5518:
-----------------------------------

             Summary: minor hunspell optimizations
                 Key: LUCENE-5518
                 URL: https://issues.apache.org/jira/browse/LUCENE-5518
             Project: Lucene - Core
          Issue Type: Improvement
          Components: modules/analysis
            Reporter: Robert Muir


After benchmarking indexing speed on SOLR-3245, I ran a profiler and a couple 
of things stood out.

There are other things I want to improve too, but these almost double the speed 
for many dictionaries.

* Hunspell supports two-stage affix stripping, but the vast majority of 
dictionaries don't have any affixes that support it. So we just add a boolean 
(Dictionary.twoStageAffix) that is false until we see one.
* We use java.util.regex.Pattern for condition checks. This is slow; I switched 
to o.a.l.automaton, and it's much faster and uses slightly less RAM too.
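
To illustrate the first bullet, here is a minimal sketch of the flag idea in plain Java. It is not the actual patch: the nested Dictionary class, addSuffixRule, and stem are hypothetical names, only suffixes are handled, and it assumes that an affix "supports" two-stage stripping when its text carries continuation flags after a "/". The only part taken from the description above is the boolean twoStageAffix that stays false until such an affix is seen.

```java
import java.util.ArrayList;
import java.util.List;

public class TwoStageAffixSketch {

    /** Hypothetical stand-in for a parsed dictionary; not Lucene's Dictionary class. */
    public static final class Dictionary {
        /** False until a rule with continuation flags is seen. */
        public boolean twoStageAffix = false;
        final List<String> suffixes = new ArrayList<>();

        /** Registers a suffix such as "s" or "ed/A" (assumed form: text, then optional "/flags"). */
        public void addSuffixRule(String affixField) {
            int slash = affixField.indexOf('/');
            if (slash >= 0) {
                // Assumption: non-empty flags after '/' mean a second affix may follow.
                if (slash < affixField.length() - 1) {
                    twoStageAffix = true;
                }
                affixField = affixField.substring(0, slash);
            }
            suffixes.add(affixField);
        }
    }

    /** Returns every candidate stem obtained by stripping one suffix, or two if enabled. */
    public static List<String> stem(Dictionary dict, String word) {
        List<String> out = new ArrayList<>();
        strip(dict, word, 0, out);
        return out;
    }

    private static void strip(Dictionary dict, String word, int depth, List<String> out) {
        for (String suffix : dict.suffixes) {
            if (word.length() > suffix.length() && word.endsWith(suffix)) {
                String stripped = word.substring(0, word.length() - suffix.length());
                out.add(stripped);
                // The second pass is skipped entirely unless the dictionary needs it.
                if (depth == 0 && dict.twoStageAffix) {
                    strip(dict, stripped, 1, out);
                }
            }
        }
    }

    public static void main(String[] args) {
        Dictionary dict = new Dictionary();
        dict.addSuffixRule("s");
        dict.addSuffixRule("ing");
        System.out.println(dict.twoStageAffix + " " + stem(dict, "walkings"));
        dict.addSuffixRule("ed/A"); // first rule with continuation flags
        System.out.println(dict.twoStageAffix + " " + stem(dict, "walkings"));
    }
}
```

The point of the flag is that a dictionary with no such rules never pays for the nested pass, which is the common case according to the description above.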



