separate chararrayset interface from impl
-----------------------------------------

                 Key: LUCENE-2227
                 URL: https://issues.apache.org/jira/browse/LUCENE-2227
             Project: Lucene - Java
          Issue Type: Task
          Components: Analysis
    Affects Versions: 3.0
            Reporter: Robert Muir
            Priority: Minor


CharArraySet should be abstract. The hashing implementation currently being
used should instead be called CharArrayHashSet.
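A minimal sketch of the proposed split, in Java. Only contains(char[], int, int) comes from the issue. The other names and signatures (add(String), the constructor, the CharSequence overload) are assumptions for illustration, and this is not Lucene's actual CharArraySet class. The hash set below uses linear probing, a choice made here only for brevity. Note that every lookup hashes the whole slice before probing.

```java
/**
 * Hypothetical abstract base: callers depend only on this type, so the
 * backing data structure (hash table, DFA, ...) can be swapped.
 */
abstract class CharArraySet {
    /** Returns true if text[off, off + len) is a member of the set. */
    public abstract boolean contains(char[] text, int off, int len);

    /** Convenience overload; copies the sequence into a char[] first. */
    public boolean contains(CharSequence cs) {
        char[] buf = cs.toString().toCharArray();
        return contains(buf, 0, buf.length);
    }
}

/** Hypothetical hashing implementation: open addressing over char[] keys. */
class CharArrayHashSet extends CharArraySet {
    private char[][] entries;
    private int count;

    CharArrayHashSet(int expectedSize) {
        int cap = 16;
        while (cap < expectedSize * 2) cap <<= 1;  // power of two, load factor <= 0.5
        entries = new char[cap][];
    }

    public void add(String word) {
        char[] key = word.toCharArray();
        if (contains(key, 0, key.length)) return;  // ignore duplicates
        if ((count + 1) * 2 > entries.length) rehash();
        insert(key);
        count++;
    }

    @Override
    public boolean contains(char[] text, int off, int len) {
        int mask = entries.length - 1;
        // The full slice is hashed before the first probe can happen.
        int slot = hash(text, off, len) & mask;
        while (entries[slot] != null) {
            if (sliceEquals(entries[slot], text, off, len)) return true;
            slot = (slot + 1) & mask;  // linear probing
        }
        return false;
    }

    private void insert(char[] key) {
        int mask = entries.length - 1;
        int slot = hash(key, 0, key.length) & mask;
        while (entries[slot] != null) slot = (slot + 1) & mask;
        entries[slot] = key;
    }

    private void rehash() {
        char[][] old = entries;
        entries = new char[old.length * 2][];
        for (char[] key : old) if (key != null) insert(key);
    }

    private static int hash(char[] text, int off, int len) {
        int h = 0;
        for (int i = off; i < off + len; i++) h = 31 * h + text[i];
        return h ^ (h >>> 16);  // mix high bits into the low bits used by the mask
    }

    private static boolean sliceEquals(char[] key, char[] text, int off, int len) {
        if (key.length != len) return false;
        for (int i = 0; i < len; i++) if (key[i] != text[off + i]) return false;
        return true;
    }
}
```

Code that only needs membership tests would accept a CharArraySet, so a different implementation can be passed in without changing the caller.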

Currently our 'CharArrayHashSet' is hardcoded across Lucene, but others might
want their own implementation.
For example, implementing CharArraySet as a DFA with
org.apache.lucene.util.automaton gives faster contains(char[], int, int)
performance, as it can do a 'fast fail' and need not hash the entire string.
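To illustrate the 'fast fail' idea, here is a minimal stand-in that does not use org.apache.lucene.util.automaton at all: a hand-rolled trie, which is an acyclic DFA over the stored words (a minimized DFA, as an automaton library would build, would also share suffixes). The class name CharArrayTrieSet and the lastCharsRead counter are hypothetical, added only to make the early exit observable. A lookup stops at the first character with no outgoing transition instead of reading the whole slice.

```java
import java.util.HashMap;
import java.util.Map;

abstract class CharArraySet {
    /** Returns true if text[off, off + len) is a member of the set. */
    public abstract boolean contains(char[] text, int off, int len);
}

/** Hypothetical DFA-backed set: accepting states mark the end of a stored word. */
class CharArrayTrieSet extends CharArraySet {
    private static final class State {
        final Map<Character, State> next = new HashMap<>();  // transitions by char
        boolean accept;
    }

    private final State start = new State();

    /** Illustration only: number of chars examined by the last contains() call. */
    int lastCharsRead;

    public void add(String word) {
        State s = start;
        for (int i = 0; i < word.length(); i++) {
            s = s.next.computeIfAbsent(word.charAt(i), c -> new State());
        }
        s.accept = true;
    }

    @Override
    public boolean contains(char[] text, int off, int len) {
        State s = start;
        lastCharsRead = 0;
        for (int i = off; i < off + len; i++) {
            lastCharsRead++;
            s = s.next.get(text[i]);
            if (s == null) return false;  // fast fail: remaining chars are never read
        }
        return s.accept;  // a stored prefix alone is not a match
    }
}
```

With the stop words "the", "a" and "of", looking up "zebra" examines only 'z' before rejecting. A real implementation would likely use compact sorted transition arrays rather than a boxed HashMap per state.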

This is useful because StopFilter checks every token against its stop set, so a
faster contains() speeds up indexing.

I did not think this would be faster, but I ran benchmarks over and over with
the Reuters corpus, and it is, even with English text's weird average word
length of 5.


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

