separate chararrayset interface from impl
-----------------------------------------

                 Key: LUCENE-2227
                 URL: https://issues.apache.org/jira/browse/LUCENE-2227
             Project: Lucene - Java
          Issue Type: Task
          Components: Analysis
    Affects Versions: 3.0
            Reporter: Robert Muir
            Priority: Minor

CharArraySet should be abstract, and the hashing implementation currently in use should instead be called CharArrayHashSet.

Currently our 'CharArrayHashSet' is hardcoded across Lucene, but others might want their own impl.
For example, implementing CharArraySet as a DFA with org.apache.lucene.util.automaton gives faster contains(char[], int, int) performance, as it can do a 'fast fail' on the first character that matches no entry and need not hash the entire string.

This is useful as it speeds up indexing in StopFilter.

I did not think this would be faster, but I ran benchmarks over and over with the Reuters corpus, and it is, even with English text's weird average word length of 5.
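
To make the proposed split concrete, here is a minimal sketch, not the actual patch. All class names (AbstractCharSet, HashCharSet, TrieCharSet) are hypothetical and chosen to avoid implying anything about Lucene's real API. A plain trie stands in for a DFA built with org.apache.lucene.util.automaton, and the hash variant allocates a String for brevity, which a real implementation would presumably avoid.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical abstract type: callers depend only on this, not on an implementation.
abstract class AbstractCharSet {
    /** Returns true if text[off, off + len) is a member of the set. */
    abstract boolean contains(char[] text, int off, int len);

    /** Convenience overload for String input. */
    boolean contains(String s) {
        char[] chars = s.toCharArray();
        return contains(chars, 0, chars.length);
    }
}

// Hash-backed variant: every lookup reads the whole term to compute its hash.
// (Simplification: this allocates a String per lookup.)
final class HashCharSet extends AbstractCharSet {
    private final Set<String> words = new HashSet<>();

    HashCharSet(String... entries) {
        for (String e : entries) {
            words.add(e);
        }
    }

    @Override
    boolean contains(char[] text, int off, int len) {
        return words.contains(new String(text, off, len));
    }
}

// Trie-backed variant, standing in for a DFA: a lookup stops at the first
// character that has no transition, so most non-members fail early.
final class TrieCharSet extends AbstractCharSet {
    private static final class Node {
        final Map<Character, Node> next = new HashMap<>();
        boolean accept; // true if a set entry ends at this node
    }

    private final Node root = new Node();

    TrieCharSet(String... entries) {
        for (String e : entries) {
            Node n = root;
            for (int i = 0; i < e.length(); i++) {
                n = n.next.computeIfAbsent(e.charAt(i), c -> new Node());
            }
            n.accept = true;
        }
    }

    @Override
    boolean contains(char[] text, int off, int len) {
        Node n = root;
        for (int i = off; i < off + len; i++) {
            n = n.next.get(text[i]);
            if (n == null) {
                return false; // fast fail: no entry continues with this character
            }
        }
        return n.accept;
    }
}
```

With this shape, a consumer such as a stop-word filter would hold an AbstractCharSet and call contains(buffer, offset, length) on its term buffer, so either implementation could be swapped in without touching the caller. Whether the trie variant is actually faster depends on the data and would need benchmarking, as described above.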