Also... your HashSet constructor has to copy values from the original HashSet into the new HashSet ... not very clean, and the copy can be avoided by forcing the caller to use a HashSet (which they should).
I've caved in and gone HashSet all the way.
Did you not see my message suggesting a way to both not expose HashSet publicly and also not to copy values? If not, I attached it.
Doug
--- Begin Message ---
[EMAIL PROTECTED] wrote:
-  public StopFilter(TokenStream in, Set stopTable) {
+  public StopFilter(TokenStream in, Set stopWords) {
     super(in);
-    table = stopTable;
+    this.stopWords = new HashSet(stopWords);
   }
This always allocates a new HashSet and copies every entry into it, which could hurt performance if the stop list is large and documents are small.
Perhaps we can replace this with something like:
  public StopFilter(TokenStream in, Set stopWords) {
    this(in, stopWords instanceof HashSet
             ? ((HashSet)stopWords)
             : new HashSet(stopWords));
  }
and then add another constructor:
  private StopFilter(TokenStream in, HashSet stopWords) {
    super(in);
    this.stopWords = stopWords;
  }
Also, if we want the implementation to always be a HashSet internally, for performance, we ought to declare the field to be a HashSet, no?
The competing goals here are:
1. Not to expose publicly the implementation of the Set;
2. Not to copy the contents of the Set when folks pass the return value of makeStopSet;
3. To use the most efficient implementation internally.
I think the changes above meet all of these.
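For reference, here is a minimal, self-contained sketch of the two-constructor pattern described above, written against plain java.util rather than Lucene. The class name StopWordHolder and the helpers isStopWord and sharesStorageWith are hypothetical, and the makeStopSet here is only a stand-in for the real factory; the TokenStream argument is left out so the example compiles on its own.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in for StopFilter (assumption: not the Lucene class).
// The TokenStream parameter is omitted to keep the sketch self-contained.
final class StopWordHolder {
    // Goal 3: the field is declared as HashSet, so lookups always use it.
    private final HashSet<String> stopWords;

    // Goal 1: callers only ever see Set in the visible signature.
    // Goal 2: a HashSet argument is reused as-is; anything else is copied once.
    StopWordHolder(Set<String> stopWords) {
        this(stopWords instanceof HashSet
                ? (HashSet<String>) stopWords
                : new HashSet<String>(stopWords));
    }

    // Private constructor: takes the HashSet without copying it.
    private StopWordHolder(HashSet<String> stopWords) {
        this.stopWords = stopWords;
    }

    // Stand-in for makeStopSet: builds a HashSet but returns it typed as Set.
    static Set<String> makeStopSet(String... words) {
        return new HashSet<String>(Arrays.asList(words));
    }

    boolean isStopWord(String word) {
        return stopWords.contains(word);
    }

    // Hypothetical helper for the demonstration only: reports whether the
    // caller's set object was reused rather than copied.
    boolean sharesStorageWith(Set<String> candidate) {
        return stopWords == candidate;
    }
}
```

With this shape, passing the result of makeStopSet reuses the same object, while passing some other Set implementation (a TreeSet, say) triggers exactly one copy. One consequence of the reuse is that later changes to the caller's HashSet are visible to the holder, so this only suits callers that treat the stop set as read-only.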
Doug
--------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
--- End Message ---