I agree, partially, with Doug about not copying things and about that use of instanceof.
The part where I don't agree is where I agree with what Scott Ganyo said, and with Erik's initial approach: use interfaces. I don't see a need to expose that HashSet. Just use Set. In fact, maybe we don't even need to enforce a HashSet internally. Why not leave it up to the caller to pick the Set implementation that it wants to use? Why enforce it in StopFilter?

I'm for:

  public StopFilter(TokenStream in, Set stopWords) {
    super(in);
    this.stopWords = stopWords;
  }

Otis Gospodnetic
(didn't follow the discussion closely, sorry if I repeated somebody else's words or if I'm way off)

--- Doug Cutting <[EMAIL PROTECTED]> wrote:
> [EMAIL PROTECTED] wrote:
> > -  public StopFilter(TokenStream in, Set stopTable) {
> > +  public StopFilter(TokenStream in, Set stopWords) {
> >      super(in);
> > -    table = stopTable;
> > +    this.stopWords = new HashSet(stopWords);
> >    }
>
> This always allocates a new HashSet, which, if the stop list is large,
> and documents are small, could impact performance.
>
> Perhaps we can replace this with something like:
>
>    public StopFilter(TokenStream in, Set stopWords) {
>      this(in, stopWords instanceof HashSet ? ((HashSet)stopWords)
>           : new HashSet(stopWords));
>    }
>
> and then add another constructor:
>
>    private StopFilter(TokenStream in, HashSet stopWords) {
>      super(in);
>      this.stopWords = stopTable;
>    }
>
> Also, if we want the implementation to always be a HashSet internally,
> for performance, we ought to declare the field to be a HashSet, no?
>
> The competing goals here are:
>    1. Not to expose publicly the implementation of the Set;
>    2. Not to copy the contents of the Set when folks pass the value of
>       makeStopSet.
>    3. Use the most efficient implementation internally.
>
> I think the changes above meet all of these.
>
> Doug
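
For comparison, here is a minimal sketch of the two constructor styles discussed above. It is only an illustration under stated assumptions: SharingFilter and HashingFilter are hypothetical stand-ins for StopFilter, the TokenStream argument is left out so the example compiles on its own, and generics are used for readability although the code in this thread does not use them.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical stand-ins for StopFilter; not the actual Lucene classes.
final class StopSetSketch {

    /** Keeps whatever Set the caller passes in: no copy, no implementation check. */
    static final class SharingFilter {
        final Set<String> stopWords;

        SharingFilter(Set<String> stopWords) {
            this.stopWords = stopWords;
        }

        boolean isStopWord(String term) {
            return stopWords.contains(term);
        }
    }

    /** Reuses the argument if it is already a HashSet, otherwise copies it into one. */
    static final class HashingFilter {
        final HashSet<String> stopWords;

        HashingFilter(Set<String> stopWords) {
            this.stopWords = stopWords instanceof HashSet
                    ? (HashSet<String>) stopWords
                    : new HashSet<>(stopWords);
        }

        boolean isStopWord(String term) {
            return stopWords.contains(term);
        }
    }

    public static void main(String[] args) {
        // The caller picks a TreeSet here, purely as an example of a non-HashSet choice.
        Set<String> callerSet = new TreeSet<>(Arrays.asList("a", "an", "the"));

        SharingFilter sharing = new SharingFilter(callerSet);
        HashingFilter hashing = new HashingFilter(callerSet);

        // Identity checks show which variant copied the caller's set.
        System.out.println("sharing keeps caller's set: " + (sharing.stopWords == callerSet));
        System.out.println("hashing keeps caller's set: " + ((Object) hashing.stopWords == callerSet));
        System.out.println("both treat \"the\" as a stop word: "
                + (sharing.isStopWord("the") && hashing.isStopWord("the")));
    }
}
```

The first variant never copies and leaves the lookup cost to whichever Set the caller chose. The second guarantees hash lookups internally, at the price of one copy whenever the argument is not already a HashSet.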