HashSet is fine, and it's the best we currently have, but let the caller of StopFilter make that choice. Other, better Set implementations may become widely available one day.
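For what it's worth, here is a minimal sketch of that idea. It is not Lucene code: SimpleStopFilter and StopFilterDemo are made-up names, a plain Iterator<String> stands in for TokenStream, and it uses generics for readability. The only point is that the filter keeps whatever Set it is handed, with no copy and no instanceof, so the caller decides between HashSet, TreeSet, or anything else.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical stand-in for StopFilter: an Iterator<String> replaces TokenStream.
class SimpleStopFilter {
    private final Iterator<String> in;
    private final Set<String> stopWords; // declared as Set, never as HashSet

    SimpleStopFilter(Iterator<String> in, Set<String> stopWords) {
        this.in = in;
        this.stopWords = stopWords; // kept as passed in: no copy, no instanceof
    }

    // Returns the next token that is not a stop word, or null when input is exhausted.
    String next() {
        while (in.hasNext()) {
            String token = in.next();
            if (!stopWords.contains(token)) {
                return token;
            }
        }
        return null;
    }

    // Convenience for the demo: drains the filter into a list.
    List<String> remaining() {
        List<String> out = new ArrayList<>();
        for (String t = next(); t != null; t = next()) {
            out.add(t);
        }
        return out;
    }
}

class StopFilterDemo {
    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("the", "quick", "fox", "and", "a", "dog");

        // The caller picks the Set implementation; the filter does not care which.
        Set<String> hashed = new HashSet<>(Arrays.asList("the", "and", "a"));
        Set<String> sorted = new TreeSet<>(hashed);

        System.out.println(new SimpleStopFilter(tokens.iterator(), hashed).remaining());
        System.out.println(new SimpleStopFilter(tokens.iterator(), sorted).remaining());
    }
}
```

Both calls print [quick, fox, dog]. The trade-off is the one Doug raised: the filter no longer guarantees hash-speed lookups, so a caller who passes a slow Set pays for it.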

Otis

--- Erik Hatcher <[EMAIL PROTECTED]> wrote:
> This, too, is my preferred approach.... it just seemed such a strong
> argument for HashSet everywhere. I particularly agree with the lack
> of need to enforce HashSet.
>
> Erik
>
> On Mar 11, 2004, at 8:29 AM, Otis Gospodnetic wrote:
>
> > I agree, partially, with Doug about not copying things and that use
> > of instanceof.
> >
> > The part where I don't agree is where I agree with what Scott Ganyo
> > said, and with Erik's initial approach: use interfaces. I don't see
> > a need to expose that HashSet. Just use Set.
> >
> > Well, maybe not even an internal HashSet enforcement needs to be
> > made. Why not leave it up to the caller to pick the Set
> > implementation that it wants to use? Why enforce it in StopFilter?
> >
> > I'm for:
> >
> >   public StopFilter(TokenStream in, Set stopWords) {
> >     super(in);
> >     this.stopWords = stopWords;
> >   }
> >
> > Otis (didn't follow the discussion closely, sorry if I repeated
> > somebody else's words or if I'm way off) Gospodnetic
> >
> > --- Doug Cutting <[EMAIL PROTECTED]> wrote:
> >> [EMAIL PROTECTED] wrote:
> >>> -  public StopFilter(TokenStream in, Set stopTable) {
> >>> +  public StopFilter(TokenStream in, Set stopWords) {
> >>>      super(in);
> >>> -    table = stopTable;
> >>> +    this.stopWords = new HashSet(stopWords);
> >>>    }
> >>
> >> This always allocates a new HashSet, which, if the stop list is
> >> large, and documents are small, could impact performance.
> >>
> >> Perhaps we can replace this with something like:
> >>
> >>   public StopFilter(TokenStream in, Set stopWords) {
> >>     this(in, stopWords instanceof HashSet ? ((HashSet)stopWords)
> >>          : new HashSet(stopWords));
> >>   }
> >>
> >> and then add another constructor:
> >>
> >>   private StopFilter(TokenStream in, HashSet stopWords) {
> >>     super(in);
> >>     this.stopWords = stopWords;
> >>   }
> >>
> >> Also, if we want the implementation to always be a HashSet
> >> internally, for performance, we ought to declare the field to be a
> >> HashSet, no?
> >>
> >> The competing goals here are:
> >>   1. Not to expose publicly the implementation of the Set;
> >>   2. Not to copy the contents of the Set when folks pass the value
> >>      of makeStopSet.
> >>   3. Use the most efficient implementation internally.
> >>
> >> I think the changes above meet all of these.
> >>
> >> Doug

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]