HashSet is fine, and the best we've currently got, but let the caller
of StopFilter choose that.  Other, better, Set implementations may be
widely available one day.
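
A minimal sketch of the idea, not the actual Lucene code: SimpleStopFilter is a hypothetical stand-in that filters a plain list of tokens rather than wrapping a TokenStream, so that it runs on its own. It only assumes that the field and the constructor parameter are both typed as Set, which leaves the choice of implementation with the caller.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical stand-in for StopFilter: it filters a list of tokens
// instead of wrapping a TokenStream, so the example is self-contained.
class SimpleStopFilter {
    // Declared as Set, so any implementation the caller picks will do.
    private final Set<String> stopWords;

    SimpleStopFilter(Set<String> stopWords) {
        // No copy and no instanceof check: the caller's set is used as-is.
        this.stopWords = stopWords;
    }

    // Returns the tokens that are not in the stop set, in their original order.
    List<String> filter(List<String> tokens) {
        List<String> kept = new ArrayList<>();
        for (String token : tokens) {
            if (!stopWords.contains(token)) {
                kept.add(token);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // The caller chooses the implementation; a TreeSet here, a HashSet elsewhere.
        Set<String> stops = new TreeSet<>(Arrays.asList("a", "the", "of"));
        SimpleStopFilter filter = new SimpleStopFilter(stops);
        System.out.println(filter.filter(Arrays.asList("the", "art", "of", "search")));
    }
}
```

The trade-off is that the filter's lookup cost then depends on whatever Set the caller passes in, which is exactly the responsibility being handed over.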

Otis

--- Erik Hatcher <[EMAIL PROTECTED]> wrote:
> This, too, is my preferred approach.... it just seemed such a strong
> argument for HashSet everywhere.   I particularly agree with the lack
> of need to enforce HashSet.
> 
>       Erik
> 
> On Mar 11, 2004, at 8:29 AM, Otis Gospodnetic wrote:
> 
> > I agree, partially, with Doug about not copying things and that use
> > of instanceof.
> >
> > The part where I don't agree is where I agree with what Scott Ganyo
> > said, and with Erik's initial approach: use interfaces.  I don't see
> > a need to expose that HashSet.  Just use Set.
> >
> > Well, maybe not even an internal HashSet enforcement needs to be
> > made.  Why not leave it up to the caller to pick the Set
> > implementation that it wants to use?  Why enforce it in StopFilter?
> >
> > I'm for:
> >
> > public StopFilter(TokenStream in, Set stopWords) {
> >     super(in);
> >     this.stopWords = stopWords;
> > }
> >
> > Otis (didn't follow the discussion closely, sorry if I repeated
> > somebody else's words or if I'm way off) Gospodnetic
> >
> >
> >
> > --- Doug Cutting <[EMAIL PROTECTED]> wrote:
> >> [EMAIL PROTECTED] wrote:
> >>>   -  public StopFilter(TokenStream in, Set stopTable) {
> >>>   +  public StopFilter(TokenStream in, Set stopWords) {
> >>>        super(in);
> >>>   -    table = stopTable;
> >>>   +    this.stopWords = new HashSet(stopWords);
> >>>      }
> >>
> >> This always allocates a new HashSet, which, if the stop list is
> >> large, and documents are small, could impact performance.
> >>
> >> Perhaps we can replace this with something like:
> >>
> >> public StopFilter(TokenStream in, Set stopWords) {
> >>    this(in, stopWords instanceof HashSet ? ((HashSet)stopWords)
> >>             : new HashSet(stopWords));
> >> }
> >>
> >> and then add another constructor:
> >>
> >> private StopFilter(TokenStream in, HashSet stopWords) {
> >>    super(in);
> >>    this.stopWords = stopWords;
> >> }
> >>
> >> Also, if we want the implementation to always be a HashSet
> >> internally, for performance, we ought to declare the field to be a
> >> HashSet, no?
> >>
> >> The competing goals here are:
> >>    1. Not to expose publicly the implementation of the Set;
> >>    2. Not to copy the contents of the Set when folks pass the value
> >>       of makeStopSet.
> >>    3. Use the most efficient implementation internally.
> >>
> >> I think the changes above meet all of these.
> >>
> >> Doug
> >
> >
> >
> ---------------------------------------------------------------------
> > To unsubscribe, e-mail: [EMAIL PROTECTED]
> > For additional commands, e-mail: [EMAIL PROTECTED]
> 
> 

