I don't buy it. HashSet is but one implementation of a Set. By choosing the HashSet implementation you are not only tying the class to a hash-based implementation, you are trying the interface to *that specific* hash-based implementation or it's subclasses. In the end, either you buy the concept of the interface and its abstraction or you don't. I firmly believe in using interfaces as they were intended to be used.

Scott

P.S. In fact, HashSet isn't always going to be the most efficient anyway. Just for one example: Consider possible implementations if I have only 1 or 2 entries.

On Mar 10, 2004, at 11:13 PM, Erik Hatcher wrote:

On Mar 10, 2004, at 10:28 PM, Doug Cutting wrote:
Erik Hatcher wrote:
Also... you're HashSet constructor has to copy values from the original HashSet into the new HashSet ... not very clean and this can just be removed by forcing the caller to use a HashSet (which they should).
I've caved in and gone HashSet all the way.

Did you not see my message suggesting a way to both not expose HashSet publicly and also not to copy values? If not, I attached it.

Yes, I saw it. But is there a reason not to just expose HashSet given that it is the data structure that is most efficient? I bought into Kevin's arguments that it made sense to just expose HashSet.

As for copying values - that is only happening now if you use the Hashtable or String[] constructor.

Erik



Doug



From: Doug Cutting <[EMAIL PROTECTED]>
Date: March 10, 2004 1:08:24 PM EST
To: Lucene Developers List <[EMAIL PROTECTED]>
Subject: Re: cvs commit: jakarta-lucene/src/java/org/apache/lucene/analysis StopFilter.java
Reply-To: "Lucene Developers List" <[EMAIL PROTECTED]>


[EMAIL PROTECTED] wrote:
  -  public StopFilter(TokenStream in, Set stopTable) {
  +  public StopFilter(TokenStream in, Set stopWords) {
       super(in);
  -    table = stopTable;
  +    this.stopWords = new HashSet(stopWords);
     }

This always allocates a new HashSet, which, if the stop list is large, and documents are small, could impact performance.

Perhaps we can replace this with something like:

public StopFilter(TokenStream in, Set stopWords) {
  this(in, stopWords instanceof HashSet ? ((HashSet)stopWords)
           : new HashSet(stopWords));
}

and then add another constructor:

private StopFilter(TokenStream in, HashSet stopWords) {
  super(in);
  this.stopWords = stopTable;
}

Also, if we want the implementation to always be a HashSet internally, for performance, we ought to declare the field to be a HashSet, no?

The competing goals here are:
1. Not to expose publicly the implementation of the Set;
2. Not to copy the contents of the Set when folks pass the value of makeStopSet.
3. Use the most efficient implementation internally.

I think the changes above meet all of these.

Doug

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]




--------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]


---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

Attachment: smime.p7s
Description: S/MIME cryptographic signature



Reply via email to