Please don't take my questions as criticism... I asked them "because" Lucene is held up as an example of good OO design, and I was trying to reconcile my knowledge of OO design with the Lucene developers.
It just "feels like" IndexReader and IndexWriter should be interfaces... My Lucene IndexReader has no knowledge of directories (and doesn't need it), yet a IDE looking at my class shows many available methods dealing with directories, locks, etc. when none are appropriate (or useful). It seems like IndexReader should be an interface, with AbstractIndexReader providing some helper methods (potentially), and DirectoryIndexReader providing the concrete path/directory related methods. Alternatively, a AbstractIndex which implemented both, and then DirectoryIndex as the concreate implementation - since the index reading and writing are so coupled (in format). Then you could have CompoundDirectoryIndex, etc. ... Even though my class is also an index writer, I could not implement it (since I was already implementing IndexReader), so I ended up with 'outside' methods to do the writing. I could have created a MyIndexWriter that extended IndexWriter, but there is really no point, since none of the methods in IndexWriter are useful. As for the why my 'Filter' interface would make the world a better place... there may be cases where the filter is rather sparse, but the document set is quite large. The requirement to return a BitSet is a lot of overhead for a large document set - when a an alternative direct filter (for example, look up a document in the db, check some value, exclude document), would be far more efficient. Actually if the filter domain is quite large, my interface might be MUCH better because you would not have to read the entire filter criteria, if only a few documents were being checked against it. The reason this is important for us, is that we filter returned documents through a security check, and the security check lends itself to dynamic checking. It might be more appropriate if the interface was boolean filter(Document document); rather than boolean filter(int docnum); but this would require reading each document (stored in cache?), before performing the filter. Don't get me wrong, Lucene is a GREAT product ! Robert Engels -----Original Message----- From: Doug Cutting [mailto:[EMAIL PROTECTED] Sent: Tuesday, April 20, 2004 11:39 AM To: Lucene Developers List Subject: Re: incorrect OO in lucene source? Robert Engels wrote: > Lucene is often cited as an excellent example of OO design. That is kind, but the primary goal of Lucene is to provide functionality, not to use "correct" OO design. The two are not always in accord. > It seems the abstract base classes were created so there was a place to put > the static 'helper' methods. Would it not be better to have separate > interfaces and abstract base classes that the implementations could extend > if necessary? Functionally, the only thing an interface has that cannot be implemented by a class is multiple inheritance. Multiple inheritance is useful for simple behaviour mixins, like Comparable and Cloneable, but is rarely useful with more complex method sets. So, in cases where every implementation needs the same helper methods and multiple inheritance is not needed, interfaces add nothing functionally except more code to be maintained. That said, an interface does have some documentation value. > The main problem seems to be the use of IndexReader in many interfaces, when > it should be using a Searchable. Otherwise, what is the point of Searchable. Searchable was added to provide an API appropriate for remote invocation, to support RemoteSearchable. > The following object use chain shows the 'pointlessness' of Searchable: > > Searchable->Query->Weight->IndexReader IndexReader is required by methods which implement search algorithms, run in a potentially remote JVM. Only the Query and TopDocs are transmitted over the wire, so only these are handled by Searchable methods. Other, higher-bandwidth data is handled further downstream in the call chain, and thus is processed locally on the remote machine. > The Query object has the method: > > public Query rewrite(IndexReader reader) throws IOException... > > but the entire search package is based on 'Searchable's, should it not be > > public Query rewrite(Searchable searchable) throws IOException... > > If not, the entire idea of 'external' searchable implementations break, > because the end up having to 'know' an IndexReader, and IndexReader is a > baseclass, etc. IndexReader has a much richer API than Searchable. It is required for rewrite, which can, e.g., iterate over all terms in the index. Such iteration is not appropriate over RPCs. > The same goes for > > public Weight weight(Searcher searcher); > > should this not be a 'Searchable'? I think this could be changed without harming anything. Weights are only ever constructed locally, so it wouldn't really make a difference. > The Weight interface has the following: > > Scorer scorer(IndexReader reader) throws IOException; > > Should it not be > > Scorer scorer(Searchable searchable) throws IOException; > > The same goes for explain(). Again, IndexReader's API is required to implement these methods. Searchable has only a very small API, that which can be efficiently invoked remotely. > It seems that Searchable was created because IndexReader was not an > interface, but the methods have not been changed to use a Searchable instead > of IndexReader. No, Searchable and Searcher were created to provide the functionality of MultiSearcher and RemoteSearchable. There are search operations (like HitCollector-based searching) which are defined by Searcher, not Searchable, that may not be run remotely, but may be invoked locally over multiple indexes. > It seems many of the 'search' package classes use a 'Searcher', when they > should be using a 'Searchable'. If these core classes really need the > methods defined in 'Searcher', then these methods should be moved to > 'Searchable'. The Searchable interface should be restricted to methods which are useful in implementing effective, remotely invokable search. > Shouldn't 'Filter' just be an interface, with the method > > boolean filter(int docnum); > > as to whether or not the document should be filtered? How would this make the world a better place? Doug --------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED] --------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]