I think the interface I proposed is simpler and handles more cases easily.

interface SearchFilter {
    boolean include(int doc);
}

It seems your interface requires that the SearchFilter know all of the query
results beforehand. I am not sure this works well with the partial result
sets that Lucene supports.
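A minimal sketch of how the `include(int doc)` predicate might be used. `BitSetSearchFilter`, `IntRangeSearchFilter` and `FilterDemo.apply` are hypothetical names, and the `int[]` of values is only a stand-in for a Lucene field cache. It shows the property argued for above: the filter is asked about one candidate document at a time, so it can be applied to whatever subset of hits the searcher has collected.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

// The per-document predicate proposed above.
interface SearchFilter {
    boolean include(int doc);
}

// Hypothetical adapter over a precomputed BitSet (what Filter.bits() returns today).
final class BitSetSearchFilter implements SearchFilter {
    private final BitSet bits;
    BitSetSearchFilter(BitSet bits) { this.bits = bits; }
    @Override
    public boolean include(int doc) { return bits.get(doc); }
}

// Hypothetical field-value filter: values[doc] stands in for a field cache
// (e.g. dates stored as ints); no BitSet is ever built.
final class IntRangeSearchFilter implements SearchFilter {
    private final int[] values;
    private final int lower, upper; // inclusive bounds
    IntRangeSearchFilter(int[] values, int lower, int upper) {
        this.values = values; this.lower = lower; this.upper = upper;
    }
    @Override
    public boolean include(int doc) {
        int v = values[doc];
        return v >= lower && v <= upper;
    }
}

final class FilterDemo {
    // Keeps only the candidates the filter accepts; the candidates can be any
    // partial result set, since the filter never sees the full list.
    static List<Integer> apply(SearchFilter filter, int[] candidates) {
        List<Integer> kept = new ArrayList<>();
        for (int doc : candidates) {
            if (filter.include(doc)) kept.add(doc);
        }
        return kept;
    }
}
```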

-----Original Message-----
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] Behalf Of Chris Hostetter
Sent: Thursday, January 26, 2006 1:09 PM
To: java-dev@lucene.apache.org
Subject: Re: Filter



The subject of revamping the Filter API to support more compact filter
representations has come up in the past ... At least one patch comes to
mind that helps with the issue...

   https://issues.apache.org/jira/browse/LUCENE-328

...I'm not intimately familiar with that code, but if I recall correctly
from the last time I read it, it doesn't propose any actual API changes,
just some utilities to reduce memory usage.

Reading your post has me thinking about this whole issue again,
particularly about Filters that are straightforward enough to be
implemented as simple iterators with very little state, and about what
API changes could support the interface you describe while remaining
backwards compatible.

One thing that comes to mind (which I don't remember suggesting before, but
perhaps someone else has) is that since Filter is an abstract class which
people are currently required to subclass, we could follow a migration
path something like this...

  1) add a SearchFilter interface like the one you describe to the core
     code base
  2) add the following method declaration to the Filter class...
        public SearchFilter getSearchFilter(IndexReader reader) throws IOException
     ...implement this method by calling bits() and returning an instance
     of a thin inner class that wraps the BitSet.
  3) mark Filter.bits() as deprecated.
  4) change all existing calls to Filter.bits() in the core Lucene code
     base to call Filter.getSearchFilter() and do whatever iteration is
     necessary.
  5) gradually reimplement all of the concrete subclasses of Filter in
     the core Lucene code base so they override the getSearchFilter method
     with something that returns a more "iterator" style SearchFilter,
     and implement their bits() method to use the SearchFilter to build up
     the BitSet if clients call it directly.
  6) wait a suitable amount of time.
  7) remove Filter.bits() and all of its concrete implementations from the
     Lucene core.
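Steps 1 through 5 might look roughly like this in Java. This is a sketch under stated assumptions, not Lucene code: `IndexReader` here is a minimal stand-in that only models `maxDoc()`, `LegacyFilter` and `EvenDocFilter` are hypothetical subclasses, and `SearchFilter` uses the `include(int doc)` form proposed at the top of this thread.

```java
import java.io.IOException;
import java.util.BitSet;

// Stand-in for org.apache.lucene.index.IndexReader; only maxDoc() is modeled.
interface IndexReader {
    int maxDoc();
}

// Step 1: the new interface (per-document predicate form).
interface SearchFilter {
    boolean include(int doc);
}

// Steps 2-3: Filter keeps bits(), now deprecated, and gains getSearchFilter(),
// whose default implementation wraps the BitSet in a thin inner class.
abstract class Filter {
    /** @deprecated use {@link #getSearchFilter(IndexReader)} instead. */
    @Deprecated
    public abstract BitSet bits(IndexReader reader) throws IOException;

    public SearchFilter getSearchFilter(IndexReader reader) throws IOException {
        return new BitSetWrapper(bits(reader));
    }

    private static final class BitSetWrapper implements SearchFilter {
        private final BitSet bits;
        BitSetWrapper(BitSet bits) { this.bits = bits; }
        @Override
        public boolean include(int doc) { return bits.get(doc); }
    }
}

// An existing subclass that only implements bits() keeps working unchanged.
final class LegacyFilter extends Filter {
    private final BitSet precomputed;
    LegacyFilter(BitSet precomputed) { this.precomputed = precomputed; }
    @Override
    public BitSet bits(IndexReader reader) { return precomputed; }
}

// Step 5: a hypothetical "even doc ids only" filter overrides getSearchFilter()
// with a stateless predicate and derives bits() from it for direct callers.
final class EvenDocFilter extends Filter {
    @Override
    public SearchFilter getSearchFilter(IndexReader reader) {
        return doc -> doc % 2 == 0;
    }

    @Deprecated
    @Override
    public BitSet bits(IndexReader reader) {
        SearchFilter f = getSearchFilter(reader);
        BitSet result = new BitSet(reader.maxDoc());
        for (int doc = 0; doc < reader.maxDoc(); doc++) {
            if (f.include(doc)) result.set(doc);
        }
        return result;
    }
}
```

The design point is that old subclasses get a working getSearchFilter() for free through the wrapper, while new filters only build a BitSet if a caller still asks for one.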

...I think that would be a fairly straightforward and practical way to
execute such a change. The big question in my mind is what the
"SearchFilter" interface should look like. What you propose follows the
usage pattern "iterate over your ScoreDocs, and for each one test it
against the filter" ... but I'm not convinced that it wouldn't make more
sense to say "ask the filter what the next viable doc is, now score it",
a la...

      public interface SearchFilter {
          /** Returns doc ids that pass the filter, in increasing order.
           * Returns -1 once there are no more docs (0 is a valid doc id,
           * so it cannot serve as the end marker).
           */
          int getNextFilteredDoc();
      }
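The iterator style could be sketched as follows, assuming -1 as the end-of-docs marker (0 is a valid document number, so it cannot mark the end). `BitSetIterFilter`, `RangeIterFilter` and `FilteredSearch` are hypothetical names; the sorted `matches` array stands in for the doc ids a query scorer would produce, and `values[]` stands in for a field cache. The loop asks the filter for its next viable doc first, then advances the query side to meet it.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

// Iterator-style filter; -1 is the assumed end marker.
interface SearchFilter {
    /** Returns doc ids that pass the filter in increasing order, then -1 forever. */
    int getNextFilteredDoc();
}

// Hypothetical adapter over a BitSet; the only state is the next position to examine.
final class BitSetIterFilter implements SearchFilter {
    private final BitSet bits;
    private int next = 0; // becomes -1 once exhausted

    BitSetIterFilter(BitSet bits) { this.bits = bits; }

    @Override
    public int getNextFilteredDoc() {
        if (next < 0) return -1;
        int doc = bits.nextSetBit(next); // -1 if no set bit at or after next
        next = (doc < 0) ? -1 : doc + 1;
        return doc;
    }
}

// Hypothetical field-value filter with very little state: a cursor over values[].
final class RangeIterFilter implements SearchFilter {
    private final int[] values;
    private final int lower, upper; // inclusive bounds
    private int pos = 0;            // next doc id to examine

    RangeIterFilter(int[] values, int lower, int upper) {
        this.values = values; this.lower = lower; this.upper = upper;
    }

    @Override
    public int getNextFilteredDoc() {
        while (pos < values.length) {
            int doc = pos++;
            if (values[doc] >= lower && values[doc] <= upper) return doc;
        }
        return -1;
    }
}

final class FilteredSearch {
    // "Ask the filter what the next viable doc is, now score it".
    static List<Integer> search(SearchFilter filter, int[] matches) {
        List<Integer> hits = new ArrayList<>();
        int i = 0;
        for (int doc = filter.getNextFilteredDoc();
             doc != -1 && i < matches.length;
             doc = filter.getNextFilteredDoc()) {
            while (i < matches.length && matches[i] < doc) i++; // skip query docs the filter excludes
            if (i < matches.length && matches[i] == doc) {
                hits.add(doc); // a real searcher would compute the score here
            }
        }
        return hits;
    }
}
```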


thoughts?


: Date: Thu, 26 Jan 2006 14:35:44 +0100
: From: Morus Walter <[EMAIL PROTECTED]>
: Reply-To: java-dev@lucene.apache.org
: To: java-dev@lucene.apache.org
: Subject: Filter
:
: Hi,
:
: I would like to suggest a more general filter interface which could be
: added as an alternative to the current bitset filters.
: (Replacing the bitset filters would only be possible if API changes were
: acceptable.)
:
: While bitset based filters are useful in many use cases the restriction
: of filters to using bitsets prevents other solutions.
: Especially since the introduction of field caches for sorting, it's easy
: to implement filters directly based on field values.
:
: So I suggest adding a general filter interface that only requires a filter
: to provide a filter method that takes a ScoreDoc and returns true if the
: document passes the filter or false if it is rejected.
: This would basically be:
: public interface SearchFilter {
:     boolean filter(ScoreDoc doc);
: }
:
: Thus a filter could be implemented using a bitset, or it could get a
: field cache and check the document's value based on that, or work in
: any other way.
: Providing a ScoreDoc to the filter (instead of the document id alone)
: makes it possible to write filters that modify the score instead of
: accepting/rejecting documents.
:
: Use cases include
: - Filtering based on document values
:   E.g. a date filter. This can be done with the current bitset based
: filters, but if the date ranges vary from query to query and the index
: change rate is low, using a field cache on the dates seems better than
: creating a bitset for each range.
: - Modifying the score
:   E.g. a scoring that degrades the score based on a date field to prefer
: new documents over old ones. This is not the same as sorting by date,
: since an old but good hit can still end up with a better score than a
: new but low-scored hit.
: - Collecting additional information
:   Let's say you have a category field in your documents. Using a field
: cache you could count the number of hits for each category.
:
: Of course this can be done (and I did some of this) by subclassing
: and extending IndexSearcher, but I think support for generalised
: filters should rather be part of the Lucene core itself.
: Adding such an API would mean duplicating every search method that takes
: a filter, adding a version that takes the generalized filter. Not
: really nice, but I think it would be worth the effort. And if API
: changes are accepted (e.g. for 2.0), the bitset filters could be replaced
: by the generalized filter, since a bitset filter could easily be wrapped
: in a generalized filter (at the cost of an additional method call per
: lookup).
:
: If there is interest in such a change and it would be accepted, I could
: work out a patch (might take some time though).
:
: Morus
:
: ---------------------------------------------------------------------
: To unsubscribe, e-mail: [EMAIL PROTECTED]
: For additional commands, e-mail: [EMAIL PROTECTED]
:
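Morus's ScoreDoc-based interface and the three use cases he lists could be sketched as below. Assumptions: `ScoreDoc` is a local stand-in mirroring the public `doc` and `score` fields of Lucene's class, the `int[]`/`String[]` arrays stand in for field caches, and every class name plus the particular age-decay formula are hypothetical. A searcher would call `filter()` on each hit and drop the ones that return false.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

// Stand-in mirroring the public doc/score fields of Lucene's ScoreDoc.
final class ScoreDoc {
    int doc;
    float score;
    ScoreDoc(int doc, float score) { this.doc = doc; this.score = score; }
}

// The proposed interface: return false to reject a hit; may also adjust hit.score.
interface SearchFilter {
    boolean filter(ScoreDoc hit);
}

// Wrapping an existing bitset filter costs one extra method call per lookup.
final class BitSetFilter implements SearchFilter {
    private final BitSet bits;
    BitSetFilter(BitSet bits) { this.bits = bits; }
    @Override public boolean filter(ScoreDoc hit) { return bits.get(hit.doc); }
}

// Use case 1: date range; dates[doc] (e.g. yyyymmdd ints) stands in for a field cache.
final class DateRangeFilter implements SearchFilter {
    private final int[] dates;
    private final int from, to; // inclusive
    DateRangeFilter(int[] dates, int from, int to) {
        this.dates = dates; this.from = from; this.to = to;
    }
    @Override public boolean filter(ScoreDoc hit) {
        int d = dates[hit.doc];
        return d >= from && d <= to;
    }
}

// Use case 2: prefer newer docs; hypothetical decay that halves the score
// when a document's age equals halfLifeDays.
final class AgeDecayFilter implements SearchFilter {
    private final int[] ageDays;
    private final float halfLifeDays;
    AgeDecayFilter(int[] ageDays, float halfLifeDays) {
        this.ageDays = ageDays; this.halfLifeDays = halfLifeDays;
    }
    @Override public boolean filter(ScoreDoc hit) {
        hit.score = hit.score / (1f + ageDays[hit.doc] / halfLifeDays);
        return true; // rescoring only, never rejects
    }
}

// Use case 3: count hits per category; categories[doc] stands in for a field cache.
final class CategoryCounter implements SearchFilter {
    private final String[] categories;
    final Map<String, Integer> counts = new HashMap<>();
    CategoryCounter(String[] categories) { this.categories = categories; }
    @Override public boolean filter(ScoreDoc hit) {
        counts.merge(categories[hit.doc], 1, Integer::sum);
        return true; // collects information, never rejects
    }
}
```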



-Hoss

