Hi,

I would like to suggest a more general filter interface that could be
added as an alternative to the current bitset filters.
(Replacing the bitset filters outright would only be possible if API
changes were acceptable.)

While bitset-based filters are useful in many cases, restricting
filters to bitsets rules out other solutions.
In particular, since the introduction of field caches for sorting, it is
easy to implement filters based directly on field values.

So I suggest adding a general filter interface that requires a filter
to provide just one method, which takes a ScoreDoc and returns true if
the document passes the filter and false if it is rejected.
This would basically be
public interface SearchFilter {
    boolean filter(ScoreDoc doc);
}

Thus a filter could be implemented using a bitset, or it could get a
field cache and check the document's value against it, or work in any
other way.
Passing a ScoreDoc to the filter (instead of the document id alone)
makes it possible to write filters that modify the score instead of
accepting or rejecting documents.
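
As a rough sketch of the field cache variant: the ScoreDoc class below is
only a stand-in for Lucene's class so that the example compiles on its own,
and DateRangeSearchFilter and its int[] of dates are hypothetical names
standing in for whatever the field cache would deliver.

```java
// Stand-in for Lucene's ScoreDoc (document id plus score).
// Assumption: defined here only so the sketch compiles on its own.
class ScoreDoc {
    int doc;
    float score;

    ScoreDoc(int doc, float score) {
        this.doc = doc;
        this.score = score;
    }
}

// The interface as proposed above.
interface SearchFilter {
    boolean filter(ScoreDoc doc);
}

// Hypothetical filter that accepts documents whose cached date lies in
// the inclusive range [from, to].
class DateRangeSearchFilter implements SearchFilter {
    // dates[docId] holds the date as an int such as 20050615;
    // this array stands in for a field cache.
    private final int[] dates;
    private final int from;
    private final int to;

    DateRangeSearchFilter(int[] dates, int from, int to) {
        this.dates = dates;
        this.from = from;
        this.to = to;
    }

    public boolean filter(ScoreDoc doc) {
        int date = dates[doc.doc];
        return date >= from && date <= to;
    }
}
```

The same array can be shared by every query; only from and to change, so
no bitset has to be built per range.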

Use cases include:
- Filtering based on document values
  E.g. a date filter. This can be done with the current bitset-based
  filters, but if the date ranges vary from query to query and the index
  change rate is low, using a field cache on the dates seems better than
  creating a bitset for each range.
- Modifying the score
  E.g. a scoring that degrades the score based on a date field to prefer
  new documents over old ones. This is not the same as sorting by date,
  since an old but good hit can still end up with a better score than a
  new but low-scored hit.
- Collecting additional information
  Let's say you have a category field in your documents. Using a field
  cache, you could count the number of hits for each category.
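
The last two use cases could look roughly like this. Again ScoreDoc is a
stand-in for Lucene's class, and AgeDecayFilter, CategoryCountingFilter,
the half-life formula and the per-document arrays are assumptions made up
for the illustration, not existing Lucene code.

```java
import java.util.HashMap;
import java.util.Map;

// Stand-in for Lucene's ScoreDoc; an assumption so the sketch compiles alone.
class ScoreDoc {
    int doc;
    float score;

    ScoreDoc(int doc, float score) {
        this.doc = doc;
        this.score = score;
    }
}

interface SearchFilter {
    boolean filter(ScoreDoc doc);
}

// Hypothetical score-modifying filter: halves the score for every
// halfLifeDays of document age and never rejects a document.
class AgeDecayFilter implements SearchFilter {
    private final int[] ageInDays;     // ageInDays[docId], stands in for a field cache
    private final float halfLifeDays;

    AgeDecayFilter(int[] ageInDays, float halfLifeDays) {
        this.ageInDays = ageInDays;
        this.halfLifeDays = halfLifeDays;
    }

    public boolean filter(ScoreDoc doc) {
        doc.score *= (float) Math.pow(0.5, ageInDays[doc.doc] / halfLifeDays);
        return true;
    }
}

// Hypothetical collecting filter: counts hits per category and accepts
// every document.
class CategoryCountingFilter implements SearchFilter {
    private final String[] categories; // categories[docId], stands in for a field cache
    private final Map<String, Integer> counts = new HashMap<>();

    CategoryCountingFilter(String[] categories) {
        this.categories = categories;
    }

    public boolean filter(ScoreDoc doc) {
        counts.merge(categories[doc.doc], 1, Integer::sum);
        return true;
    }

    Map<String, Integer> counts() {
        return counts;
    }
}
```

With a half-life of 30 days, a 30-day-old hit scored 2.0 drops to 1.0 and
still beats a brand-new hit scored 0.8, which is the difference from a
plain sort by date.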

Of course this can be done (and I did some of this) by subclassing
and extending IndexSearcher, but I think support for generalized
filters should rather be part of the Lucene core itself.
Adding such an API would mean duplicating all the search methods that
take filters, so that each has an additional version taking the
generalized filter. Not really nice, but I think it would be worth the
effort. And if API changes are accepted (e.g. for 2.0), the bitset
filters could be replaced by the generalized filter, since a bitset
filter could easily be wrapped in a generalized filter (at the cost of
an additional method call per lookup).
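
Such a wrapper might be as small as the following sketch. ScoreDoc is a
stand-in for Lucene's class, BitSetSearchFilter is a hypothetical name, and
the BitSet is assumed to have been produced already by an existing bitset
filter.

```java
import java.util.BitSet;

// Stand-in for Lucene's ScoreDoc; an assumption so the sketch compiles alone.
class ScoreDoc {
    int doc;
    float score;

    ScoreDoc(int doc, float score) {
        this.doc = doc;
        this.score = score;
    }
}

interface SearchFilter {
    boolean filter(ScoreDoc doc);
}

// Hypothetical adapter: exposes an already computed bitset through the
// generalized interface. Each lookup costs one extra method call on top
// of BitSet.get().
class BitSetSearchFilter implements SearchFilter {
    private final BitSet bits;

    BitSetSearchFilter(BitSet bits) {
        this.bits = bits;
    }

    public boolean filter(ScoreDoc doc) {
        return bits.get(doc.doc);
    }
}
```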

If there is interest in such a change and it would be accepted, I could
work out a patch (might take some time though).

Morus
