Thanks.
On 1/6/06, Paul Elschot <[EMAIL PROTECTED]> wrote: > > On Friday 06 January 2006 18:04, Beady Geraghty wrote: > > I would like to do queries that are negative. I mean a query with > > only negative terms and phrases. For example, retrieve all > > documents that do not contain the term "apple". > > > > For now, I have a limited set of documents (say, 10000) to index. > > I can create a bitset that represents the search result of hits on > "apple". > > Then I complement (XOR) the result. > > Each bit corresponds to a document ID. > > My question is : > > Inside Lucene, are the hits represented in some form of a bitset. > > Can I get at it directly. I saw the BitSet class. (I now use > > Java's Bitset class). > > Assuming that hits are internally represented as bitset, for a > > small number of documets, the bitset won't be very big, > > and if there are plenty of hits and many many more documents, > > is the bitset still kept entirely > > in memory as well ? > > A Hits is implemented by caching some of the highest scoring > documents, when more documents are needed the search is > repeated to collect more documents. > > The problem with negative queries is that the scores of the results > do not vary, so it is not useful to keep only the highest scoring docs. > This also means that all results will have to be processed further > in some other way. > The easiest way to do that is to use the MatchAllDocsQuery > as indicated earlier, and then use the low level search API > with your own HitCollector. > You can then use any data structure in your HitCollector. > A simple and fast collect() implementation just counts the results, and > that can already be quite informative. Setting up a BitSet > for the matching document numbers is also possible. > It's best to avoid accessing the index via the IndexReader inside > the collect() implementation of the HitCollector. > > Regards, > Paul Elschot > > --------------------------------------------------------------------- > To unsubscribe, e-mail: [EMAIL PROTECTED] > For additional commands, e-mail: [EMAIL PROTECTED] > >