SpanQuery and Bits

Carsten Schnober Thu, 06 Dec 2012 01:55:27 -0800

Hi,
I have a problem understanding and applying the BitSets concept in
Lucene 4.0. Unfortunately, there does not seem to be a lot of
documentation about the topic.


The general task is to extract the Spans matching a SpanQuery, which
works with the following snippet:

for (AtomicReaderContext atomic : reader.getContext().leaves()) {
  Spans spans = query.getSpans(atomic, new Bits.MatchAllBits(0),
      termContexts);
  while (spans.next()) {
    // extract payloads etc.
  }
}

I understand that the acceptDocs parameter in SpanQuery.getSpans()
restricts the search to a set of documents. In the example given above,
it searches all documents (Bits.MatchAllBits), right?

What I would like to do is generate a Bits object that is based on a
BooleanQuery beforehand in order to restrict the search through
getSpans() to a set of documents that contain certain terms.
I also have a MultiReader object that handles multiple indexes.
My intuitive approach would be to apply a QueryWrapperFilter like this:

MultiReader reader = ...
BooleanQuery bq = ...
DocIdSet bitset = ???;
Filter filter = new QueryWrapperFilter(bq);
for (AtomicReaderContext context : reader.getContext().leaves()) {
  filter.getDocIdSet(context, new Bits.MatchAllBits(0));
}

The obvious question is: how do I handle the context bitsets returned by
getDocIdSet() correctly so that I can pass the 'bitset' variable to the
getSpans() call?

Or am I on the wrong path for this kind of problem?
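
One point that may help frame the question: both getSpans() and
getDocIdSet() are called once per AtomicReaderContext, and document IDs
are local to each segment, so there is probably no single index-wide
'bitset' to build; each leaf needs its own accept-docs object, sized to
that leaf. The following is a minimal sketch of that idea in plain Java,
not Lucene code. The Bits interface and the Leaf class are hypothetical
stand-ins that only mirror the shape of the Lucene types (random access
by segment-local document ID plus a length). In Lucene 4.0 itself, as
far as I understand it, the corresponding step would be filling a
FixedBitSet from the iterator of the DocIdSet returned for each context,
but that is an assumption to check against the API documentation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.BitSet;
import java.util.List;

public class PerSegmentAcceptDocs {

    /** Hypothetical stand-in for a Lucene-style Bits view (segment-local doc IDs). */
    public interface Bits {
        boolean get(int index);
        int length();
    }

    /** Hypothetical stand-in for one segment: its size and the local doc IDs a filter matched. */
    public static final class Leaf {
        final int maxDoc;
        final int[] matchingDocs;

        public Leaf(int maxDoc, int... matchingDocs) {
            this.maxDoc = maxDoc;
            this.matchingDocs = matchingDocs;
        }
    }

    /** Collects the filter matches of one leaf into a bit set sized to that leaf. */
    public static Bits acceptDocsFor(Leaf leaf) {
        final BitSet bits = new BitSet(leaf.maxDoc);
        for (int doc : leaf.matchingDocs) {
            bits.set(doc);
        }
        final int length = leaf.maxDoc;
        return new Bits() {
            @Override public boolean get(int index) { return bits.get(index); }
            @Override public int length() { return length; }
        };
    }

    /** Builds one Bits object per leaf, in leaf order. */
    public static List<Bits> acceptDocsPerLeaf(List<Leaf> leaves) {
        List<Bits> result = new ArrayList<Bits>();
        for (Leaf leaf : leaves) {
            result.add(acceptDocsFor(leaf));
        }
        return result;
    }

    public static void main(String[] args) {
        // Two segments: local doc ID 1 exists in the first one only,
        // and local doc ID 0 is accepted in the second one only.
        List<Leaf> leaves = Arrays.asList(new Leaf(4, 1, 3), new Leaf(2, 0));
        List<Bits> perLeaf = acceptDocsPerLeaf(leaves);
        for (int i = 0; i < perLeaf.size(); i++) {
            Bits bits = perLeaf.get(i);
            for (int doc = 0; doc < bits.length(); doc++) {
                System.out.println("leaf " + i + " doc " + doc + " accepted=" + bits.get(doc));
            }
        }
    }
}
```

In the loop over the leaves, the object built for a given leaf would
then be the one passed as acceptDocs for that same leaf, instead of a
shared variable declared outside the loop.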
Thanks!
Carsten


-- 
Institut für Deutsche Sprache | http://www.ids-mannheim.de
Projekt KorAP                 | http://korap.ids-mannheim.de
Tel. +49-(0)621-43740789      | [email protected]
Korpusanalyseplattform der nächsten Generation
Next Generation Corpus Analysis Platform
