Hi,
I have trouble understanding and applying the concept of BitSets in
Lucene 4.0. Unfortunately, there does not seem to be much
documentation on the topic.
The general task is to extract the Spans matching a SpanQuery, which works
with the following snippet:
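for (AtomicReaderContext atomic : reader.getContext().leaves()) {
    // termContexts: Map<Term, TermContext> prepared beforehand for the query's terms
    Spans spans = query.getSpans(atomic,
            new Bits.MatchAllBits(atomic.reader().maxDoc()), termContexts);
    while (spans.next()) {
        // extract payloads etc.
    }
}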
I understand that the acceptDocs parameter in SpanQuery.getSpans()
restricts the search to a set of documents. In the example given above,
it searches all documents (Bits.MatchAllBits), right?
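To make the acceptDocs semantics concrete, here is a minimal model in plain Java, assuming the contract as understood here: a candidate document survives only if acceptDocs.get(doc) returns true, a null acceptDocs accepts everything, and Bits.MatchAllBits.get() returns true for every index regardless of the length it was constructed with (so MatchAllBits(0) does accept all documents). The Bits, MatchAllBits and ArrayBits types below are hypothetical stand-ins written for this sketch, not the Lucene classes, and restrict() is a hypothetical helper.

```java
import java.util.ArrayList;
import java.util.List;

public class AcceptDocsDemo {
    // Hypothetical stand-in for org.apache.lucene.util.Bits (not the Lucene interface).
    interface Bits {
        boolean get(int index);
        int length();
    }

    // Models MatchAllBits: get() is true for every index, whatever length was given.
    static final class MatchAllBits implements Bits {
        private final int len;
        MatchAllBits(int len) { this.len = len; }
        public boolean get(int index) { return true; }
        public int length() { return len; }
    }

    // Models an explicit bit set over leaf-local doc IDs.
    static final class ArrayBits implements Bits {
        private final boolean[] bits;
        ArrayBits(boolean[] bits) { this.bits = bits; }
        public boolean get(int index) { return bits[index]; }
        public int length() { return bits.length; }
    }

    // Keeps only the candidate docs that acceptDocs allows; null accepts all.
    static List<Integer> restrict(int[] candidates, Bits acceptDocs) {
        List<Integer> out = new ArrayList<>();
        for (int doc : candidates) {
            if (acceptDocs == null || acceptDocs.get(doc)) {
                out.add(doc);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        int[] spanDocs = {0, 2, 3, 5};
        System.out.println(restrict(spanDocs, new MatchAllBits(0))); // [0, 2, 3, 5]
        boolean[] allowed = {false, false, true, false, false, true};
        System.out.println(restrict(spanDocs, new ArrayBits(allowed))); // [2, 5]
    }
}
```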
What I would like to do is build a Bits object from a BooleanQuery
beforehand, in order to restrict the getSpans() search to the set of
documents that contain certain terms.
I also have a MultiReader object that handles multiple indexes.
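With a MultiReader the leaves() loop runs once per leaf, and doc IDs inside a leaf start at 0. In Lucene 4.0 each AtomicReaderContext carries a docBase offset, so a leaf-local ID such as spans.doc() corresponds to the top-level ID docBase + doc. The sketch below models only that bookkeeping in plain Java; leafIndex(), toLocal() and toGlobal() are hypothetical helpers, and the docBases array is assumed to be sorted ascending, one entry per leaf.

```java
public class LeafLookupDemo {
    // Index of the leaf containing globalDoc: the last leaf whose docBase <= globalDoc.
    // Assumes docBases is sorted ascending (hypothetical helper, binary search).
    static int leafIndex(int globalDoc, int[] docBases) {
        int lo = 0;
        int hi = docBases.length - 1;
        while (lo < hi) {
            int mid = (lo + hi + 1) >>> 1;
            if (docBases[mid] <= globalDoc) {
                lo = mid;
            } else {
                hi = mid - 1;
            }
        }
        return lo;
    }

    // Top-level doc ID -> doc ID relative to its leaf.
    static int toLocal(int globalDoc, int[] docBases) {
        return globalDoc - docBases[leafIndex(globalDoc, docBases)];
    }

    // Leaf-local doc ID (e.g. what a per-leaf Spans reports) -> top-level doc ID.
    static int toGlobal(int leaf, int localDoc, int[] docBases) {
        return docBases[leaf] + localDoc;
    }

    public static void main(String[] args) {
        // Leaves 0 and 1 hold 4 and 3 docs; leaf 2 starts at top-level doc 7.
        int[] docBases = {0, 4, 7};
        System.out.println(leafIndex(5, docBases));   // 1
        System.out.println(toLocal(5, docBases));     // 1
        System.out.println(toGlobal(2, 3, docBases)); // 10
    }
}
```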
My intuitive approach would be to apply a QueryWrapperFilter like this:
MultiReader reader = ...
BooleanQuery bq = ...
DocIdSet bitset = ???;
Filter filter = new QueryWrapperFilter(bq);
for (AtomicReaderContext context : reader.getContext().leaves()) {
    DocIdSet leafDocs = filter.getDocIdSet(context,
            new Bits.MatchAllBits(context.reader().maxDoc()));
}
The obvious question is: how do I correctly handle the per-context
DocIdSets returned by getDocIdSet() so that I can pass the 'bitset'
variable to the getSpans() call?
Or am I on the wrong path for this kind of problem?
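A sketch of one possible answer, under assumptions: since getSpans() already takes its acceptDocs per leaf, a single top-level 'bitset' may not be needed at all. Inside the loop, the DocIdSet returned by filter.getDocIdSet(context, context.reader().getLiveDocs()) could be turned into leaf-local Bits (via its bits() method, or, if that returns null, by filling a FixedBitSet of size context.reader().maxDoc() from its iterator; a null DocIdSet would mean no matches in that leaf) and passed straight to query.getSpans() for the same context. That reading of the Lucene 4.0 API should be checked against its javadocs. The code below runs without Lucene and only models the doc-ID side with java.util.BitSet; leafBits() and sliceForLeaf() are hypothetical helpers.

```java
import java.util.BitSet;

public class PerLeafBitsDemo {
    // Accept bits for one leaf, built from the leaf-local doc IDs a filter matched.
    // (Stands in for the leaf's DocIdSet / FixedBitSet in the Lucene version.)
    static BitSet leafBits(int maxDoc, int[] localMatches) {
        BitSet bits = new BitSet(maxDoc);
        for (int doc : localMatches) {
            bits.set(doc);
        }
        return bits;
    }

    // If a top-level bitset (indexed by global doc IDs) already exists, one leaf's
    // part is the range [docBase, docBase + maxDoc), re-based to start at 0.
    static BitSet sliceForLeaf(BitSet global, int docBase, int maxDoc) {
        return global.get(docBase, docBase + maxDoc);
    }

    public static void main(String[] args) {
        // Two hypothetical leaves: leaf 0 holds global docs 0..3, leaf 1 holds 4..6.
        int[] docBases = {0, 4};
        int[] maxDocs = {4, 3};

        // Per-leaf route: compute each leaf's bits on their own and use them directly.
        BitSet leaf0 = leafBits(maxDocs[0], new int[] {1, 3});
        BitSet leaf1 = leafBits(maxDocs[1], new int[] {0, 2});
        System.out.println(leaf0 + " " + leaf1); // {1, 3} {0, 2}

        // Global route: the same matches as top-level IDs, sliced per leaf.
        BitSet global = new BitSet();
        for (int g : new int[] {1, 3, 4, 6}) {
            global.set(g);
        }
        for (int i = 0; i < docBases.length; i++) {
            System.out.println(sliceForLeaf(global, docBases[i], maxDocs[i]));
        }
    }
}
```

Keeping everything per leaf avoids any docBase arithmetic; the slicing route only matters if the matching documents were first collected as top-level IDs.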
Thanks!
Carsten
--
Institut für Deutsche Sprache | http://www.ids-mannheim.de
Projekt KorAP | http://korap.ids-mannheim.de
Tel. +49-(0)621-43740789 | [email protected]
Next Generation Corpus Analysis Platform