Hi,
> > AcceptDocs in Lucene are generally all non-deleted documents. For your
> > call to Filter.getDocIdSet you should therefore pass
> > AtomicReader.getLiveDocs() and not Bits.MatchAllBits.
>
> I see. As far as I understand the documentation, getLiveDocs() returns null if
> there are no deleted documents and returns the Bits matching all available
> (not deleted) documents otherwise:
> "Returns the Bits representing live (not deleted) docs. A set bit indicates
> the doc ID has not been deleted. If this method returns null it means there
> are no deleted documents."
> I understand that if there are no deleted documents, I need to replace the
> result (null) with a Bits.MatchAllBits instance, right? If there are deleted
> documents however, I can pass on the result having all available (not
> deleted) document bits set.
No. If acceptDocs == null, the filter/query/... assumes that there are no
deleted documents. Just pass null.
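
To illustrate the convention, here is a rough sketch in plain Java. It is not Lucene code: java.util.BitSet stands in for Lucene's Bits, and AcceptDocsSketch, isAccepted and countAccepted are names I made up for the example.

```java
import java.util.BitSet;

final class AcceptDocsSketch {
    /** A null acceptDocs is read as "no deletions", so every doc ID passes. */
    static boolean isAccepted(BitSet acceptDocs, int doc) {
        return acceptDocs == null || acceptDocs.get(doc);
    }

    /** Counts the doc IDs in [0, maxDoc) that a consumer would visit. */
    static int countAccepted(BitSet acceptDocs, int maxDoc) {
        if (acceptDocs == null) {
            return maxDoc; // shortcut: no per-document check needed
        }
        int count = 0;
        for (int doc = 0; doc < maxDoc; doc++) {
            if (acceptDocs.get(doc)) {
                count++;
            }
        }
        return count;
    }
}
```

The early return is the kind of shortcut a consumer can only take when it sees null. Handing it a match-all object instead forces the per-document check.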
> > You are somehow "misusing" acceptDocs and DocIdSet here, so you have
> > to take care; the semantics are different:
> > - For acceptDocs "null" means "all documents allowed" -> no deleted
> > documents
> > - For DocIdSet "null" means "no documents matched"
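
The two conventions side by side, as a small sketch. Again this is plain Java with java.util.BitSet as a stand-in for both Bits and DocIdSet; NullSemanticsSketch, allowedBy and matchedBy are hypothetical names, not Lucene API.

```java
import java.util.BitSet;

final class NullSemanticsSketch {
    /** acceptDocs convention: null means "all documents allowed". */
    static boolean allowedBy(BitSet acceptDocs, int doc) {
        return acceptDocs == null || acceptDocs.get(doc);
    }

    /** DocIdSet convention: null means "no documents matched". */
    static boolean matchedBy(BitSet filterResult, int doc) {
        return filterResult != null && filterResult.get(doc);
    }
}
```

Forwarding a null filter result unchanged as acceptDocs would flip "nothing matched" into "everything allowed", so the null case has to be handled before the hand-over.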
>
> Okay, as described above, I would now pass either the result of
> getLiveDocs() or a Bits.MatchAllBits instance as the acceptDocs argument to
> getDocIdSet():
>
> Map<Term, TermContext> termContexts = new HashMap<>();
> AtomicReaderContext atomic = ...
> ChainedFilter filter = ...
Just pass getLiveDocs() directly; no null check is needed. Your code would
slow things down for indexes without deletions.
> Bits allDocs = atomic.reader().getLiveDocs();
> if (allDocs == null) {
>     // no deleted documents
>     allDocs = new Bits.MatchAllBits(atomic.reader().maxDoc());
> }
> Bits bits = filter.getDocIdSet(atomic, allDocs).bits();
> if (bits == null) {
>     // no documents matching filter
>     continue; // skip this iteration
> }
> Spans spans = sq.getSpans(atomic, bits, termContexts);
>
>
> > Finally: The trick here is to make Spans think that there are more deleted
> > docs than AtomicReader returns as deleted docs (if you would directly pass
> > getLiveDocs() to getSpans()). The filter is applied to the deleted docs
> > BitSet.
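
A sketch of the per-segment flow this describes, under stated assumptions: java.util.BitSet replaces Bits/DocIdSet, and FilteredSpansSketch with its methods applyFilter, visitedDocs and searchSegment are stand-ins I invented for the filter, the span enumeration and the loop body. None of this is the real Lucene API.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

final class FilteredSpansSketch {
    /**
     * Stand-in for a filter: keeps docs that are both selected and live.
     * Returns null when nothing matches (DocIdSet convention).
     */
    static BitSet applyFilter(BitSet selected, BitSet liveDocs) {
        BitSet result = (BitSet) selected.clone();
        if (liveDocs != null) { // null liveDocs: no deletions to remove
            result.and(liveDocs);
        }
        return result.isEmpty() ? null : result;
    }

    /** Stand-in for span enumeration: visits only docs that acceptDocs allows. */
    static List<Integer> visitedDocs(BitSet acceptDocs, int maxDoc) {
        List<Integer> docs = new ArrayList<>();
        for (int doc = 0; doc < maxDoc; doc++) {
            if (acceptDocs == null || acceptDocs.get(doc)) {
                docs.add(doc);
            }
        }
        return docs;
    }

    /** One segment: filter first, then hand the filter's bits over as acceptDocs. */
    static List<Integer> searchSegment(BitSet selected, BitSet liveDocs, int maxDoc) {
        BitSet bits = applyFilter(selected, liveDocs);
        if (bits == null) {
            return new ArrayList<>(); // nothing matched: skip the segment
        }
        return visitedDocs(bits, maxDoc);
    }
}
```

From the point of view of visitedDocs, every doc the filter rejected looks exactly like a deleted one.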
>
> Yep, I think I've tried to simulate that now. It is pretty hard to test this
> systematically, so please let me know if you see an obvious flaw in my code.
> Thanks!
> Best,
> Carsten
>
> --
> Institut für Deutsche Sprache | http://www.ids-mannheim.de
> Projekt KorAP | http://korap.ids-mannheim.de
> Tel. +49-(0)621-43740789 | [email protected]
> Korpusanalyseplattform der nächsten Generation Next Generation Corpus
> Analysis Platform
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [email protected]
> For additional commands, e-mail: [email protected]