[
https://issues.apache.org/jira/browse/LUCENE-7304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15317229#comment-15317229
]
Martijn van Groningen commented on LUCENE-7304:
-----------------------------------------------
[[email protected]] This is a lot of code :) I really think this should
be moved to a new issue, not just because of the size of the patch, but also
because the implementation differs from what was initially proposed
here. Also, I think that EliasFanoDocIdSet and friends shouldn't be added to
core, but to the join module instead. EliasFano was superseded in core as a
general-purpose DocIdSet by other implementations a while ago, and since it
will now be used in the context of block join, it makes sense to just add
it to the join module.
> Doc values based block join implementation
> ------------------------------------------
>
> Key: LUCENE-7304
> URL: https://issues.apache.org/jira/browse/LUCENE-7304
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Martijn van Groningen
> Priority: Minor
> Attachments: LUCENE-5092-20140313.patch, LUCENE-7304-20160531.patch,
> LUCENE-7304-20160606.patch, LUCENE_7304.patch
>
>
> At query time the block join relies on a bitset for finding the previous
> parent doc while advancing the doc id iterator. On large indices these
> bitsets can consume large amounts of JVM heap space. Also, due to the
> nature of how these bitsets are set, the 'FixedBitSet' implementation is
> typically used.
> The idea I had was to replace the bitset usage with a numeric doc values field
> that stores offsets. Each child doc stores how many docids it is away from
> its parent doc and each parent stores how many docids it is away from its
> first child. At query time this information can be used to perform the block join.
> I think another benefit of this approach is that external tools can now
> easily determine if a doc is part of a block of documents and perhaps this
> also helps index time sorting?
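
To make the offset idea in the description above concrete, here is a minimal
sketch in plain Java. It does not use any Lucene API and is not taken from the
attached patches: a long[] stands in for the numeric doc values field, and the
class and method names are hypothetical. It assumes the usual block layout in
which children are indexed directly before their parent. The description does
not say how a parent is told apart from a child, so the sketch assumes that
parent offsets are stored negated (a parent without children stores 0, while
child offsets are always at least 1).

```java
// Hypothetical sketch: per-document offsets replacing a parent bitset.
final class BlockJoinOffsets {

    // Builds the offsets for a sequence of blocks, given the number of
    // children in each block. Children come first, the parent is last.
    static long[] buildOffsets(int[] childrenPerBlock) {
        int total = 0;
        for (int n : childrenPerBlock) {
            total += n + 1; // n children plus one parent
        }
        long[] offsets = new long[total];
        int doc = 0;
        for (int n : childrenPerBlock) {
            int parent = doc + n;
            for (int child = doc; child < parent; child++) {
                offsets[child] = parent - child; // distance to parent, >= 1
            }
            // Assumption: distance to the first child, stored negated so
            // that parents can be recognised (0 means "no children").
            offsets[parent] = -n;
            doc = parent + 1;
        }
        return offsets;
    }

    static boolean isParent(long[] offsets, int doc) {
        return offsets[doc] <= 0;
    }

    // Returns the parent of the block that contains doc (doc itself if
    // it is a parent).
    static int parentOf(long[] offsets, int doc) {
        return isParent(offsets, doc) ? doc : doc + (int) offsets[doc];
    }

    // Returns the first child of a parent, or the parent itself if the
    // block has no children.
    static int firstChildOf(long[] offsets, int parentDoc) {
        return parentDoc + (int) offsets[parentDoc];
    }

    // Returns the parent of the preceding block, or -1 if doc is in the
    // first block. This is the lookup the bitset is used for today.
    static int previousParent(long[] offsets, int doc) {
        return firstChildOf(offsets, parentOf(offsets, doc)) - 1;
    }
}
```

For three blocks with 2, 0 and 3 children, buildOffsets produces
[2, 1, -2, 0, 3, 2, 1, -3]: docs 2, 3 and 7 are parents. Every lookup is a
constant number of array reads, but unlike a bitset it needs one value per
document, so whether this saves heap in practice depends on how the doc values
are stored and read.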