[ https://issues.apache.org/jira/browse/LUCENE-7304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15306872#comment-15306872 ]

Paul Elschot commented on LUCENE-7304:
--------------------------------------

To use an EliasFano dictionary in an index, it would be better to start from
the EliasFano code from LUCENE-5627, because that version also has an
implementation on a BytesRef, which is used as a payload there. Starting from
a BytesRef, it would probably be easier to store the dictionary directly in an
index.
The same advanceToJustBefore() method (from DocBlockIterator) would still need
to be added to it.
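A minimal, self-contained sketch of an Elias-Fano encoded set of parent doc ids with an advanceToJustBefore() lookup. It is not the LUCENE-5627 code: the class name EliasFanoParents is hypothetical, and so is the assumed meaning of advanceToJustBefore(target), taken here as "the largest parent doc id strictly below target, or -1 if there is none". Each doc id is split into L = floor(log2(upperBound / n)) low bits, stored packed, and a high part, stored in unary, which costs at most about 2 + log2(upperBound / n) bits per parent instead of one bit per document in the segment.

```java
/**
 * Minimal Elias-Fano encoding of a sorted set of parent doc ids (a sketch).
 * Each value is split into numLowBits low bits, stored packed in lowBits,
 * and a high part, stored in unary in highBits.
 */
final class EliasFanoParents {
    private final int count;
    private final int numLowBits;
    private final long lowMask;
    private final long[] lowBits;
    private final long[] highBits;

    /** @param sortedDocs strictly increasing doc ids, all in [0, upperBound) */
    EliasFanoParents(int[] sortedDocs, int upperBound) {
        count = sortedDocs.length;
        long ratio = count == 0 ? 0 : (long) upperBound / count;
        // L = floor(log2(upperBound / count)), or 0 when the ratio is below 2
        numLowBits = ratio > 0 ? 63 - Long.numberOfLeadingZeros(ratio) : 0;
        lowMask = (1L << numLowBits) - 1;
        lowBits = new long[(int) (((long) count * numLowBits + 63) / 64) + 1];
        // high parts are at most upperBound >>> L, and value i is shifted by i
        long highLen = ((long) upperBound >>> numLowBits) + count + 1;
        highBits = new long[(int) ((highLen + 63) / 64)];
        for (int i = 0; i < count; i++) {
            int doc = sortedDocs[i];
            writeLow(i, doc & lowMask);
            long pos = ((long) doc >>> numLowBits) + i; // one set bit per value
            highBits[(int) (pos >>> 6)] |= 1L << (pos & 63);
        }
    }

    private void writeLow(int i, long low) {
        if (numLowBits == 0) return;
        long bitPos = (long) i * numLowBits;
        int word = (int) (bitPos >>> 6);
        int shift = (int) (bitPos & 63);
        lowBits[word] |= low << shift;
        if (shift + numLowBits > 64) { // value straddles two words
            lowBits[word + 1] |= low >>> (64 - shift);
        }
    }

    private long readLow(int i) {
        if (numLowBits == 0) return 0;
        long bitPos = (long) i * numLowBits;
        int word = (int) (bitPos >>> 6);
        int shift = (int) (bitPos & 63);
        long v = lowBits[word] >>> shift;
        if (shift + numLowBits > 64) {
            v |= lowBits[word + 1] << (64 - shift);
        }
        return v & lowMask;
    }

    /**
     * Assumed semantics: largest encoded doc id strictly below target, or -1.
     * Decodes values in order (linear); a real implementation would add a
     * select index over highBits to jump close to target >>> numLowBits.
     */
    int advanceToJustBefore(int target) {
        int result = -1;
        int i = 0;
        for (int w = 0; w < highBits.length && i < count; w++) {
            long word = highBits[w];
            while (word != 0) {
                long pos = (long) w * 64 + Long.numberOfTrailingZeros(word);
                word &= word - 1; // clear the lowest set bit
                long high = pos - i; // remove the shift added by earlier values
                int value = (int) ((high << numLowBits) | readLow(i));
                if (value >= target) {
                    return result;
                }
                result = value;
                i++;
            }
        }
        return result;
    }
}
```

For parents {3, 7, 8, 20, 63, 64, 100} in a 128-document segment, advanceToJustBefore(64) returns 63 and advanceToJustBefore(3) returns -1.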

The patch above for LUCENE-5092 also moves block joins from FixedBitSet to
DocBlockIterator.
For use here, that would allow two different implementations of
DocBlockIterator: the current one based on FixedBitSet, and one based on doc
values.
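A sketch of how one block join abstraction could sit on top of either backend. The DocBlockIterator API from the patch is not shown here, so this uses a hypothetical interface, DocBlockLookup, with two questions a block join needs answered: which parent owns a child, and where a parent's block starts. The bitset backend uses java.util.BitSet in place of Lucene's FixedBitSet; the offsets backend reads a long[] standing in for a numeric doc values field, and its sign convention (children store a positive distance to the parent, parents store the negated distance to their first child) is an assumption, not something the issue specifies.

```java
import java.util.BitSet;

/** Hypothetical block lookup abstraction; not the API from the LUCENE-5092 patch. */
interface DocBlockLookup {
    /** Parent doc id of the block containing childDoc (parent is last in its block). */
    int parentOf(int childDoc);

    /** First child doc id of parentDoc's block (== parentDoc if it has no children). */
    int firstChildOf(int parentDoc);
}

/** Bitset backend: one bit per document in the segment, set for parents. */
final class BitSetBlockLookup implements DocBlockLookup {
    private final BitSet parents;

    BitSetBlockLookup(BitSet parents) {
        this.parents = parents;
    }

    @Override
    public int parentOf(int childDoc) {
        return parents.nextSetBit(childDoc);
    }

    @Override
    public int firstChildOf(int parentDoc) {
        // the previous parent closes the previous block; -1 + 1 == 0 for the first block
        return parents.previousSetBit(parentDoc - 1) + 1;
    }
}

/**
 * Offsets backend: offsets[doc] stands in for a per-document numeric doc value.
 * Assumed encoding: child stores (parent - child) > 0, parent stores -(parent - firstChild) <= 0.
 */
final class OffsetsBlockLookup implements DocBlockLookup {
    private final long[] offsets;

    OffsetsBlockLookup(long[] offsets) {
        this.offsets = offsets;
    }

    @Override
    public int parentOf(int childDoc) {
        return (int) (childDoc + offsets[childDoc]);
    }

    @Override
    public int firstChildOf(int parentDoc) {
        return (int) (parentDoc + offsets[parentDoc]); // offset is <= 0 for parents
    }
}
```

The offsets backend answers both questions with a single per-document lookup instead of a scan over bits, and in an index that lookup would come from doc values stored in the index rather than from a bitset held on the heap.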

> Doc values based block join implementation
> ------------------------------------------
>
>                 Key: LUCENE-7304
>                 URL: https://issues.apache.org/jira/browse/LUCENE-7304
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Martijn van Groningen
>            Priority: Minor
>         Attachments: LUCENE-5092-20140313.patch, LUCENE_7304.patch
>
>
> At query time, the block join relies on a bitset to find the previous
> parent doc while advancing the doc id iterator. On large indices these
> bitsets can consume large amounts of JVM heap space. Also, due to the nature
> of how these bitsets are set, the 'FixedBitSet' implementation is typically used.
> The idea I had was to replace the bitset usage with a numeric doc values field
> that stores offsets. Each child doc stores how many docids it is away from its
> parent doc, and each parent stores how many docids it is away from its first
> child. At query time this information can be used to perform the block join.
> I think another benefit of this approach is that external tools can now
> easily determine whether a doc is part of a block of documents, and perhaps
> this also helps index-time sorting?
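The offsets from the description could be computed at index time roughly as follows. This sketch assumes the block layout block joins expect (children first, parent last in each block). Because the description does not say how a parent is told apart from a child, it adds a convention of its own: children store a positive distance to their parent, parents store the negated distance to their first child, so a childless parent stores 0. The class name BlockOffsets is hypothetical.

```java
/**
 * Computes the per-document offset value for a segment laid out as a sequence
 * of blocks, children first and parent last. Assumed sign convention (not from
 * the issue): children store (parent - child) > 0, parents store
 * -(parent - firstChild) <= 0.
 */
final class BlockOffsets {
    private BlockOffsets() {}

    /** childCounts[b] is the number of children in block b; returns one value per doc. */
    static long[] compute(int[] childCounts) {
        int maxDoc = 0;
        for (int c : childCounts) {
            maxDoc += c + 1; // children plus their parent
        }
        long[] offsets = new long[maxDoc];
        int doc = 0;
        for (int children : childCounts) {
            int firstChild = doc;
            int parent = doc + children;
            for (int child = firstChild; child < parent; child++) {
                offsets[child] = parent - child; // distance forward to the parent
            }
            offsets[parent] = -(long) (parent - firstChild); // distance back to the first child
            doc = parent + 1;
        }
        return offsets;
    }
}
```

For child counts {2, 0, 3} the result is [2, 1, -2, 0, 3, 2, 1, -3]: child 0 reaches its parent at 0 + 2 = 2, and parent 7 reaches its first child at 7 - 3 = 4. In an index, each value would presumably be added to its document as a NumericDocValuesField before the block is passed to IndexWriter.addDocuments.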



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
