[ 
https://issues.apache.org/jira/browse/LUCENE-5084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13696404#comment-13696404
 ] 

Adrien Grand commented on LUCENE-5084:
--------------------------------------

I have not dug deeply into the code yet, but I tested it against various 
randomly generated sets with numDocs=10M, and the compression looks great:

||Load||FixedBitSet||WAH8DocIdSet (LUCENE-5081)||EliasFanoDocIdSet (this issue)||PForDeltaDocIdSet (from kamikaze, LUCENE-2750)||
|0.001%|1.2 MB|424 bytes|344 bytes|9 KB|
|0.01%|1.2 MB|3.4 KB|2 KB|10.6 KB|
|0.1%|1.2 MB|28.4 KB|14.7 KB|25.1 KB|
|1%|1.2 MB|223.2 KB|104.6 KB|132.3 KB|
|10%|1.2 MB|1 MB|641 KB|860.5 KB|
|30%|1.2 MB|1.2 MB|1.3 MB|1.9 MB|
|50%|1.2 MB|1.2 MB|1.8 MB|2.7 MB|
|70%|1.2 MB|1.2 MB|2 MB|3 MB|
|90%|1.2 MB|1.2 MB|2.3 MB|3.1 MB|

I especially like the fact that it saves almost half the memory compared to 
FixedBitSet (641 KB vs. 1.2 MB) even for pretty large sets that contain 
1/10th of all doc IDs.
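
As a rough sanity check on these numbers, here is a minimal sketch of the textbook Elias-Fano size estimate: each of the n values stores floor(log2(u/n)) low bits, and the high parts are unary-coded in about n + (u >> lowBits) bits. The class and method names are hypothetical and are not taken from the patch; the estimate also ignores object headers and any auxiliary index the real implementation may keep.

```java
public final class EliasFanoSizeEstimate {
    private EliasFanoSizeEstimate() {}

    /** Low bits stored per value: floor(log2(upperBound / numValues)), or 0 for dense sets. */
    public static int numLowBits(long numValues, long upperBound) {
        if (numValues <= 0 || upperBound < 0) {
            throw new IllegalArgumentException("numValues must be > 0 and upperBound >= 0");
        }
        long ratio = upperBound / numValues;
        return ratio <= 1 ? 0 : 63 - Long.numberOfLeadingZeros(ratio);
    }

    /** Estimated payload size in bits (assumption: textbook layout, no overhead). */
    public static long estimateBits(long numValues, long upperBound) {
        int low = numLowBits(numValues, upperBound);
        long lowBits = numValues * low;                    // fixed-width low parts
        long highBits = numValues + (upperBound >>> low);  // unary-coded high parts
        return lowBits + highBits;
    }

    public static void main(String[] args) {
        long numDocs = 10_000_000L;
        long[] setSizes = {100L, 100_000L, 1_000_000L, 9_000_000L};
        for (long n : setSizes) {
            long bits = estimateBits(n, numDocs);
            // A plain bit set needs one bit per document, whatever the load.
            System.out.printf("n=%d: Elias-Fano ~%d bytes, bit set ~%d bytes%n",
                    n, bits / 8, numDocs / 8);
        }
    }
}
```

For the 10% row (n = 1M out of 10M) this gives 5,250,000 bits, which is about 641 KB when 1 KB is taken as 1024 bytes, in line with the table. For the 90% row the estimate is 19,000,000 bits, roughly twice the size of a plain bit set, which is consistent with Elias-Fano losing to FixedBitSet on dense sets.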

bq. I have used package o.a.l.util.eliasfano, this could be changed to 
o.a.l.util.packed for example.

Indeed, maybe we don't need a dedicated package for this DocIdSet. 
o.a.l.util.packed would be fine, I think.

bq. There is a NOCOMMIT for a static longHex method that dumps a long in fixed 
width hex format, is there a better place for this method?

I think it is OK to leave it here.
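
For readers without the patch at hand, a helper of this kind can be very small. The following is only a sketch of what a fixed-width hex dump of a long might look like; the class name and the exact output format (16 lowercase digits, no prefix) are assumptions and may differ from the method in the patch.

```java
public final class LongHexSketch {
    private LongHexSketch() {}

    /**
     * Formats a long as exactly 16 hex digits, zero-padded on the left.
     * Negative values are printed as their two's complement bit pattern.
     */
    public static String longHex(long x) {
        return String.format("%016x", x);
    }

    public static void main(String[] args) {
        System.out.println(longHex(255L));
        System.out.println(longHex(-1L));
    }
}
```

A fixed width keeps consecutive dumped words aligned, which makes bit-level debugging output easier to compare by eye.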

I'll try to dig more thoroughly into the patch in the next few days...
                
> EliasFanoDocIdSet
> -----------------
>
>                 Key: LUCENE-5084
>                 URL: https://issues.apache.org/jira/browse/LUCENE-5084
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Paul Elschot
>            Assignee: Adrien Grand
>            Priority: Minor
>         Attachments: LUCENE-5084.patch
>
>
> DocIdSet in Elias-Fano encoding
