[ 
https://issues.apache.org/jira/browse/LUCENE-5293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13799941#comment-13799941
 ] 

Paul Elschot commented on LUCENE-5293:
--------------------------------------

Looking at the benchmark again, with the question of how to build an EF
(Elias-Fano) docidset without knowing the number of values in advance: one
solution would be to collect into a PFD docidset first, because it builds
quickly and has good next() performance. next() would then be used for a
single pass over that set to build the final docidset to be cached.

However, an even better way might be to use one or more temporary long arrays
to store the incoming doc ids directly in FOR format (without forming deltas
and without an index). This is possible because the maximum doc id value is
known, so the number of bits per value can be fixed up front.
While storing the doc ids, one can switch to an FBS (FixedBitSet) on the fly
when the total number of doc ids becomes too high. The existing PackedInts
code should be a nice fit for this.
Since allocating the long arrays takes time, one can start with one array of,
say, 1/512 of the maximum needed size, and continue into another (bigger)
array for as long as necessary, or until an FBS is preferable.
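
To make the idea concrete, here is a rough, untested-against-Lucene sketch in
plain Java. Everything in it is an assumption for illustration, not code from
the patch: the class name TempDocIdCollector is made up, java.util.BitSet
stands in for FixedBitSet, the bit packing is written by hand instead of using
PackedInts, each new array simply doubles the previous one, and the switch
happens as soon as the packed form would need more bits than a bit set over
maxDoc.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

/** Hypothetical temporary collector for increasing doc ids in [0, maxDoc). */
final class TempDocIdCollector {
  private final int maxDoc;
  private final int bitsPerValue;  // fixed width, enough for maxDoc - 1
  private final long mask;
  private final List<long[]> fullBlocks = new ArrayList<>();
  private long[] current;          // array currently being filled
  private int usedInCurrent;       // number of values stored in current
  private long count;              // number of values stored in packed form
  private BitSet bitSet;           // non-null once the set is too dense

  TempDocIdCollector(int maxDoc) {
    this.maxDoc = maxDoc;
    this.bitsPerValue = Math.max(1, 32 - Integer.numberOfLeadingZeros(maxDoc - 1));
    this.mask = (1L << bitsPerValue) - 1;
    // First array: 1/512 of the longs that a bit set over maxDoc would need.
    this.current = new long[Math.max(1, (maxDoc >>> 6) / 512)];
  }

  /** Number of whole values that fit in the given array. */
  private int capacity(long[] block) {
    return (int) ((long) block.length * 64 / bitsPerValue);
  }

  void add(int docId) {
    // Assumed threshold: packed form would exceed the size of a bit set.
    if (bitSet == null && (count + 1) * bitsPerValue > maxDoc) {
      switchToBitSet();
    }
    if (bitSet != null) {
      bitSet.set(docId);
      return;
    }
    if (usedInCurrent == capacity(current)) {
      fullBlocks.add(current);
      current = new long[current.length * 2];  // continue into a bigger array
      usedInCurrent = 0;
    }
    long bitPos = (long) usedInCurrent * bitsPerValue;
    int word = (int) (bitPos >>> 6);
    int shift = (int) (bitPos & 63);
    current[word] |= ((long) docId) << shift;
    if (shift + bitsPerValue > 64) {           // value spans two longs
      current[word + 1] |= ((long) docId) >>> (64 - shift);
    }
    usedInCurrent++;
    count++;
  }

  private int get(long[] block, int index) {
    long bitPos = (long) index * bitsPerValue;
    int word = (int) (bitPos >>> 6);
    int shift = (int) (bitPos & 63);
    long v = block[word] >>> shift;
    if (shift + bitsPerValue > 64) {
      v |= block[word + 1] << (64 - shift);
    }
    return (int) (v & mask);
  }

  private void switchToBitSet() {
    bitSet = new BitSet(maxDoc);
    for (long[] block : fullBlocks) {
      for (int i = 0, n = capacity(block); i < n; i++) bitSet.set(get(block, i));
    }
    for (int i = 0; i < usedInCurrent; i++) bitSet.set(get(current, i));
    fullBlocks.clear();
    current = null;
  }

  boolean usesBitSet() {
    return bitSet != null;
  }

  /** Doc ids in increasing order, for the single pass that builds the final set. */
  int[] toArray() {
    if (bitSet != null) return bitSet.stream().toArray();
    int[] out = new int[(int) count];
    int n = 0;
    for (long[] block : fullBlocks) {
      for (int i = 0, c = capacity(block); i < c; i++) out[n++] = get(block, i);
    }
    for (int i = 0; i < usedInCurrent; i++) out[n++] = get(current, i);
    return out;
  }
}
```

In this sketch the number of values is known once collection ends, so the
final EF docidset could be sized exactly and filled from toArray() (or an
equivalent iterator). If the collector has already switched to a bit set, that
bit set could probably be cached as is.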


> Also use EliasFanoDocIdSet in CachingWrapperFilter
> --------------------------------------------------
>
>                 Key: LUCENE-5293
>                 URL: https://issues.apache.org/jira/browse/LUCENE-5293
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: core/search
>            Reporter: Paul Elschot
>            Priority: Minor
>         Attachments: LUCENE-5293.patch, LUCENE-5293.patch
>
>




