[ https://issues.apache.org/jira/browse/LUCENE-4512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13487084#comment-13487084 ]
Robert Muir commented on LUCENE-4512:
-------------------------------------
I quickly tested this on that geonames data again: 72 chunks with bpvs of
16-20 (avg 18, I think).
That is quite a bit of savings compared to 29 bpv with the trunk code.

I didn't look at the code too closely, but since we are computing the average
at index time (I think?), do you think it still makes sense to encode the
deltas from the previous value, or should we just encode them up front at
index time as deltas from the average (if that makes things simpler)?

> Additional memory savings in CompressingStoredFieldsIndex.MEMORY_CHUNK
> ----------------------------------------------------------------------
>
> Key: LUCENE-4512
> URL: https://issues.apache.org/jira/browse/LUCENE-4512
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Adrien Grand
> Assignee: Adrien Grand
> Priority: Minor
> Fix For: 4.1
>
> Attachments: LUCENE-4512.patch
>
>
> Robert had a great idea to save memory with
> {{CompressingStoredFieldsIndex.MEMORY_CHUNK}}: instead of storing the
> absolute start pointers we could compute the mean number of bytes per chunk
> of documents and only store the delta between the actual value and the
> expected value (avgChunkBytes * chunkNumber).
> By applying this idea to every n(=1024?) chunks, we would even:
> - make sure to never hit the worst case (delta ~= maxStartPointer)
> - reduce memory usage at indexing time.
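
To make the idea in the quoted description concrete, here is a minimal Java
sketch of encoding one block of chunk start pointers as deltas from the
expected value. It is not the code from LUCENE-4512.patch: the class and
method names are hypothetical, and measuring the expected value relative to
the block's first start pointer and zig-zag encoding the signed deltas are
assumptions made for illustration.

```java
import java.util.Arrays;

/** Hypothetical sketch of the delta-from-average idea; not the actual patch. */
public class ChunkDeltaSketch {

  /** Mean number of bytes per chunk within one block of start pointers. */
  public static long avgChunkBytes(long[] startPointers) {
    int n = startPointers.length;
    if (n < 2) {
      return 0;
    }
    return (startPointers[n - 1] - startPointers[0]) / (n - 1);
  }

  /**
   * delta[i] = actual - expected, where expected = base + avg * i and
   * base is the first start pointer of the block (an assumption here).
   */
  public static long[] deltasFromExpected(long[] startPointers, long avg) {
    long[] deltas = new long[startPointers.length];
    if (startPointers.length == 0) {
      return deltas;
    }
    long base = startPointers[0];
    for (int i = 0; i < startPointers.length; i++) {
      deltas[i] = startPointers[i] - (base + avg * i);
    }
    return deltas;
  }

  /** Bits per value needed to store the deltas after zig-zag encoding. */
  public static int bitsRequired(long[] deltas) {
    long or = 0;
    for (long d : deltas) {
      // Zig-zag: maps small negative and positive values to small unsigned ones.
      or |= (d << 1) ^ (d >> 63);
    }
    return or == 0 ? 0 : 64 - Long.numberOfLeadingZeros(or);
  }

  /** Rebuilds the absolute start pointer of chunk i from the stored delta. */
  public static long startPointer(long base, long avg, long[] deltas, int i) {
    return base + avg * i + deltas[i];
  }

  public static void main(String[] args) {
    long[] starts = {1000, 1100, 1205, 1290, 1400};
    long avg = avgChunkBytes(starts);
    long[] deltas = deltasFromExpected(starts, avg);
    System.out.println("avg=" + avg
        + " deltas=" + Arrays.toString(deltas)
        + " bpv=" + bitsRequired(deltas));
  }
}
```

For the five sample start pointers the average is 100 bytes per chunk, the
deltas are 0, 0, 5, -10, 0, and they fit in 5 bits per value, whereas the
absolute pointers would need 11. Restarting the computation every n chunks
keeps each delta bounded by the drift within a single block.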