[ https://issues.apache.org/jira/browse/LUCENE-4512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13486311#comment-13486311 ]
Robert Muir commented on LUCENE-4512:
-------------------------------------
I do think we should use blocks of n chunks (n = some power of 2 or
whatever), because e.g. just testing with that geonames dataset I saw the
deltas grow quite large at points. This caused it to use 24 bits per value
(still better than 29), but with a tiny bit of effort I think it could be
significantly less.
> Additional memory savings in CompressingStoredFieldsIndex.MEMORY_CHUNK
> ----------------------------------------------------------------------
>
> Key: LUCENE-4512
> URL: https://issues.apache.org/jira/browse/LUCENE-4512
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Adrien Grand
> Assignee: Adrien Grand
> Priority: Minor
> Fix For: 4.1
>
>
> Robert had a great idea to save memory with
> {{CompressingStoredFieldsIndex.MEMORY_CHUNK}}: instead of storing the
> absolute start pointers we could compute the mean number of bytes per chunk
> of documents and only store the delta between the actual value and the
> expected value (avgChunkBytes * chunkNumber).
> By applying this idea to every block of n (=1024?) chunks, we would also:
> - make sure to never hit the worst case (delta ~= maxStartPointer)
> - reduce memory usage at indexing time.
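To make the per-block idea concrete, here is a minimal Java sketch. It is not
the Lucene implementation: the class and method names are hypothetical, the
expected value is assumed to be computed relative to the first start pointer
of each block, and zig-zag encoding is assumed as the way to store signed
deltas.

```java
/** Hypothetical sketch of per-block delta encoding; not Lucene's actual code. */
final class BlockDeltaSketch {

    /** Mean number of bytes per chunk in the block [from, to) of start pointers. */
    static long avgChunkBytes(long[] startPointers, int from, int to) {
        int n = to - from;
        return n > 1 ? (startPointers[to - 1] - startPointers[from]) / (n - 1) : 0;
    }

    /**
     * Deltas between the actual start pointers and the expected ones,
     * where expected = blockBase + avgChunkBytes * indexInBlock (an assumption).
     */
    static long[] encodeBlock(long[] startPointers, int from, int to) {
        long base = startPointers[from];
        long avg = avgChunkBytes(startPointers, from, to);
        long[] deltas = new long[to - from];
        for (int i = 0; i < deltas.length; i++) {
            deltas[i] = startPointers[from + i] - (base + avg * i);
        }
        return deltas;
    }

    /** Rebuilds the start pointer of the i-th chunk of a block. */
    static long decode(long base, long avg, long[] deltas, int i) {
        return base + avg * i + deltas[i];
    }

    /** Bits per value needed to store all deltas, using zig-zag for the sign. */
    static int bitsRequired(long[] deltas) {
        long bits = 0;
        for (long d : deltas) {
            bits |= (d << 1) ^ (d >> 63); // zig-zag: small negatives become small positives
        }
        return bits == 0 ? 0 : 64 - Long.numberOfLeadingZeros(bits);
    }
}
```

With made-up start pointers 0, 100, 200, 300, 400, 1400, 2400, 3400 (chunk
sizes jump from 100 to 1000 bytes halfway through), a single average over all
8 chunks needs 12 bits per delta in this sketch, while two blocks of 4 chunks
need 0 bits each, since every chunk inside a block has the same size.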