[jira] [Updated] (LUCENE-4512) Additional memory savings in CompressingStoredFieldsIndex.MEMORY_CHUNK

Adrien Grand (JIRA) Mon, 29 Oct 2012 07:38:13 -0700

     [ https://issues.apache.org/jira/browse/LUCENE-4512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Adrien Grand updated LUCENE-4512:
---------------------------------

    Description: 
Robert had a great idea to save memory with 
{{CompressingStoredFieldsIndex.MEMORY_CHUNK}}: instead of storing the absolute 
start pointers we could compute the mean number of bytes per chunk of documents 
and only store the delta between the actual value and the expected value 
(avgChunkBytes * chunkNumber).
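
A minimal sketch of the idea, not the actual Lucene implementation: the class name AvgDeltaPointers, its fields, and the separate base offset are assumptions made for illustration. Each start pointer is stored as the difference from its expected position, and bitsPerDelta() estimates how many bits a packed array would need per entry after zig-zag encoding the signed deltas.

```java
final class AvgDeltaPointers {
    final long base;          // first start pointer (assumption: kept separately)
    final long avgChunkBytes; // mean bytes per chunk, rounded down
    final long[] deltas;      // actual - expected, may be negative

    AvgDeltaPointers(long[] startPointers) {
        int n = startPointers.length;
        base = n == 0 ? 0 : startPointers[0];
        avgChunkBytes = n <= 1 ? 0 : (startPointers[n - 1] - base) / (n - 1);
        deltas = new long[n];
        for (int i = 0; i < n; i++) {
            deltas[i] = startPointers[i] - expected(i);
        }
    }

    // Expected start pointer if every chunk had exactly the average size.
    long expected(int chunkNumber) {
        return base + avgChunkBytes * chunkNumber;
    }

    // Rebuilds the absolute start pointer from the stored delta.
    long startPointer(int chunkNumber) {
        return expected(chunkNumber) + deltas[chunkNumber];
    }

    // Bits needed per delta once signed values are zig-zag encoded.
    int bitsPerDelta() {
        long or = 0;
        for (long d : deltas) {
            or |= (d << 1) ^ (d >> 63); // zig-zag: small magnitudes stay small
        }
        return or == 0 ? 0 : 64 - Long.numberOfLeadingZeros(or);
    }

    public static void main(String[] args) {
        long[] startPointers = {0, 100, 210, 290, 400};
        AvgDeltaPointers index = new AvgDeltaPointers(startPointers);
        System.out.println("avgChunkBytes=" + index.avgChunkBytes
            + " bitsPerDelta=" + index.bitsPerDelta());
    }
}
```

When chunk sizes are close to the mean, the deltas need far fewer bits than the absolute pointers would.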

By applying this idea to each block of n (=1024?) chunks, we would also:
 - make sure to never hit the worst case (delta ~= maxStartPointer)
 - reduce memory usage at indexing time.
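
The per-block variant could look roughly like the following sketch. BlockedAvgDeltaPointers, its fields, and maxAbsDelta() are hypothetical names, and the layout (one base and one average per block) is an assumption rather than the format Lucene ended up using.

```java
final class BlockedAvgDeltaPointers {
    final int blockSize;     // n chunks per block
    final long[] blockBases; // first start pointer of each block
    final long[] blockAvgs;  // mean bytes per chunk within each block
    final long[] deltas;     // actual - expected, relative to the block

    BlockedAvgDeltaPointers(long[] startPointers, int blockSize) {
        if (blockSize <= 0) {
            throw new IllegalArgumentException("blockSize must be positive");
        }
        this.blockSize = blockSize;
        int n = startPointers.length;
        int numBlocks = (n + blockSize - 1) / blockSize;
        blockBases = new long[numBlocks];
        blockAvgs = new long[numBlocks];
        deltas = new long[n];
        for (int b = 0; b < numBlocks; b++) {
            int from = b * blockSize;
            int to = Math.min(n, from + blockSize); // exclusive
            int len = to - from;
            blockBases[b] = startPointers[from];
            blockAvgs[b] = len <= 1
                ? 0
                : (startPointers[to - 1] - startPointers[from]) / (len - 1);
            for (int i = from; i < to; i++) {
                deltas[i] = startPointers[i] - blockBases[b] - blockAvgs[b] * (i - from);
            }
        }
    }

    // Rebuilds the absolute start pointer from the block header and the delta.
    long startPointer(int chunkNumber) {
        int b = chunkNumber / blockSize;
        int indexInBlock = chunkNumber % blockSize;
        return blockBases[b] + blockAvgs[b] * indexInBlock + deltas[chunkNumber];
    }

    // Largest delta magnitude; drives the number of bits per stored entry.
    long maxAbsDelta() {
        long max = 0;
        for (long d : deltas) {
            max = Math.max(max, Math.abs(d));
        }
        return max;
    }
}
```

Because each delta is measured against its own block, its magnitude is bounded by the bytes spanned by that block rather than by maxStartPointer. A writer built this way would presumably only need to buffer one block of pointers before flushing it, which is one plausible reading of the indexing-time saving mentioned above.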

  was:
Robert had a great idea to save memory with 
{{CompressingStoredFieldsIndex.MEMORY_CHUNK}}: instead of storing the absolute 
start pointers we could compute the mean number of bytes per chunk of documents 
and only store the delta between the actual value and the expected value 
(avgChunkBytes * chunkNumber).

Given that the list of start pointers is strictly increasing, the error is at 
most maxStartPointer / 2 (and is very likely to be much lower) so we are 
guaranteed to save memory. (The same principle could be applied to docBases.)

By applying this idea to each block of n (=1024?) chunks, we would also:
 - make sure to never hit the worst case (same memory usage as if we stored the 
absolute offsets)
 - reduce memory usage at indexing time.

    
> Additional memory savings in CompressingStoredFieldsIndex.MEMORY_CHUNK
> ----------------------------------------------------------------------
>
>                 Key: LUCENE-4512
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4512
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Adrien Grand
>            Assignee: Adrien Grand
>            Priority: Minor
>             Fix For: 4.1
>
>
> Robert had a great idea to save memory with 
> {{CompressingStoredFieldsIndex.MEMORY_CHUNK}}: instead of storing the 
> absolute start pointers we could compute the mean number of bytes per chunk 
> of documents and only store the delta between the actual value and the 
> expected value (avgChunkBytes * chunkNumber).
> By applying this idea to each block of n (=1024?) chunks, we would also:
>  - make sure to never hit the worst case (delta ~= maxStartPointer)
>  - reduce memory usage at indexing time.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

