[ https://issues.apache.org/jira/browse/LUCENE-5373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Adrien Grand updated LUCENE-5373:
---------------------------------

    Attachment: LUCENE-5373.patch

Here is a patch. Lucene42DocValuesProducer no longer relies on 
{{RamUsageEstimator.sizeOf(Object)}}. Instead, it has a member that stores its 
memory usage, which is incremented every time we load doc values for a new 
field. This should be both faster and more accurate.

To keep size estimation simple, I deliberately left out object alignment, the 
numeric/binary/fst entries, and the size of some small hash tables. These 
should be very small compared to the structures that actually store doc 
values anyway.
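
To illustrate the idea, here is a minimal, self-contained sketch of incremental 
accounting. It is not the code from the patch: {{TrackedProducer}}, 
{{getNumeric}} and the plain {{long[]}} storage are hypothetical stand-ins, and 
only the raw array payload is counted (no object headers or alignment), in line 
with the simplifications described above.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical stand-in for a doc-values producer; not the Lucene class. */
class TrackedProducer {
    // Doc values loaded so far, keyed by field name.
    private final Map<String, long[]> numericFields = new HashMap<>();
    // Running total of bytes held by the loaded structures.
    private final AtomicLong ramBytesUsed = new AtomicLong();

    /** Loads the field on first access and charges its size exactly once. */
    synchronized long[] getNumeric(String field, int maxDoc) {
        long[] values = numericFields.get(field);
        if (values == null) {
            values = new long[maxDoc]; // a real producer would decode from the index here
            numericFields.put(field, values);
            // Count only the array payload; headers and alignment are ignored on purpose.
            ramBytesUsed.addAndGet((long) Long.BYTES * maxDoc);
        }
        return values;
    }

    /** Constant-time read of the running total; no object-graph walk is needed. */
    long ramBytesUsed() {
        return ramBytesUsed.get();
    }
}
```

Reading the counter is O(1), and loading the same field a second time does not 
change the total.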

> Lucene42DocValuesProducer.ramBytesUsed is over-estimated
> --------------------------------------------------------
>
>                 Key: LUCENE-5373
>                 URL: https://issues.apache.org/jira/browse/LUCENE-5373
>             Project: Lucene - Core
>          Issue Type: Bug
>            Reporter: Adrien Grand
>            Assignee: Adrien Grand
>            Priority: Minor
>         Attachments: LUCENE-5373.patch
>
>
> Lucene42DocValuesProducer.ramBytesUsed uses 
> {{RamUsageEstimator.sizeOf(this)}} to return an estimate of the memory 
> usage. One of the issues (there might be others) is that this class has a 
> reference to an IndexInput that might link to other data structures that we 
> wouldn't want to take into account. For example, index inputs of a 
> {{RAMDirectory}} all point to the directory itself, so 
> {{Lucene42DocValuesProducer.ramBytesUsed}} would return the amount of memory 
> used by the whole directory.
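
The over-estimation described in the issue can be reproduced with a toy model of 
a reachability-based size walk. This is only a sketch: {{Node}}, 
{{ReachableSize}} and the byte counts are made up for illustration, and the walk 
is a simplified stand-in for what a deep size estimator does, not Lucene's 
implementation.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

/** Hypothetical object-graph node: its own size plus outgoing references. */
class Node {
    final String name;
    final long shallowBytes;
    final List<Node> refs = new ArrayList<>();

    Node(String name, long shallowBytes) {
        this.name = name;
        this.shallowBytes = shallowBytes;
    }
}

class ReachableSize {
    /** Sums the shallow size of every node reachable from root, each counted once. */
    static long sizeOf(Node root) {
        Set<Node> seen = Collections.newSetFromMap(new IdentityHashMap<Node, Boolean>());
        return walk(root, seen);
    }

    private static long walk(Node node, Set<Node> seen) {
        if (!seen.add(node)) {
            return 0; // already counted
        }
        long total = node.shallowBytes;
        for (Node ref : node.refs) {
            total += walk(ref, seen);
        }
        return total;
    }
}
```

If a hypothetical producer node references both its doc values and an index 
input, and that input references the whole directory, then 
{{ReachableSize.sizeOf(producer)}} includes the directory's bytes even though 
the producer does not own them.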


