[ https://issues.apache.org/jira/browse/LUCENE-5773?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Adrien Grand updated LUCENE-5773:
---------------------------------
Attachment: LUCENE-5773.patch
Here is a patch. It compares the output of {{SegmentReader.ramBytesUsed}}
against {{RamUsageTester}} for various codecs. To pass, the error must be
either under 10% (relative) or under 500 bytes (absolute) on an index of 100k
documents with small random fields. The absolute threshold is needed for things
that consume very little memory, like the Lucene 4.9 norms with constant
compression, or stored fields. Without it, the test would fail very easily
because of the constant overhead of the objects that we maintain to make
SegmentReader work.
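A minimal sketch of that acceptance rule, for illustration only: the class and method names ({{RamEstimateCheck}}, {{isAcceptable}}) are hypothetical and not part of the patch, and the sketch assumes the relative error is measured against the value reported by RamUsageTester.

```java
// Hypothetical helper illustrating the pass/fail rule described above:
// the estimate passes if its error is under 10% of the measured size
// OR under 500 bytes in absolute terms.
final class RamEstimateCheck {
    static final double MAX_RELATIVE_ERROR = 0.10; // 10%
    static final long MAX_ABSOLUTE_ERROR = 500;    // bytes

    private RamEstimateCheck() {}

    /**
     * @param estimated size reported by the API under test (e.g. ramBytesUsed())
     * @param actual    size measured independently (e.g. by a reflective walker)
     */
    static boolean isAcceptable(long estimated, long actual) {
        long absError = Math.abs(estimated - actual);
        if (absError < MAX_ABSOLUTE_ERROR) {
            // Tiny structures: constant per-object overhead dominates, so
            // only the absolute bound is meaningful here.
            return true;
        }
        // Otherwise require the error to be small relative to the measured size.
        return absError < MAX_RELATIVE_ERROR * actual;
    }
}
```

The absolute bound is checked first because, for structures of a few hundred bytes, a relative bound of 10% would be only a few dozen bytes and any fixed object overhead would break it.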
I had to refactor {{RamUsageTester}} a bit to make it work. In particular, I
needed to make sure that pointers to other segments and to directory objects
are not followed. Otherwise it would count, e.g., the NIO directory's buffers.
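To illustrate the idea (this is not the actual {{RamUsageTester}} code, which walks real objects via reflection), here is a simplified sketch over a hypothetical in-memory graph: {{ObjNode}} and {{GraphSizer}} are made-up names, each node carries a made-up shallow size, shared nodes are counted once via an identity set, and a predicate decides which references are not followed, e.g. directory objects.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;

// Hypothetical, simplified object graph node: a kind label, a shallow size
// in bytes, and outgoing references to other nodes.
final class ObjNode {
    final String kind;
    final long shallowBytes;
    final List<ObjNode> refs = new ArrayList<>();

    ObjNode(String kind, long shallowBytes) {
        this.kind = kind;
        this.shallowBytes = shallowBytes;
    }
}

final class GraphSizer {
    private GraphSizer() {}

    /**
     * Sums the shallow sizes of all nodes reachable from root. Each node is
     * counted once (by identity), and references to nodes matched by
     * doNotFollow are skipped entirely, so neither those nodes nor anything
     * reachable only through them is counted.
     */
    static long sizeOf(ObjNode root, Predicate<ObjNode> doNotFollow) {
        Set<ObjNode> seen = Collections.newSetFromMap(new IdentityHashMap<>());
        Deque<ObjNode> stack = new ArrayDeque<>();
        stack.push(root);
        long total = 0;
        while (!stack.isEmpty()) {
            ObjNode n = stack.pop();
            if (!seen.add(n)) {
                continue; // shared reference, already counted
            }
            total += n.shallowBytes;
            for (ObjNode child : n.refs) {
                if (!doNotFollow.test(child)) {
                    stack.push(child);
                }
            }
        }
        return total;
    }
}
```

For example, a reader (100 bytes) referencing the same field twice (40 bytes) plus a directory (1000 bytes) that owns a buffer (5000 bytes) sizes to 140 bytes when directories are not followed, and to 6140 bytes when everything is followed.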
The test found a couple of interesting bugs, although the default codec's
estimates were already fairly accurate. Quick overview of what has been fixed
and/or is surprising:
- PagedBytes.Reader assumed all pages had the same size. However, with
trim=true the last page is trimmed, so the estimate could be far off with
large page sizes. It now returns the exact memory usage (as reported by
RamUsageTester).
- The various FSTs that we use in codecs sometimes have massive cached root
arcs, in MemoryPostingsFormat in particular, but this was also the case for
BlockTreeTermsReader (or maybe this is due to the test data?).
- Other bugs were mostly forgotten references or things counted twice
(e.g. a PagedBytes instance and a reader over the same pages).
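A rough sketch of the PagedBytes.Reader issue, assuming a hypothetical {{byte[][]}} page layout in which only the last page is trimmed. {{PagedBytesEstimate}} and its methods are made-up names, not Lucene code, and the sketch only counts payload bytes, ignoring array headers and alignment, which a real estimate would also include.

```java
// Hypothetical illustration (not Lucene's PagedBytes code): with trim=true
// the last page is shorter than the others, so pageCount * pageSize
// overestimates the memory actually held.
final class PagedBytesEstimate {
    private PagedBytesEstimate() {}

    /** Old-style estimate: assumes every page has the full page size. */
    static long uniformEstimate(byte[][] pages, int pageSize) {
        return (long) pages.length * pageSize;
    }

    /** Payload actually held: sums each page's real length, trimmed last page included. */
    static long exactPayload(byte[][] pages) {
        long sum = 0;
        for (byte[] page : pages) {
            sum += page.length;
        }
        return sum;
    }
}
```

With 32 KB pages and a 10-byte trimmed tail, the uniform estimate overcounts by 32758 bytes, which is why the error grows with the page size.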
> Test SegmentReader.ramBytesUsed
> -------------------------------
>
> Key: LUCENE-5773
> URL: https://issues.apache.org/jira/browse/LUCENE-5773
> Project: Lucene - Core
> Issue Type: Test
> Reporter: Adrien Grand
> Assignee: Adrien Grand
> Priority: Minor
> Attachments: LUCENE-5773.patch
>
>
> There have been cases where the memory reported by this API was larger than
> the JVM heap size in the past so we should try to add some basic tests to it.
--
This message was sent by Atlassian JIRA
(v6.2#6252)