[ https://issues.apache.org/jira/browse/LUCENE-6199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Michael McCandless updated LUCENE-6199:
---------------------------------------
Attachment: LUCENE-6199.patch
Initial patch w/ nocommits...
This gives a 2.4x reduction (137 MB to 56 MB) in heap usage in a
simple test that creates 100K indexed fields in a single-segment
index.
I fixed Lucene50FISReader to share a single attributes map if it
notices that multiple fields have exactly the same attributes (the map
is read-only). It would be nice if we could fix this higher up,
e.g. fix PerFieldXXXFormat to not store its attributes if the format
is "the default" somehow.
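The sharing idea can be sketched in plain Java. This is only an illustration, not the code in the patch: AttributeMapInterner and the attribute key used in the usage note are made-up names, and the sketch assumes attributes are a Map<String, String> that nobody mutates after reading.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper (not a Lucene class): hands out one shared read-only map per distinct content. */
final class AttributeMapInterner {
  // Maps each distinct attribute map to its single canonical, unmodifiable instance.
  private final Map<Map<String, String>, Map<String, String>> canonical = new HashMap<>();

  Map<String, String> intern(Map<String, String> attributes) {
    Map<String, String> existing = canonical.get(attributes);
    if (existing != null) {
      // Another field already had exactly these attributes: reuse its map.
      return existing;
    }
    // First time we see this content: take a defensive copy and freeze it.
    Map<String, String> shared = Collections.unmodifiableMap(new HashMap<>(attributes));
    canonical.put(shared, shared);
    return shared;
  }
}
```

With something like this, a reader that sees 100K fields carrying the same attributes (say {"example.format" -> "default"}) would keep one map instead of 100K copies. Sharing is only safe because the maps are never modified after they are read.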
It also moves some FST fields out to the builder, and adds a new
FST BytesReader impl for when it's a single byte[] page.
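To illustrate why a dedicated single-page reader can help, here is a rough standalone sketch. SketchBytesReader, PagedReader and SinglePageReader are hypothetical names for this example only; Lucene's real FST.BytesReader has a richer API, and the patch's implementation may differ.

```java
/** Hypothetical reader interface for this sketch only. */
interface SketchBytesReader {
  byte readByte();
  void setPosition(int pos);
}

/** General case: bytes are split across fixed-size pages, so each read maps a position to (page, offset). */
final class PagedReader implements SketchBytesReader {
  private final byte[][] pages;
  private final int pageBits;
  private final int pageMask;
  private int pos;

  PagedReader(byte[][] pages, int pageBits) {
    this.pages = pages;
    this.pageBits = pageBits;
    this.pageMask = (1 << pageBits) - 1;
  }

  @Override
  public byte readByte() {
    // Two array lookups plus a shift and a mask per byte.
    byte b = pages[pos >> pageBits][pos & pageMask];
    pos++;
    return b;
  }

  @Override
  public void setPosition(int pos) {
    this.pos = pos;
  }
}

/** Special case: everything fits in one array, so a read is a plain index and no page table is kept. */
final class SinglePageReader implements SketchBytesReader {
  private final byte[] bytes;
  private int pos;

  SinglePageReader(byte[] bytes) {
    this.bytes = bytes;
  }

  @Override
  public byte readByte() {
    return bytes[pos++];
  }

  @Override
  public void setPosition(int pos) {
    this.pos = pos;
  }
}
```

Both readers return the same bytes for the same positions; the single-page variant just avoids the page array and the per-read position arithmetic, which matters when there is one small FST per field.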
Separately, CodecReader.ramBytesUsed is missing some per-field heap:
with the patch it reports only 18.8 MB (out of 56 MB). I put
nocommits on the ones I could find, and I'll fix this in the next
iteration.
> Reduce per-field heap usage for indexed fields
> ----------------------------------------------
>
> Key: LUCENE-6199
> URL: https://issues.apache.org/jira/browse/LUCENE-6199
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Michael McCandless
> Assignee: Michael McCandless
> Fix For: Trunk, 5.1
>
> Attachments: LUCENE-6199.patch
>
>
> Lucene uses a non-trivial baseline amount of heap for each indexed
> field. I know it's abusive for an app to create 100K indexed fields,
> but I still think we can and should make some effort to reduce heap
> usage per unique field.
> E.g. in block tree we store 3 BytesRefs per field, when 3 byte[]s
> would do...