[ https://issues.apache.org/jira/browse/LUCENE-6325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Michael McCandless updated LUCENE-6325:
---------------------------------------
Attachment: LUCENE-6325.patch
New patch, using a dense array when more than 1/16th of the field numbers are used:
Each TreeMap$Entry has object header (8 or 16 bytes), 5 pointers (4 or
8 bytes), and a boolean (likely rounded up to 4 bytes), times 2 for
all the inner nodes of the tree, plus the overhead of Integer (object
header, int), so net/net each entry in the TreeMap costs 68 - 124 bytes.
The array costs 4 or 8 bytes per field number (one reference per slot).
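
To illustrate the dense-vs-sparse choice described above, here is a minimal Java sketch. It is not the code from the attached patch: the class name FieldLookupSketch, the use of String values in place of FieldInfo, and the exact form of the 1/16th check are all assumptions made for illustration.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch (not the LUCENE-6325 patch): number -> value lookup. */
final class FieldLookupSketch {
    // Exactly one of these is non-null, depending on the chosen layout.
    private final String[] byNumberArray;
    private final TreeMap<Integer, String> byNumberMap;

    FieldLookupSketch(Map<Integer, String> fields) {
        int maxNumber = -1;
        for (int number : fields.keySet()) {
            if (number < 0) {
                throw new IllegalArgumentException("negative field number: " + number);
            }
            maxNumber = Math.max(maxNumber, number);
        }
        // Assumed threshold: go dense when more than 1/16th of the slots
        // 0..maxNumber are occupied. long arithmetic avoids int overflow.
        boolean dense = maxNumber >= 0
                && (long) fields.size() * 16 > (long) maxNumber + 1;
        if (dense) {
            byNumberArray = new String[maxNumber + 1];
            for (Map.Entry<Integer, String> e : fields.entrySet()) {
                byNumberArray[e.getKey()] = e.getValue();
            }
            byNumberMap = null;
        } else {
            byNumberArray = null;
            byNumberMap = new TreeMap<>(fields);
        }
    }

    boolean isDense() {
        return byNumberArray != null;
    }

    /** Returns the value for the given number, or null if there is none. */
    String get(int number) {
        if (number < 0) {
            throw new IllegalArgumentException("negative field number: " + number);
        }
        if (byNumberArray != null) {
            // Dense path: one bounds check and one array load.
            return number < byNumberArray.length ? byNumberArray[number] : null;
        }
        // Sparse path: O(log n) tree walk plus boxing of the key.
        return byNumberMap.get(number);
    }

    public static void main(String[] args) {
        Map<Integer, String> fields = new TreeMap<>();
        fields.put(0, "id");
        fields.put(1, "title");
        fields.put(2, "body");
        FieldLookupSketch lookup = new FieldLookupSketch(fields);
        System.out.println("dense=" + lookup.isDense() + " field 1=" + lookup.get(1));
    }
}
```

With the 1/16th cutoff the array wastes at most about 16 reference slots per field in use, which stays in the same range as the per-entry TreeMap cost estimated above, while the common case of contiguous field numbers gets a plain array load.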
> improve perf and memory of FieldInfos.fieldInfo(int)
> ----------------------------------------------------
>
> Key: LUCENE-6325
> URL: https://issues.apache.org/jira/browse/LUCENE-6325
> Project: Lucene - Core
> Issue Type: Sub-task
> Reporter: Robert Muir
> Assignee: Michael McCandless
> Fix For: 5.2, Trunk
>
> Attachments: LUCENE-6325.patch, LUCENE-6325.patch
>
>
> FieldInfos.fieldInfo(int) looks up a field by number and returns its
> FieldInfo.
> This method is called per-field-per-doc in things like stored fields and
> vectors readers.
> Unfortunately, today this method is always backed by a TreeMap. In most cases
> a simple array is better: it's faster and uses less memory.
> These changes made a significant difference in stored fields CheckIndex time
> with my test index (which had only 10 fields). Maybe it helps merging as well.