[ https://issues.apache.org/jira/browse/LUCENE-7122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Michael McCandless updated LUCENE-7122:
---------------------------------------
Attachment: LUCENE-7122.patch
OK, here's another iteration of the 2nd patch.
I added some more tests and cleaned up {{BKDWriter}} a bit (the crazy
{{lastWriter}} hack is gone). I also fixed
{{FixedLengthBytesRefArray}} to manage its own byte blocks (rather than
rely on the overly complex {{ByteBlockPool}}), and to size its blocks
as a multiple of the incoming value length, so a value never straddles
a block boundary and we never have to copy bytes while sorting.
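
To illustrate the block-sizing idea, here is a minimal sketch. It is not
the code from the patch: the class name {{FixedWidthValueStore}}, its
methods, and the {{targetBlockBytes}} parameter are all hypothetical. It
assumes every value has the same length, so no per-value length or offset
array is needed, and it rounds the block size down to a whole number of
values so each value can be compared in place.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of fixed-width value storage; not the patch's FixedLengthBytesRefArray. */
final class FixedWidthValueStore {
  private final int valueLength;    // every value has exactly this many bytes
  private final int valuesPerBlock; // whole values per block, so none straddles a block boundary
  private final List<byte[]> blocks = new ArrayList<>();
  private int size;

  FixedWidthValueStore(int valueLength, int targetBlockBytes) {
    if (valueLength <= 0) {
      throw new IllegalArgumentException("valueLength must be > 0");
    }
    this.valueLength = valueLength;
    // Round the block size down to a multiple of valueLength (at least one value per block).
    this.valuesPerBlock = Math.max(1, targetBlockBytes / valueLength);
  }

  /** Appends one value and returns its index. No per-value length is stored. */
  int append(byte[] value) {
    if (value.length != valueLength) {
      throw new IllegalArgumentException("expected " + valueLength + " bytes, got " + value.length);
    }
    int slot = size % valuesPerBlock;
    if (slot == 0) {
      // The previous block is full (or this is the first value): start a new block.
      blocks.add(new byte[valuesPerBlock * valueLength]);
    }
    System.arraycopy(value, 0, blocks.get(blocks.size() - 1), slot * valueLength, valueLength);
    return size++;
  }

  int size() {
    return size;
  }

  /** Block that holds the value at the given index. */
  byte[] block(int index) {
    return blocks.get(index / valuesPerBlock);
  }

  /** Byte offset of the value at the given index within its block. */
  int offset(int index) {
    return (index % valuesPerBlock) * valueLength;
  }

  /** Unsigned byte-wise comparison of two stored values, read in place from their blocks. */
  int compare(int i, int j) {
    byte[] a = block(i);
    byte[] b = block(j);
    int aOff = offset(i);
    int bOff = offset(j);
    for (int k = 0; k < valueLength; k++) {
      int diff = (a[aOff + k] & 0xFF) - (b[bOff + k] & 0xFF);
      if (diff != 0) {
        return diff;
      }
    }
    return 0;
  }
}
```

A sorter built on this would permute an array of value indices using
{{compare}}, so the stored bytes themselves are never moved or copied into
a scratch buffer during the sort.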
I tested performance on the first 500 M points from the OpenStreetMap
export, and the patch is ~12% faster to write each segment of ~53
million lat/lon points with a big IW/sorter buffer (1 GB), or ~9%
faster in overall indexing time for the first 1B points. The gains
should be even better with smaller (e.g. the default) IW/sorter
buffers, since the number of merge passes would often be lower.
> BytesRefArray can be more efficient for fixed width values
> ----------------------------------------------------------
>
> Key: LUCENE-7122
> URL: https://issues.apache.org/jira/browse/LUCENE-7122
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Michael McCandless
> Assignee: Michael McCandless
> Fix For: master, 6.1
>
> Attachments: LUCENE-7122.patch, LUCENE-7122.patch, LUCENE-7122.patch
>
>
> Today {{BytesRefArray}} uses one int ({{int[]}}, overallocated) per
> value to hold the length, but for dimensional points these values are
> always the same length.
> Dropping the per-value length can save another 4 bytes of heap per indexed dimensional point,
> which is a big improvement (more points can fit in heap at once) for
> 1D and 2D lat/lon points.