[jira] [Updated] (LUCENE-7122) BytesRefArray can be more efficient for fixed width values

Michael McCandless (JIRA) Sat, 26 Mar 2016 15:07:37 -0700

     [ https://issues.apache.org/jira/browse/LUCENE-7122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Michael McCandless updated LUCENE-7122:
---------------------------------------
    Attachment: LUCENE-7122.patch

OK, here's another iteration of the 2nd patch.

I added some more tests, and cleaned up {{BKDWriter}} a bit (the crazy
{{lastWriter}} hack is gone).  I also fixed
{{FixedLengthBytesRefArray}} to manage its own byte blocks (rather than
rely on the overly complex {{ByteBlockPool}}), and to size its blocks
to always be a multiple of the incoming value length, so a value never
straddles a block boundary and we never have to copy bytes while
sorting.
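
A minimal sketch of the block-sizing idea described above, not the code
in the patch.  The class name {{FixedWidthByteStore}}, its methods, and
the block size used in the usage note are all hypothetical; the only
thing taken from the description is that each block holds a whole
number of values, so two values can be compared in place.

```java
/** Hypothetical sketch; this is not Lucene's FixedLengthBytesRefArray. */
final class FixedWidthByteStore {
    private final int valueLength;     // every value has exactly this many bytes
    private final int valuesPerBlock;  // whole values per block, so none straddles a boundary
    private byte[][] blocks = new byte[4][];
    private int size;                  // number of values appended so far

    FixedWidthByteStore(int valueLength, int targetBlockBytes) {
        if (valueLength <= 0) {
            throw new IllegalArgumentException("valueLength must be > 0");
        }
        this.valueLength = valueLength;
        // Round the block size down to a multiple of valueLength (at least one value).
        this.valuesPerBlock = Math.max(1, targetBlockBytes / valueLength);
    }

    int blockBytes() {
        return valuesPerBlock * valueLength;
    }

    int size() {
        return size;
    }

    /** Copies one value into the store and returns its ordinal. */
    int append(byte[] value) {
        if (value.length != valueLength) {
            throw new IllegalArgumentException("expected " + valueLength + " bytes");
        }
        int block = size / valuesPerBlock;
        if (block == blocks.length) {
            // Grow the block table; the blocks themselves are never moved.
            byte[][] grown = new byte[blocks.length * 2][];
            System.arraycopy(blocks, 0, grown, 0, blocks.length);
            blocks = grown;
        }
        if (blocks[block] == null) {
            blocks[block] = new byte[blockBytes()];
        }
        System.arraycopy(value, 0, blocks[block], offsetOf(size), valueLength);
        return size++;
    }

    /** Backing block of the value at ord; the value starts at offsetOf(ord). */
    byte[] blockOf(int ord) {
        if (ord < 0 || ord >= size) {
            throw new IndexOutOfBoundsException("ord=" + ord + " size=" + size);
        }
        return blocks[ord / valuesPerBlock];
    }

    int offsetOf(int ord) {
        return (ord % valuesPerBlock) * valueLength;
    }

    /** Unsigned lexicographic comparison of two stored values, with no copying. */
    int compare(int ordA, int ordB) {
        byte[] a = blockOf(ordA);
        byte[] b = blockOf(ordB);
        int offA = offsetOf(ordA);
        int offB = offsetOf(ordB);
        for (int i = 0; i < valueLength; i++) {
            int cmp = (a[offA + i] & 0xFF) - (b[offB + i] & 0xFF);
            if (cmp != 0) {
                return cmp;
            }
        }
        return 0;
    }
}
```

For example, {{new FixedWidthByteStore(8, 32768)}} would hold 4096
8-byte values per block.  A sorter could then order an array of
ordinals using {{compare}} and never touch the bytes themselves.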

I tested performance on the first 500M points from the OpenStreetMap
export and the patch is ~12% faster to write each segment of ~53
million lat/lon points, with a big IW/sorter buffer (1 GB), or ~9%
faster overall index time for the first 1B points.  The gains should
be even better at smaller (e.g. the default) IW/sorter buffers, since
the number of merge passes would often be lower.


> BytesRefArray can be more efficient for fixed width values
> ----------------------------------------------------------
>
>                 Key: LUCENE-7122
>                 URL: https://issues.apache.org/jira/browse/LUCENE-7122
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>             Fix For: master, 6.1
>
>         Attachments: LUCENE-7122.patch, LUCENE-7122.patch, LUCENE-7122.patch
>
>
> Today {{BytesRefArray}} uses one int ({{int[]}}, overallocated) per
> value to hold the length, but for dimensional points these values are
> always the same length, so the per-value int is redundant.
> Dropping it can save another 4 bytes of heap per indexed dimensional
> point, which is a big improvement (more points can fit in heap at
> once) for 1D and 2D lat/lon points.
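
A rough back-of-the-envelope sketch of the saving described above.  The
class and method names are hypothetical, and the 8-byte value size used
in the usage note is an assumption (two 4-byte dimensions).  It also
ignores {{int[]}} overallocation and any other per-value overhead, so
treat the numbers as an illustration only.

```java
/** Hypothetical estimate of what one length int per value costs in heap. */
final class LengthOverheadEstimate {

    /**
     * Number of values that fit in budgetBytes when each value costs
     * valueBytes, plus one int for its length if perValueLength is true.
     */
    static long valuesThatFit(long budgetBytes, int valueBytes, boolean perValueLength) {
        int perValue = valueBytes + (perValueLength ? Integer.BYTES : 0);
        return budgetBytes / perValue;
    }

    /** Heap saved by not storing one length int for each of valueCount values. */
    static long bytesSaved(long valueCount) {
        return valueCount * Integer.BYTES;
    }
}
```

Under those assumptions, {{valuesThatFit(1L << 30, 8, true)}} and
{{valuesThatFit(1L << 30, 8, false)}} compare how many 8-byte values a
1 GB buffer holds with and without the per-value int.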



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

