[ 
https://issues.apache.org/jira/browse/LUCENE-7122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15203913#comment-15203913
 ] 

Michael McCandless commented on LUCENE-7122:
--------------------------------------------

bq. if it's really a gain worth the short

Hmm, to clarify: I'm not trying to optimize away that on-disk short (2 
bytes) here.  We could do that later.

I'm trying to optimize away the in-RAM int (4 bytes) that {{OfflineSorter}} 
(because of {{BytesRefArray}}) now uses when sorting each in-heap partition.

I do think this is an important/worthwhile optimization:

E.g. if you are indexing 1D {{IntPoint}} s, which I suspect is a common case, 
today we need 12 bytes per value; with this patch we need 8 bytes per value. 
That means {{OfflineSorter}} can sort more values in heap before it must spill 
to disk and can create larger initial segments, so it can index more values 
before requiring 2nd level merges, etc.

The gains are still sizable for the 2D cases, e.g. a {{LatLonPoint}} would only 
need 12 bytes per value vs the 16 bytes today.
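
To make the arithmetic above concrete, here is a minimal sketch of how many values fit in a fixed heap budget at each per-value cost. The byte counts (12 vs 8 for 1D, 16 vs 12 for 2D) are the ones quoted in this comment; the 16 MB budget, the class name, and the {{valuesThatFit}} helper are assumptions made up for illustration, and the sketch ignores array overallocation and any other overhead.

```java
// Illustrative arithmetic only; not code from Lucene or from the patch.
class HeapBudgetSketch {

    // Hypothetical helper: how many fixed-cost entries fit in a heap budget.
    static long valuesThatFit(long heapBudgetBytes, int bytesPerValue) {
        if (bytesPerValue <= 0) {
            throw new IllegalArgumentException("bytesPerValue must be positive");
        }
        return heapBudgetBytes / bytesPerValue;
    }

    public static void main(String[] args) {
        long budget = 16L * 1024 * 1024; // arbitrary budget chosen for the example

        // 1D case: 12 bytes per value today, 8 with the patch.
        long oneDimToday = valuesThatFit(budget, 12);
        long oneDimPatched = valuesThatFit(budget, 8);

        // 2D case: 16 bytes per value today, 12 with the patch.
        long twoDimToday = valuesThatFit(budget, 16);
        long twoDimPatched = valuesThatFit(budget, 12);

        System.out.println("1D today:   " + oneDimToday);
        System.out.println("1D patched: " + oneDimPatched);
        System.out.println("2D today:   " + twoDimToday);
        System.out.println("2D patched: " + twoDimPatched);
    }
}
```

Under these assumptions the 1D case fits about 1.5x as many values per in-heap partition and the 2D case about 1.33x.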


> BytesRefArray can be more efficient for fixed width values
> ----------------------------------------------------------
>
>                 Key: LUCENE-7122
>                 URL: https://issues.apache.org/jira/browse/LUCENE-7122
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>             Fix For: master, 6.1
>
>         Attachments: LUCENE-7122.patch
>
>
> Today {{BytesRefArray}} uses one int ({{int[]}}, overallocated) per
> value to hold the length, but for dimensional points these values are
> always the same length.
> Not storing that int for fixed width values can save another 4 bytes
> of heap per indexed dimensional point, which is a big improvement
> (more points can fit in heap at once) for 1D and 2D lat/lon points.
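
The idea in the description can be sketched as follows: when every value has the same width, the start of value {{i}} is simply {{i * width}}, so no per-value int is needed. This is a minimal illustration under that assumption, not Lucene's {{BytesRefArray}} and not the attached patch; the class name and all methods are hypothetical.

```java
import java.util.Arrays;

// Illustrative only: an append-only store for fixed-width byte values.
class FixedWidthBytesSketch {
    private final int width;            // every value has exactly this many bytes
    private byte[] pool = new byte[64]; // all values packed back to back
    private int count;                  // number of values appended so far

    FixedWidthBytesSketch(int width) {
        if (width <= 0) {
            throw new IllegalArgumentException("width must be positive");
        }
        this.width = width;
    }

    void append(byte[] value) {
        if (value.length != width) {
            throw new IllegalArgumentException("expected " + width + " bytes");
        }
        int start = count * width; // position derived from the index, nothing stored per value
        if (start + width > pool.length) {
            pool = Arrays.copyOf(pool, Math.max(pool.length * 2, start + width));
        }
        System.arraycopy(value, 0, pool, start, width);
        count++;
    }

    byte[] get(int index) {
        if (index < 0 || index >= count) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        int start = index * width;
        return Arrays.copyOfRange(pool, start, start + width);
    }

    int size() {
        return count;
    }
}
```

A variable-width store would need a parallel {{int[]}} to locate each value; here the only per-value heap cost is the value's own bytes.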



