[ 
https://issues.apache.org/jira/browse/LUCENE-5780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14038674#comment-14038674
 ] 

Robert Muir commented on LUCENE-5780:
-------------------------------------

This looks good (+1 to commit to trunk/4.10), but I think we can do better by 
sorting explicitly. E.g. take a long[] sizes parameter (it can be optional, and 
all zeros would give us what we have today if we use a stable sort) that the 
user could populate with either valueCount or the number of docs in the segment 
(both are probably fine heuristics).

I know this means we will need an array to remap lookups, but this only happens 
once per segment with the new LongValues API, so it won't impact performance.
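
To make the idea concrete, here is a rough sketch of the sorting step only. It 
is not taken from the patch: the class and method names (SegmentOrderSketch, 
sortBySizeDescending, invert) are made up for illustration, and it assumes 
larger segments should come first. Ties keep their original order because 
Arrays.sort on an object array is stable, so all-zero sizes leave the segment 
order unchanged.

```java
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical helper, not part of Lucene: orders segments by a caller-supplied
// size heuristic (e.g. valueCount or number of docs) and builds the remap arrays.
final class SegmentOrderSketch {

    /** Returns newToOld: entry i is the original index of the segment placed at position i. */
    static int[] sortBySizeDescending(long[] sizes) {
        Integer[] order = new Integer[sizes.length];
        for (int i = 0; i < order.length; i++) {
            order[i] = i;
        }
        // Arrays.sort(Object[], Comparator) is a stable sort, so segments with
        // equal sizes (including the all-zeros case) keep their original order.
        Arrays.sort(order, Comparator.comparingLong((Integer i) -> sizes[i]).reversed());

        int[] newToOld = new int[order.length];
        for (int i = 0; i < order.length; i++) {
            newToOld[i] = order[i];
        }
        return newToOld;
    }

    /** Inverts the permutation: entry j is the new position of original segment j. */
    static int[] invert(int[] newToOld) {
        int[] oldToNew = new int[newToOld.length];
        for (int i = 0; i < newToOld.length; i++) {
            oldToNew[newToOld[i]] = i;
        }
        return oldToNew;
    }
}
```

The oldToNew array is the remap mentioned above: it would be consulted once 
when a per-segment lookup is requested, not on every ordinal lookup.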

> OrdinalMap's mapping from global ords to segment ords is sometimes wasteful
> ---------------------------------------------------------------------------
>
>                 Key: LUCENE-5780
>                 URL: https://issues.apache.org/jira/browse/LUCENE-5780
>             Project: Lucene - Core
>          Issue Type: Bug
>            Reporter: Adrien Grand
>            Assignee: Adrien Grand
>             Fix For: 5.0, 4.10
>
>         Attachments: LUCENE-5780.patch, LUCENE-5780.patch
>
>
> Robert found a case where the ordinal map can be quite wasteful in terms of 
> memory usage: in order to be able to resolve values given a global ordinal, 
> it stores two things:
>  - an identifier of the segment where the value is
>  - the difference between the ordinal on the segment and the global ordinal
> The issue is that OrdinalMap currently picks any of the segments that contain 
> the value but we can do better: we can pick the first segment that has the 
> value. This will help for two reasons:
>  - it will potentially require fewer bits per value to store the segment ids 
> if NRT segments don't introduce new values
>  - if all values happen to appear in the first segment, then the map from 
> global ords to deltas only stores zeros.
> I just tested on an index where all values are in the first segment and this 
> helped reduce memory usage of the ordinal map by 4x (from 3.5MB to 800KB).
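
As a toy illustration of the quoted description (not the actual OrdinalMap 
code), the sketch below builds the two arrays for a handful of in-memory 
segments. Everything here is an assumption for the example: the class name 
FirstSegmentSketch, the use of sorted, de-duplicated String arrays as segment 
dictionaries, and the sign convention delta = global ordinal minus segment 
ordinal.

```java
import java.util.Arrays;
import java.util.TreeSet;

// Hypothetical toy model, not Lucene code: for each global ordinal, record the
// first segment that contains the value and the delta to that segment's ordinal.
final class FirstSegmentSketch {
    final int[] firstSegment; // global ord -> index of first segment holding the value
    final long[] delta;       // global ord -> (global ord - ordinal in that segment)

    private FirstSegmentSketch(int[] firstSegment, long[] delta) {
        this.firstSegment = firstSegment;
        this.delta = delta;
    }

    /** Each inner array must be sorted and free of duplicates, like a segment's term dictionary. */
    static FirstSegmentSketch build(String[][] segmentTerms) {
        // The union of all terms in sorted order defines the global ordinals.
        TreeSet<String> allTerms = new TreeSet<>();
        for (String[] terms : segmentTerms) {
            allTerms.addAll(Arrays.asList(terms));
        }
        int[] firstSegment = new int[allTerms.size()];
        long[] delta = new long[allTerms.size()];
        int globalOrd = 0;
        for (String term : allTerms) {
            // Scan segments in order and stop at the first one that has the term.
            for (int seg = 0; seg < segmentTerms.length; seg++) {
                int segOrd = Arrays.binarySearch(segmentTerms[seg], term);
                if (segOrd >= 0) {
                    firstSegment[globalOrd] = seg;
                    delta[globalOrd] = globalOrd - segOrd;
                    break;
                }
            }
            globalOrd++;
        }
        return new FirstSegmentSketch(firstSegment, delta);
    }
}
```

With segments {a, b, c} and {b}, every value is found in segment 0, so both 
arrays contain only zeros, which is the case that compresses best.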


