[ 
https://issues.apache.org/jira/browse/CASSANDRA-3743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13189820#comment-13189820
 ] 

Jonathan Ellis commented on CASSANDRA-3743:
-------------------------------------------

There's Arrays.binarySearch.  But do we always know how many positions we have 
before performing the sampling?  If not, you could just call ArrayList.trimToSize 
at the end to avoid wasted space.  With that, the difference between ArrayList 
and a plain array is negligible here.
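
A minimal sketch of the two options mentioned above, not the actual Cassandra code. The class and method names are hypothetical, and String keys stand in for RowPosition. It collects samples into an ArrayList when the final count is not known up front, trims the spare capacity, and then searches either the list or an equivalent array:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical illustration; String is a stand-in for RowPosition.
final class SampleSearchSketch {
    // Keeps every interval-th key from an already sorted input of unknown length,
    // then releases the unused capacity of the backing array.
    static ArrayList<String> buildSamples(Iterable<String> sortedKeys, int interval) {
        ArrayList<String> samples = new ArrayList<>();
        int i = 0;
        for (String key : sortedKeys) {
            if (i++ % interval == 0) {
                samples.add(key);
            }
        }
        samples.trimToSize(); // capacity now equals size
        return samples;
    }

    // Binary search over the list form.
    static int searchList(List<String> samples, String key) {
        return Collections.binarySearch(samples, key);
    }

    // Binary search over the array form; same return convention as the list form.
    static int searchArray(String[] samples, String key) {
        return Arrays.binarySearch(samples, key);
    }
}
```

Both searches return the index of the key if present, or -(insertion point) - 1 otherwise.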
                
> Lower memory consumption used by index sampling
> -----------------------------------------------
>
>                 Key: CASSANDRA-3743
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3743
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>    Affects Versions: 1.0.0
>            Reporter: Radim Kolar
>            Assignee: Radim Kolar
>              Labels: optimization
>             Fix For: 1.1
>
>         Attachments: cassandra-3743.txt
>
>
> Currently j.o.a.c.io.sstable.IndexSummary is implemented as an ArrayList of 
> KeyPosition (RowPosition key, long offset). I propose to change it to:
> RowPosition keys[]
> long offsets[]
> and use a standard binary search on it. This will lower the number of Java 
> objects used per entry from 2 (KeyPosition + RowPosition) to 1 (RowPosition).
> For building these arrays, the convenient ArrayList class can be used, 
> followed by a call to .toArray() on it.
> This is very important because index sampling uses a lot of memory on nodes 
> with billions of rows.
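
The parallel-array layout proposed in the description could look roughly like the sketch below. This is an illustration under stated assumptions, not the attached patch: the class name and the floorOffset method are hypothetical, and String keys stand in for RowPosition.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of the keys[] / offsets[] layout.
final class ParallelIndexSummarySketch {
    private final String[] keys;   // stand-in for RowPosition[], sorted ascending
    private final long[] offsets;  // offsets[i] belongs to keys[i]

    // Builds the arrays from lists collected during sampling (same length, sorted by key).
    ParallelIndexSummarySketch(List<String> sortedKeys, List<Long> sampledOffsets) {
        keys = sortedKeys.toArray(new String[0]);
        offsets = new long[sampledOffsets.size()];
        for (int i = 0; i < offsets.length; i++) {
            offsets[i] = sampledOffsets.get(i); // unboxed: no per-entry wrapper object
        }
    }

    // Returns the offset of the last sampled key <= key,
    // or -1 if key sorts before every sample.
    long floorOffset(String key) {
        int idx = Arrays.binarySearch(keys, key);
        if (idx < 0) {
            idx = -idx - 2; // (insertion point) - 1
        }
        return idx < 0 ? -1L : offsets[idx];
    }
}
```

With this layout each entry costs one key object plus a primitive long slot, rather than a wrapper object holding both.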


        
