[ 
https://issues.apache.org/jira/browse/CASSANDRA-4478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sylvain Lebresne updated CASSANDRA-4478:
----------------------------------------

    Attachment: 4478-incomplete.txt

I'll note that changing IndexSummary to consider a byte size instead of number 
of keys is relatively straightforward. I'm attaching an incomplete patch that 
does that part.
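
To illustrate what measuring the interval in bytes means, here is a minimal sketch. It is not taken from the attached patch: the class name ByteIntervalSampler, the method samplePositions and the idea of passing the serialized entry sizes as a long[] are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch only; names and structure are not from 4478-incomplete.txt.
final class ByteIntervalSampler {
    // Returns the 0-based positions of the index entries that get sampled when
    // a sample is taken roughly every `intervalBytes` bytes of index data.
    static List<Integer> samplePositions(long[] entrySizes, long intervalBytes) {
        List<Integer> samples = new ArrayList<>();
        long bytesSinceLastSample = 0;
        for (int i = 0; i < entrySizes.length; i++) {
            // Always sample the first entry, then sample again once enough
            // bytes have accumulated since the previous sample.
            if (i == 0 || bytesSinceLastSample >= intervalBytes) {
                samples.add(i);
                bytesSinceLastSample = 0;
            }
            bytesSinceLastSample += entrySizes[i];
        }
        return samples;
    }

    public static void main(String[] args) {
        long[] sizes = {100, 100, 100, 500, 100, 100};
        System.out.println(samplePositions(sizes, 300));
    }
}
```

With big index entries the samples end up closer together in number of keys, and with small entries further apart, which is why the number of keys per sample stops being constant.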

However, one problem is that we currently use the index summary for various 
estimates of the number of keys in the sstable. In particular, we need to 
estimate the number of keys in a given range of tokens, which means simply 
keeping the total number of keys in the sstable is not enough.

The simplest/cheapest solution I can see for that problem would be to add to 
the IndexSummary a new int[] recording how many keys each sample covers (since 
it's not constant anymore). That does mean breaking the format of the 
serialized IndexSummary, but that may in turn be fine if we get this in 
1.2 (since index summaries aren't saved before that). If someone feels like 
completing the attached patch with that idea, feel free to (I can find other 
ways to entertain myself).
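
As a rough sketch of the int[] idea (again hypothetical, not part of the attached patch): the class KeyCountingSummary and the method estimateKeys are made-up names, and mapping a token range to a range of sample positions is assumed to happen elsewhere, for instance by searching the sampled keys.

```java
// Hypothetical sketch only; not the actual IndexSummary class.
final class KeyCountingSummary {
    // keysPerSample[i] is the number of keys covered by sample i.
    private final int[] keysPerSample;

    KeyCountingSummary(int[] keysPerSample) {
        this.keysPerSample = keysPerSample.clone();
    }

    // Estimates the number of keys covered by the samples in
    // [fromSample, toSample). Out-of-range bounds are clamped.
    long estimateKeys(int fromSample, int toSample) {
        int start = Math.max(0, fromSample);
        int end = Math.min(toSample, keysPerSample.length);
        long total = 0;
        for (int i = start; i < end; i++) {
            total += keysPerSample[i];
        }
        return total;
    }

    public static void main(String[] args) {
        KeyCountingSummary summary = new KeyCountingSummary(new int[]{128, 40, 7, 300});
        System.out.println(summary.estimateKeys(1, 3));
    }
}
```

The estimate stays as coarse as the sampling itself, but it no longer relies on every sample covering the same number of keys.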
                
> Make index_interval be measured in kb (instead of number of keys)
> -----------------------------------------------------------------
>
>                 Key: CASSANDRA-4478
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-4478
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Sylvain Lebresne
>            Priority: Minor
>             Fix For: 1.2
>
>         Attachments: 4478-incomplete.txt
>
>
> Currently, index_interval is measured in number of keys: how many keys before 
> adding an entry to the index summary. After CASSANDRA-2319, each index entry 
> also contains the column index for the row, so index entries can be a bit 
> bigger and of differing sizes. Measuring in number of keys is thus 
> sub-optimal and difficult to tune, since you might want a different setting 
> depending on whether your rows are big or small, but the setting is global.
> So we should move to measuring the interval in bytes.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
