[ 
https://issues.apache.org/jira/browse/CASSANDRA-11206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15193082#comment-15193082
 ] 

Stefania commented on CASSANDRA-11206:
--------------------------------------

bq. IndexInfo is also used from 
{{UnfilteredRowIteratorWithLowerBound#getPartitionIndexLowerBound}} 
(CASSANDRA-8180) - not sure whether it's worth deserializing the index for 
this functionality, *as it is currently restricted to the entries that are 
present in the key cache*. I tend to remove this access. 

If I am not mistaken, when the sstable iterator is created the partition should 
be added to the key cache if not already present. Please have a look at 
BigTableReader {{iterator()}} and {{getPosition()}} to confirm. The reason we 
need the index info is that the lower bounds in the sstable metadata do not 
work for tombstones. This is the only lower bound we have for tombstones. If 
it's removed, the optimization of CASSANDRA-8180 no longer works in the 
presence of tombstones (whether this is acceptable is up for discussion). 

Can't we add the partition bounds to the offset map? 

For completeness, I should add that we don't necessarily need a lower bound for 
the partition; it can be a lower bound for the entire sstable if that is 
easier. However, it should work for tombstones; that is, it should be an 
instance of {{ClusteringPrefix}} rather than an array of {{ByteBuffer}}, as it 
is currently stored in the sstable metadata. 

> Support large partitions on the 3.0 sstable format
> --------------------------------------------------
>
>                 Key: CASSANDRA-11206
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-11206
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Jonathan Ellis
>            Assignee: Robert Stupp
>             Fix For: 3.x
>
>
> Cassandra saves a sample of IndexInfo objects that store the offset within 
> each partition of every 64KB (by default) range of rows.  To find a row, we 
> binary search this sample, then scan the partition of the appropriate range.
> The problem is that this scales poorly as partitions grow: on a cache miss, 
> we deserialize the entire set of IndexInfo, which both creates a lot of GC 
> overhead (as noted in CASSANDRA-9754) and incurs non-negligible i/o activity 
> (relative to reading a single 64KB row range) as partitions get truly large.
> We introduced an "offset map" in CASSANDRA-10314 that allows us to perform 
> the IndexInfo bsearch while only deserializing IndexInfo that we need to 
> compare against, i.e. log(N) deserializations.
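
To illustrate the offset-map idea in the description above, here is a minimal 
sketch of a binary search that deserializes only the entries it compares 
against. It is not Cassandra's actual code or on-disk layout: the class 
{{LazyIndexSearch}}, the simplified {{Entry}} (a stand-in for IndexInfo that 
holds only a last key and a block offset), and the buffer layout are all 
assumptions made for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.List;

/** Hypothetical sketch; names and layout are assumptions, not Cassandra's. */
public class LazyIndexSearch {
    /** Simplified stand-in for IndexInfo: last key of a block and the block's offset. */
    public static final class Entry {
        public final String lastKey;
        public final long blockOffset;
        public Entry(String lastKey, long blockOffset) {
            this.lastKey = lastKey;
            this.blockOffset = blockOffset;
        }
    }

    /** Number of entries decoded so far, kept only to show the log(N) behaviour. */
    public static int deserializations = 0;

    /** Assumed layout: [variable-length entries][one int offset per entry][int count]. */
    public static ByteBuffer serialize(List<Entry> entries) {
        int n = entries.size();
        byte[][] keys = new byte[n][];
        int size = 0;
        for (int i = 0; i < n; i++) {
            keys[i] = entries.get(i).lastKey.getBytes(StandardCharsets.UTF_8);
            size += 4 + keys[i].length + 8; // key length + key bytes + block offset
        }
        ByteBuffer buf = ByteBuffer.allocate(size + 4 * n + 4);
        int[] offsets = new int[n];
        for (int i = 0; i < n; i++) {
            offsets[i] = buf.position();
            buf.putInt(keys[i].length).put(keys[i]).putLong(entries.get(i).blockOffset);
        }
        for (int off : offsets) buf.putInt(off); // the "offset map"
        buf.putInt(n);
        buf.flip();
        return buf;
    }

    /** Decodes exactly one entry, located through the offset map. */
    public static Entry entryAt(ByteBuffer buf, int index) {
        int count = buf.getInt(buf.limit() - 4);
        int mapStart = buf.limit() - 4 - 4 * count;
        int pos = buf.getInt(mapStart + 4 * index);
        int keyLen = buf.getInt(pos);
        byte[] key = new byte[keyLen];
        for (int i = 0; i < keyLen; i++) key[i] = buf.get(pos + 4 + i);
        deserializations++;
        return new Entry(new String(key, StandardCharsets.UTF_8), buf.getLong(pos + 4 + keyLen));
    }

    /** Returns the index of the first block whose lastKey >= target, or -1 if there is none. */
    public static int findBlock(ByteBuffer buf, String target) {
        int lo = 0, hi = buf.getInt(buf.limit() - 4) - 1, result = -1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            if (entryAt(buf, mid).lastKey.compareTo(target) >= 0) {
                result = mid;   // candidate block; keep looking to the left
                hi = mid - 1;
            } else {
                lo = mid + 1;
            }
        }
        return result;
    }
}
```

Because every comparison goes through {{entryAt}}, a lookup decodes on the 
order of log2(N) entries instead of materializing all N, which is the saving 
the description attributes to the offset map.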



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
