[jira] [Commented] (CASSANDRA-11206) Support large partitions on the 3.0 sstable format

Robert Stupp (JIRA) Sat, 16 Apr 2016 01:52:22 -0700

    [ https://issues.apache.org/jira/browse/CASSANDRA-11206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15244100#comment-15244100 ]


Robert Stupp commented on CASSANDRA-11206:
------------------------------------------

bq. have ColumnIndex but it's been refactored into RowIndexWriter

Yeah - it doesn't look the same any more. So I went ahead and moved it into BTW, 
since that's the only class that uses it. Could move that to 
{{o.a.c.io.sstable.format.big}}, where BTW is.

bq. BTW.addIndexBlock() the indexOffsets\[0\] is always 0

Put some comments in the code for that.

bq. explain in RowIndexEntry.create why you are returning each of the types

Put some comments in the code for that.

bq. don't need indexOffsets once you reach column_index_cache_size_in_kb

It's needed for both cases (shallow and non-shallow RIEs). Put a comment in the 
code for that.

Also ran some cstar tests to compare a version with and without the metrics 
with column_index_cache_size_in_kb 0kB and 2kB on taylor and blade_11_b:
[2kB on taylor|http://cstar.datastax.com/tests/id/b4c3dd12-033e-11e6-8db8-0256e416528f]
[2kB on blade_11_b|http://cstar.datastax.com/tests/id/a9c828be-033e-11e6-8db8-0256e416528f]
[0kB on taylor|http://cstar.datastax.com/tests/id/621f0886-034b-11e6-8db8-0256e416528f]
[0kB on blade_11_b|http://cstar.datastax.com/tests/id/6f010ad6-034b-11e6-8db8-0256e416528f]

Commits pushed and CI triggered.

> Support large partitions on the 3.0 sstable format
> --------------------------------------------------
>
>                 Key: CASSANDRA-11206
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-11206
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Jonathan Ellis
>            Assignee: Robert Stupp
>             Fix For: 3.x
>
>         Attachments: 11206-gc.png, trunk-gc.png
>
>
> Cassandra saves a sample of IndexInfo objects that store the offset within 
> each partition of every 64KB (by default) range of rows.  To find a row, we 
> binary search this sample, then scan the appropriate range of the partition.
> The problem is that this scales poorly as partitions grow: on a cache miss, 
> we deserialize the entire set of IndexInfo, which both creates a lot of GC 
> overhead (as noted in CASSANDRA-9754) and causes non-negligible i/o activity 
> (relative to reading a single 64KB row range) as partitions get truly large.
> We introduced an "offset map" in CASSANDRA-10314 that allows us to perform 
> the IndexInfo bsearch while only deserializing IndexInfo that we need to 
> compare against, i.e. log(N) deserializations.
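
To illustrate the "offset map" idea from the description, here is a minimal Java sketch. It is not Cassandra's code: the class {{OffsetMapSearch}}, its methods, and the entry layout (key length, key bytes, partition offset) are all hypothetical and chosen only for illustration. The point it shows is that, given the start position of each serialized entry, a binary search only has to deserialize the entries it actually compares against.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch only; these are not Cassandra's actual classes or on-disk layout. */
final class OffsetMapSearch {
    /** Counts how many entries were deserialized, to make the log(N) behaviour visible. */
    static int deserializations = 0;

    /**
     * Serializes sorted (first key of block, offset of block within partition) pairs as
     * variable-length entries and records in offsetMap where each entry starts.
     */
    static ByteBuffer serialize(String[] firstKeys, long[] partitionOffsets, int[] offsetMap) {
        byte[][] encoded = new byte[firstKeys.length][];
        int size = 0;
        for (int i = 0; i < firstKeys.length; i++) {
            encoded[i] = firstKeys[i].getBytes(StandardCharsets.UTF_8);
            size += 4 + encoded[i].length + 8; // key length + key bytes + partition offset
        }
        ByteBuffer buf = ByteBuffer.allocate(size);
        for (int i = 0; i < firstKeys.length; i++) {
            offsetMap[i] = buf.position();
            buf.putInt(encoded[i].length).put(encoded[i]).putLong(partitionOffsets[i]);
        }
        buf.flip();
        return buf;
    }

    /** Deserializes only the key of the entry starting at entryStart. */
    static String readKey(ByteBuffer buf, int entryStart) {
        deserializations++;
        int len = buf.getInt(entryStart);
        byte[] bytes = new byte[len];
        for (int i = 0; i < len; i++)
            bytes[i] = buf.get(entryStart + 4 + i);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    /** Reads the partition offset stored after the key of the entry at entryStart. */
    static long readPartitionOffset(ByteBuffer buf, int entryStart) {
        int len = buf.getInt(entryStart);
        return buf.getLong(entryStart + 4 + len);
    }

    /**
     * Binary search over the serialized entries via the offset map. Returns the partition
     * offset of the last block whose first key is <= target, or -1 if target sorts
     * before every block.
     */
    static long findBlock(ByteBuffer buf, int[] offsetMap, String target) {
        int lo = 0, hi = offsetMap.length - 1, found = -1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            if (readKey(buf, offsetMap[mid]).compareTo(target) <= 0) {
                found = mid;
                lo = mid + 1;
            } else {
                hi = mid - 1;
            }
        }
        return found < 0 ? -1 : readPartitionOffset(buf, offsetMap[found]);
    }
}
```

Without the offset map, variable-length entries would have to be read sequentially from the start to locate the middle one, which is what forces deserializing the whole set.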



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
