Hi,

We are trying to evaluate the read performance impact of having a wide row,
created by moving a partition key column out into a clustering column. From
all the information I could gather [1]
<https://docs.datastax.com/en/cassandra/2.1/cassandra/operations/ops_how_cache_works_c.html>
 [2]
<https://docs.datastax.com/en/cassandra/3.x/cassandra/dml/dmlAboutReads.html>
 [3] <https://wiki.apache.org/cassandra/ReadPathForUsers>, the Key Cache as
well as the Partition Index point to the block location of the partition on
disk.

Suppose we have a schema like the one below, which would result in a wide row
if pk is of high cardinality (say, month in time-series data):

CREATE TABLE ks.wide_row_table (
    pk int,
    ck1 bigint,
    ck2 text,
    v1 text,
    v2 text,
    v3 bigint,
    PRIMARY KEY (pk, ck1, ck2)
);

Suppose that there is only one SSTable for this table at this instant and a
specific partition has reached 100 MB. Will reading the first row in the
partition (the 0th row) cost the same as reading the last row in the
partition (at 100 MB)?
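
To make the two cases concrete, the reads I am comparing would look roughly
like the following. The literal values for pk, ck1 and ck2 are made up for
illustration; only the table and column names come from the schema above.

```cql
-- Case 1: the row at the very start of the partition
-- (hypothetical smallest clustering values).
SELECT v1, v2, v3
FROM ks.wide_row_table
WHERE pk = 201801 AND ck1 = 1 AND ck2 = 'a';

-- Case 2: the row at the very end of the same partition
-- (hypothetical largest clustering values, around the 100 MB mark).
SELECT v1, v2, v3
FROM ks.wide_row_table
WHERE pk = 201801 AND ck1 = 9999999 AND ck2 = 'z';
```

Both queries fully specify the primary key, so the only difference between
them is where the row sits inside the partition.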

In other words, once the partition key is specified, is there any heuristic
that uses the clustering columns to determine the disk offset of the relevant
block? Or, in the second case, will the complete 100 MB partition have to be
scanned in order to find the relevant row? For simplicity's sake, let's
assume that the row cache and OS page cache are disabled and all reads hit
disk.

Thanks & Regards,
Bhuvan
