Re: What is performance gain of clustering columns

kurt greaves Tue, 03 Oct 2017 14:08:29 -0700

Clustering information is stored in the index of an SSTable, so if you are
only querying a subset of rows within the partition you don't necessarily
have to hit all SSTables, just the SSTables that contain the relevant
clustering column values. This can make a big improvement, and clustering
columns can also be used quite effectively in a time series use case,
removing the need for time buckets in your partition key.
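A toy Python sketch of the SSTable-skipping idea, assuming each SSTable records the minimum and maximum clustering value it holds for the partition. `SSTableMeta` and `sstables_for_slice` are hypothetical names for illustration, not Cassandra's API. A slice query only needs to read SSTables whose [min, max] range overlaps the requested slice:

```python
from dataclasses import dataclass


# Hypothetical, simplified per-SSTable metadata: the smallest and largest
# clustering values this SSTable holds for one partition.
@dataclass(frozen=True)
class SSTableMeta:
    name: str
    min_clustering: int
    max_clustering: int


def sstables_for_slice(sstables, lo, hi):
    """Return only the SSTables whose [min, max] clustering range overlaps
    the queried slice [lo, hi]; all others can be skipped for this read."""
    return [
        t for t in sstables
        if t.max_clustering >= lo and t.min_clustering <= hi
    ]


if __name__ == "__main__":
    # Time-series style: each flush covers a later range of timestamps.
    tables = [
        SSTableMeta("sst-1", 0, 99),
        SSTableMeta("sst-2", 100, 199),
        SSTableMeta("sst-3", 200, 299),
    ]
    hit = sstables_for_slice(tables, 150, 220)
    print([t.name for t in hit])  # ['sst-2', 'sst-3']
```

With time-ordered writes, each SSTable tends to cover a narrow, mostly non-overlapping range, so a query for recent data touches only a few files.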


On 3 October 2017 at 15:30, eugene miretsky <eugene.miret...@gmail.com>
wrote:

> Hi,
>
> Clustering columns are used to order the data in a partition. However,
> since data is split into SSTables, the rows are ordered by clustering key
> only within each SSTable. Cassandra still needs to check all SSTables, and
> merge the data if it is found in several SSTables. The only scenario where
> I can imagine a big performance gain is super-wide partitions, where each
> partition is within a single SSTable (time series data, where partition
> keys are time buckets).
>
> Has anybody done benchmarks on that and can share the data model they
> used?
>
> Cheers,
> Eugene
>
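The read path Eugene describes (rows sorted by clustering key only within each SSTable, then merged across SSTables) is essentially a k-way merge. Below is a simplified Python sketch, assuming rows are hypothetical `(clustering_key, write_timestamp, value)` tuples and ignoring tombstones, TTLs and per-cell timestamps. It is not Cassandra's actual implementation. Because each input is already sorted, `heapq.merge` combines them in O(n log k) for n rows across k SSTables instead of re-sorting everything:

```python
import heapq


def merge_partition(*sstable_rows):
    """K-way merge of per-SSTable row lists, each already sorted by
    clustering key. When the same clustering key appears in several
    SSTables, keep the version with the highest write timestamp
    (last write wins). Rows are hypothetical (key, timestamp, value) tuples."""
    merged = []
    for key, ts, value in heapq.merge(*sstable_rows, key=lambda r: r[0]):
        if merged and merged[-1][0] == key:
            # Same clustering key seen in another SSTable: reconcile.
            if ts > merged[-1][1]:
                merged[-1] = (key, ts, value)
        else:
            merged.append((key, ts, value))
    return merged


if __name__ == "__main__":
    older = [(1, 10, "a"), (3, 10, "c")]
    newer = [(2, 20, "b"), (3, 20, "C")]
    print(merge_partition(older, newer))
    # [(1, 10, 'a'), (2, 20, 'b'), (3, 20, 'C')]
```

The sorted order inside each SSTable is what lets this merge stream rows without buffering the whole partition, even when several SSTables must be read.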
