Clustering info is stored in the index of an SSTable, so if you are only querying a subset of rows within the partition you don't necessarily have to hit all SSTables, just the SSTables that contain the relevant clustering column values. Clustering columns can make a big improvement, and they can also be used quite effectively in a time series use case, where they can remove the need for time buckets in your partition key.
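
To illustrate the idea, here is a toy Python model, not Cassandra's actual code. It assumes each SSTable knows the minimum and maximum clustering value it holds for a partition, so a slice query can skip SSTables whose range does not overlap the slice and merge only the rest. The names `ToySSTable`, `sstables_for_slice` and `read_slice` are made up for this sketch.

```python
import heapq
from dataclasses import dataclass


@dataclass
class ToySSTable:
    """Hypothetical stand-in for one SSTable's rows for a single partition."""
    name: str
    rows: list  # non-empty list of (clustering_value, payload), sorted by clustering_value

    @property
    def min_ck(self):
        return self.rows[0][0]

    @property
    def max_ck(self):
        return self.rows[-1][0]


def sstables_for_slice(sstables, lo, hi):
    """Keep only SSTables whose [min_ck, max_ck] range overlaps [lo, hi]."""
    return [s for s in sstables if s.max_ck >= lo and s.min_ck <= hi]


def read_slice(sstables, lo, hi):
    """Merge the surviving SSTables in clustering order, then filter to the slice."""
    candidates = sstables_for_slice(sstables, lo, hi)
    merged = heapq.merge(*(s.rows for s in candidates), key=lambda row: row[0])
    return [row for row in merged if lo <= row[0] <= hi]


# Time-series style partition: the clustering value is a timestamp, and each
# flush produced an SSTable covering a later time range.
tables = [
    ToySSTable("sstable-1", [(1, "a"), (2, "b"), (3, "c")]),
    ToySSTable("sstable-2", [(4, "d"), (5, "e"), (6, "f")]),
    ToySSTable("sstable-3", [(7, "g"), (8, "h"), (9, "i")]),
]

touched = [s.name for s in sstables_for_slice(tables, 5, 6)]
result = read_slice(tables, 5, 6)
```

With this data, a slice over clustering values 5 to 6 only has to read `sstable-2`. If the ranges overlapped (for example after out-of-order writes), more than one SSTable would survive the filter and the rows would be merged, which is the case described in the question below.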

On 3 October 2017 at 15:30, eugene miretsky <eugene.miret...@gmail.com> wrote:
> Hi,
>
> Clustering columns are used to order the data in a partition. However,
> since data is split into SSTables, the rows are ordered by clustering key
> only within each SSTable. Cassandra still needs to check all SSTables, and
> merge the data if it is found in several SSTables. The only scenario where
> I can imagine a big performance gain is super wide partitions, where each
> partition is within a single SSTable (time series data, where partition
> keys are time-buckets).
>
> Has anybody done benchmarks on that and can share the data model they have
> used?
>
> Cheers,
> Eugene