I have a concern about using a secondary index on a field with low cardinality.
Let's say I have a few billion rows, and each row can be classified into one of
1000 categories. Let's say we have a 50-node cluster.

Now we want to fetch the data for a single category using a secondary index on
the category column. The query is paginated too, with a fetch size of, say, 5000.
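To put rough numbers on this scenario, here is a back-of-envelope sketch. It
assumes (hypothetically) 3 billion rows for "a few billion", rows spread evenly
across categories and nodes, and a replication factor of 3. None of these
figures come from a real cluster.

```python
# Back-of-envelope sizing for the scenario above.
# Assumptions (hypothetical): 3 billion rows stands in for "a few billion",
# rows are spread evenly across categories and nodes, replication factor 3.
TOTAL_ROWS = 3_000_000_000
CATEGORIES = 1_000
NODES = 50
REPLICATION_FACTOR = 3
FETCH_SIZE = 5_000

# Rows that match one category value.
rows_per_category = TOTAL_ROWS // CATEGORIES
# Pages the client must fetch to read one whole category (ceiling division).
pages_per_category = -(-rows_per_category // FETCH_SIZE)
# Secondary indexes are local, so every node indexes the matching rows it
# stores; with replication, each node holds about RF/NODES of the matches.
matches_per_node = rows_per_category * REPLICATION_FACTOR // NODES

print(rows_per_category, pages_per_category, matches_per_node)
```

So under these assumptions a single category is about 3 million rows, or 600
pages of 5000, and every node holds index entries for that category.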

Since a query on a secondary index is executed by the coordinator node using a
scatter-gather approach, would it lead to out-of-memory errors on the
coordinator, or to too many timeout errors?

How does pagination (token-level data fetching) behave in the scatter-gather
approach?
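My rough mental model of the paging, written as a toy simulation rather than
actual Cassandra code: the coordinator walks the token ranges in order, asks the
replica owning each range for its local index matches, stops as soon as the page
is full, and returns a paging state saying where to resume. The names
`fetch_page`, `fetch_all` and `PagingState` are hypothetical, and the real
coordinator may query several ranges concurrently, which this sketch ignores. If
this model is roughly right, the coordinator only needs to buffer about one page
of rows at a time, not the whole category.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical model: each token range is a list of (row_id, category) rows
# held by the replica that owns that range.
Range = List[Tuple[int, int]]


@dataclass
class PagingState:
    range_index: int  # token range to resume from
    offset: int       # matches of that range already returned


def fetch_page(ranges: List[Range], category: int, fetch_size: int,
               state: Optional[PagingState] = None):
    """Serve one page of an index query (simplified, sequential scatter-gather)."""
    start_range = state.range_index if state else 0
    offset = state.offset if state else 0
    page: List[int] = []
    for i in range(start_range, len(ranges)):
        # Scatter: ask the owning replica for its local index matches.
        # (A real replica would stop at the page limit; we filter the whole
        # range here only for brevity.)
        matches = [row_id for row_id, cat in ranges[i] if cat == category]
        skip = offset if i == start_range else 0
        # Gather: keep only as many rows as still fit in this page.
        take = matches[skip:skip + (fetch_size - len(page))]
        page.extend(take)
        consumed = skip + len(take)
        if len(page) == fetch_size:
            if consumed < len(matches):
                return page, PagingState(i, consumed)   # resume mid-range
            if i + 1 < len(ranges):
                return page, PagingState(i + 1, 0)      # resume at next range
            return page, None
    return page, None  # all ranges exhausted: last page


def fetch_all(ranges: List[Range], category: int, fetch_size: int):
    """Client-side loop: keep passing the paging state back until it is None."""
    pages, state = [], None
    while True:
        page, state = fetch_page(ranges, category, fetch_size, state)
        pages.append(page)
        if state is None:
            return pages


# Three token ranges of ten rows each; row r belongs to category r % 4.
ranges = [[(r, r % 4) for r in range(k * 10, k * 10 + 10)] for k in range(3)]
pages = fetch_all(ranges, category=1, fetch_size=3)
print(pages)
```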

Secondly, what if we create an inverted table with category as the partition
key? That would put a lot of data on a single node (the replicas owning that
partition). It might lead to a hot-shard issue, and to slow data fetching from
that single node, since a single partition would hold millions of rows.
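One variant I am considering (not something I have tested) is to add a bucket to
the partition key, so that one category is split across many partitions that
the partitioner spreads over the cluster. The table name `rows_by_category`, the
columns, `NUM_BUCKETS = 64` and the CRC32-based bucketing are all assumptions
for illustration, not a recommendation.

```python
import zlib

# Hypothetical inverted table with a bucketed partition key:
#   CREATE TABLE rows_by_category (
#       category int, bucket int, row_id text,
#       PRIMARY KEY ((category, bucket), row_id));
NUM_BUCKETS = 64  # assumption: tune so each partition stays reasonably small


def bucket_for(row_id: str) -> int:
    # Stable hash so writers and readers agree on the bucket
    # (Python's built-in hash() of str is randomized per process).
    return zlib.crc32(row_id.encode("utf-8")) % NUM_BUCKETS


def partition_keys_for_category(category: int):
    # Reading a whole category means querying every bucket's partition.
    return [(category, b) for b in range(NUM_BUCKETS)]


def rows_per_partition(rows_per_category: int) -> int:
    # Ceiling division: rows per (category, bucket) partition.
    return -(-rows_per_category // NUM_BUCKETS)


print(rows_per_partition(3_000_000), bucket_for("row-42"))
```

The trade-off is that a full read of one category now fans out to
`NUM_BUCKETS` partition queries, in exchange for no single partition holding
millions of rows.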

How should we tackle such a low-cardinality index in Cassandra?

Thanks
---------------------------------------------------------------------------------------------------------------------
Atul Saroha
*Lead Software Engineer*

Plot # 362, ASF Centre - Tower A, Udyog Vihar,
 Phase -4, Sector 18, Gurgaon, Haryana 122016, INDIA
