Hi, we built a simple system to migrate live Cassandra data to other
databases, mainly using these two queries:

1. SELECT DISTINCT TOKEN(partition_key) FROM table WHERE
TOKEN(partition_key) > current_offset AND TOKEN(partition_key) <=
upper_bound LIMIT token_fetch_size
2. Any CQL query that retrieves all rows for a given set of tokens
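
For reference, the paging loop around query 1 can be sketched roughly as
below. This is a minimal sketch, not our actual code: build_token_query,
iter_token_batches and the execute callable are hypothetical names, and
execute is assumed to run the query with the given parameters and return
the tokens as a list of integers in ascending token order.

```python
from typing import Callable, Iterator, List, Tuple

# Hypothetical type: runs a query with (current_offset, upper_bound, limit)
# and returns the matching tokens in ascending order.
Execute = Callable[[str, Tuple[int, int, int]], List[int]]


def build_token_query(table: str, partition_key: str) -> str:
    """Build query 1 with three positional placeholders (offset, bound, limit)."""
    return (
        f"SELECT DISTINCT TOKEN({partition_key}) FROM {table} "
        f"WHERE TOKEN({partition_key}) > %s "
        f"AND TOKEN({partition_key}) <= %s "
        f"LIMIT %s"
    )


def iter_token_batches(
    execute: Execute,
    query: str,
    lower_bound: int,
    upper_bound: int,
    token_fetch_size: int,
) -> Iterator[List[int]]:
    """Yield batches of distinct tokens from (lower_bound, upper_bound]."""
    current_offset = lower_bound
    while True:
        tokens = execute(query, (current_offset, upper_bound, token_fetch_size))
        if not tokens:
            return
        yield tokens
        # Resume strictly after the last token seen in this batch.
        current_offset = tokens[-1]
        # A short batch means the range is exhausted.
        if len(tokens) < token_fetch_size:
            return
```

Each yielded batch is then passed to query 2 to fetch the rows for those
tokens.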

We observed that the "SELECT DISTINCT TOKEN" query takes much longer
when the table has wide partitions (about 200+ rows per partition on
average); it looks like the underlying operation does not scale linearly.

Does the query scan every row of every partition it finds until
token_fetch_size is met? Or is it due to some low-level operations that are
inherently more time-consuming when dealing with wide partitions?

Any advice on this question, or pointers to the relevant code, would be
appreciated.
