Wondering how cql3 DISTINCT query is implemented

Jing Meng Mon, 22 Oct 2018 06:02:58 -0700

Hi, we built a simple system to migrate live Cassandra data to other
databases, mainly using these two queries:


1. SELECT DISTINCT TOKEN(partition_key) FROM table WHERE
TOKEN(partition_key) > current_offset AND TOKEN(partition_key) <=
upper_bound LIMIT token_fetch_size
2. Any CQL query that retrieves all rows for a given set of tokens
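
For reference, the paging loop around query 1 can be sketched roughly as
follows. This is a minimal illustration, not our actual code: the names
`distinct_token_query` and `iter_tokens` are hypothetical, `execute` stands
in for whatever driver call runs a statement and returns rows as tuples, and
the `%s` placeholders assume a driver that binds positional parameters in
that style.

```python
from typing import Callable, Iterable, Iterator, Tuple


def distinct_token_query(table: str, partition_key: str, fetch_size: int) -> str:
    """Build query 1 from the post; the two bounds are left as bind markers."""
    return (
        f"SELECT DISTINCT TOKEN({partition_key}) FROM {table} "
        f"WHERE TOKEN({partition_key}) > %s AND TOKEN({partition_key}) <= %s "
        f"LIMIT {fetch_size}"
    )


def iter_tokens(
    execute: Callable[[str, Tuple[int, int]], Iterable[tuple]],
    table: str,
    partition_key: str,
    lower: int,
    upper: int,
    fetch_size: int,
) -> Iterator[int]:
    """Yield every partition token in (lower, upper], one page at a time."""
    query = distinct_token_query(table, partition_key, fetch_size)
    offset = lower
    while offset < upper:
        tokens = [row[0] for row in execute(query, (offset, upper))]
        if not tokens:
            break  # nothing left in the range
        yield from tokens
        # Advance past the largest token seen so the next page starts after it.
        offset = max(tokens)
        if len(tokens) < fetch_size:
            break  # short page: the range is exhausted
```

Each page issues one DISTINCT query, so the total time is the number of
pages multiplied by whatever a single DISTINCT query costs on the server.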

We observed that the "SELECT DISTINCT TOKEN" query takes much longer when
the table has wide partitions (about 200+ rows per partition on average);
the cost of the underlying operation does not look linear.

Does the query scan every row of every partition it finds until
token_fetch_size is met? Or is it due to some low-level operations that are
inherently more time-consuming when dealing with wide partitions?

Any advice on this question, or a pointer to the relevant code, would be
appreciated.
