Jason Kania created CASSANDRA-11319:
---------------------------------------

             Summary: SELECT DISTINCT Should allow filtering by where clause to 
support time series
                 Key: CASSANDRA-11319
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-11319
             Project: Cassandra
          Issue Type: Improvement
          Components: CQL
         Environment: Cassandra 3.0.3
            Reporter: Jason Kania


Due to the lack of built-in sharding, we have been trying to split very wide 
rows. However, in trying to find all the sharding column values after the fact, 
we have not been able to find a manageable solution. If a table is defined as 
follows:

CREATE TABLE IF NOT EXISTS "sensorReadings"
(
        "measurementList" blob,
        "sensorId" int,
        "sensorUnitId" int,
        "shardId" int,
        "time" timestamp,
        PRIMARY KEY ( ("sensorUnitId", "sensorId", "shardId"), "time" )
);

then

select DISTINCT "sensorUnitId","sensorId","shardId" from "sensorReadings";

will give all the unique partition keys, but this can still be a very large 
set, so it should be possible to refine it with a WHERE clause that contains 
only partition key columns, i.e.:

select DISTINCT "sensorUnitId","sensorId","shardId" from "sensorReadings" WHERE 
"sensorUnitId"=17 AND "sensorId"=8;

Without this ability, we are forced to keep a table of available shardIds and 
update it on every write so that we can even query the original table. While 
several scenarios allow the shardId to be determined automatically, attempts to 
iterate over the shards are seriously hampered.
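
For reference, the workaround described above looks roughly like the sketch 
below. The table name "sensorShards" and the literal values are illustrative 
assumptions only, not our actual schema.

```cql
-- Hypothetical lookup table (the name "sensorShards" is an assumption).
-- One partition per sensor; each known shardId is a clustering row.
CREATE TABLE IF NOT EXISTS "sensorShards"
(
        "sensorUnitId" int,
        "sensorId" int,
        "shardId" int,
        PRIMARY KEY ( ("sensorUnitId", "sensorId"), "shardId" )
);

-- Issued alongside every write to "sensorReadings". An INSERT is an upsert,
-- so repeating it for an already known shard does not create duplicates.
INSERT INTO "sensorShards" ("sensorUnitId", "sensorId", "shardId")
VALUES (17, 8, 3);

-- List the shards for one sensor, then query "sensorReadings" once per shard.
SELECT "shardId" FROM "sensorShards"
WHERE "sensorUnitId" = 17 AND "sensorId" = 8;
```

The cost is the extra write per reading and a second table to keep consistent, 
which is what the requested WHERE support on SELECT DISTINCT would avoid.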



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
