Hey everyone,

I'm new to Cassandra, going my first steps, having a problem/question regarding 
sorting results and proper data modelling.

First of all, I read the article "We Shall Have Order!" by Aaron Ploetz (1) to 
get a first view on how Cassandra works.
I reproduced the example in the article with my own table.

DROP TABLE sensors;
CREATE TABLE sensors (
    timestamp BIGINT,
    name VARCHAR,
    value VARCHAR,
    unit VARCHAR,
    PRIMARY KEY (name, timestamp)
)
WITH gc_grace_seconds = 0
AND CLUSTERING ORDER BY (timestamp DESC);

I'm actual running Cassandra on a single node ([cqlsh 5.0.1 | Cassandra 3.11.0 
| CQL spec 3.4.4 | Native protocol v4]).
Now some background information about my project:

I want to store all kinds of measuring-data from all kinds sensors. No matter 
if the sensor is measuring a temperature, water flow, or whatever. Sensors 
always give a single value. Interpretation has to be done afterwards by the 
user.
So in my example, I 'm measuring temperatures of my house which leads me to the 
following data:

timestamp               name              value unit
2017-07-24 14-11-00     entrance-a        20    Celsius
2017-07-24 14-11-04     living-room       24    Celsius
2017-07-24 14-11-07     bath-room         22    Celsius
2017-07-24 14-11-15     bed-room          23    Celsius
2017-07-24 14-11-22     entrance-b        20    Celsius

I'm measuring time-triggered each 15 minutes. In order to have some kind of 
start and end for each process, I decided to measure the entrance twice with 
different named sensors (entrance a and b). So above is one set of 
measuring-data, created by a single process.
I'd say this is just another perfect example of what Aaron Ploetz describes in 
his article.

When I query Cassandra the result set is not sorted by timestamp as long as I 
won't use the primary key in my WHERE clause.
When I ask myself: "What will I query Cassandra for?" I'm always coming up with 
the same typical thoughts:

*         LIST all measuring's in a specific timespan ORDERED BY timestamp 
ASC/DESC

o   Requires ALLOW FILTERING

o   Won't be sorted

*         LIST all measuring's for a specific sensor ORDERED BY timestamp 
ASC/DESC

o   Sorted result. OK.

*         And stuff the future will bring which I simply don't know now.


So in order to query Cassandra for measuring's in a specific timestamp I can't 
find a solid solution. My first idea was:

*         Add a column sequence which can be used to bundle a set of measuring's

DROP TABLE sensors;
CREATE TABLE sensors (
    timestamp BIGINT,
    name VARCHAR,
    value VARCHAR,
    unit VARCHAR,
    sequence INT,
    PRIMARY KEY (sequence, timestamp)
)
WITH gc_grace_seconds = 0
AND CLUSTERING ORDER BY (timestamp DESC);


o   I won't need to measure the entrance twice

o   I can query for a timespan as long as the timespan is within a sequence.

?  But when I query a timespan containing more than a single sequence, then the 
result set is not correct sorted again

sequence timestamp           name              value unit
123   2017-07-24 14-11-22     entrance-b        20    Celsius
123   2017-07-24 14-11-15     bed-room          23    Celsius
123   2017-07-24 14-11-07     bath-room         22    Celsius
123   2017-07-24 14-11-04     living-room       24    Celsius
123   2017-07-24 14-11-00     entrance-a        20    Celsius
124   2017-07-24 15-11-22     entrance-b        22    Celsius
124   2017-07-24 15-11-15     bed-room          25    Celsius
124   2017-07-24 15-11-07     bath-room         24    Celsius
124   2017-07-24 15-11-04     living-room       26    Celsius
124   2017-07-24 15-11-00     entrance-a        22    Celsius


o   Besides: it's not recommended to use a "dummy" column especially not as 
primary or clustering key.

How to solve this problem?
I believe, I can't be the only one who got this requirement. Imho "Sort it on 
the client-side" can't be the solution. As soon as data gets bigger we simply 
can't "just" sort on a client side.
So my next idea was to use the table as overall data storage and create another 
table and periodically transfer data from the main to the child table. But I 
believe I'll get the same problem because Cassandra simply don't sort as an 
RDBMS. So here must be an idea behind the philosophy of Cassandra.

Can anyone help me out?

Best regards
Mike Wenzel


(1)    https://www.datastax.com/dev/blog/we-shall-have-order

Reply via email to