Grouping time series data into blocks of times

Ali Akhtar Sat, 18 Mar 2017 10:28:14 -0700

I have a use case where a stream of time series data is coming in.

Each item in the stream has a timestamp of when it was sent, and covers the
activity that happened within a 5-minute timespan.


I need to group the items together into 30-minute blocks of time.

E.g., say I receive the following items:

5:00 PM, 5:05 PM, 5:10 PM... 5:30 PM, 6:20 PM

I need to group the messages from 5:00 PM to 5:30 PM into one block, and
put the 6:20 PM message into another block.

It seems simple enough to do, if for each message, I look up the last
received message. If it was within 30 minutes, then the message goes into
the current block. Otherwise, a new block is started.
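
A minimal Python sketch of that rule, assuming messages arrive in timestamp
order and are handled by a single process. The names BlockAssigner and
MAX_GAP are hypothetical, and the last timestamp is kept in memory here
rather than looked up from a database:

```python
from datetime import datetime, timedelta

# Assumption: a gap of more than 30 minutes starts a new block.
MAX_GAP = timedelta(minutes=30)


class BlockAssigner:
    """Assigns an increasing block id to each in-order message timestamp."""

    def __init__(self, max_gap=MAX_GAP):
        self.max_gap = max_gap
        self.last_ts = None   # timestamp of the last message seen
        self.block_id = -1    # no block has been started yet

    def assign(self, ts):
        # Start a new block on the first message, or when the gap since
        # the previous message exceeds max_gap.
        if self.last_ts is None or ts - self.last_ts > self.max_gap:
            self.block_id += 1
        self.last_ts = ts
        return self.block_id


assigner = BlockAssigner()
day = datetime(2017, 3, 18)
# 5:00 PM through 5:30 PM in 5-minute steps, then 6:20 PM.
stream = [day.replace(hour=17, minute=m) for m in range(0, 35, 5)]
stream.append(day.replace(hour=18, minute=20))
block_ids = [assigner.assign(ts) for ts in stream]
```

With the example items above, the 5:00 PM to 5:30 PM messages share one
block id and the 6:20 PM message gets the next one. The sketch does not
handle late or concurrent messages.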

My concern is about messages that arrive out of order, or are processed
concurrently.

Saving and reading them with Consistency=ALL would be bad for performance,
and I've had issues where queries have failed due to timeouts with those
settings (and timeouts can't be increased on a per query basis).

Would it be better to use Redis, or another database, as a helper /
companion to C*?

Or perhaps all messages should just be stored first, and then ~30 minutes
later, a job is run which gets all messages within the last 30 mins, sorts
them by time, and then groups them into blocks of time?
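
A rough Python sketch of the grouping step such a job might run, assuming
the batch of timestamps has already been fetched into memory. The function
name group_into_blocks is hypothetical, and fetching from the database is
left out:

```python
from datetime import datetime, timedelta


def group_into_blocks(timestamps, max_gap=timedelta(minutes=30)):
    """Sort timestamps, then split them wherever the gap exceeds max_gap."""
    blocks = []
    for ts in sorted(timestamps):
        # Extend the current block if this message is close enough to the
        # previous one; otherwise start a new block.
        if blocks and ts - blocks[-1][-1] <= max_gap:
            blocks[-1].append(ts)
        else:
            blocks.append([ts])
    return blocks


day = datetime(2017, 3, 18)
# Deliberately out of order, to mimic late arrivals within the batch.
batch = [
    day.replace(hour=17, minute=10),
    day.replace(hour=18, minute=20),
    day.replace(hour=17, minute=0),
    day.replace(hour=17, minute=30),
    day.replace(hour=17, minute=5),
]
blocks = group_into_blocks(batch)
```

Because the batch is sorted before it is split, arrival order within the
batch does not affect the result. A message that arrives after its batch
has been processed would still need separate handling.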
