I have a use case where a stream of time series data is coming in.

Each item in the stream has a timestamp of when it was sent, and covers the
activity that happened within a 5 minute timespan.

I need to group the items together into 30 minute blocks of time.

E.g., say I receive the following items:

5:00 PM, 5:05 PM, 5:10 PM... 5:30 PM, 6:20 PM

I need to group the messages from 5:00 PM to 5:30 PM into one block, and
put the 6:20 PM message into another block.

This seems simple enough: for each message, I look up the last received
message. If the two are within 30 minutes of each other, the message goes
into the current block. Otherwise, a new block is started.
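
To make the idea concrete, here is a minimal Python sketch of that check. It
assumes messages are handled one at a time and in timestamp order, and that
"within 30 minutes" means the gap to the previous message. The names
(assign_blocks, MAX_GAP) and the in-memory list are only for illustration,
not anything C* provides.

```python
from datetime import datetime, timedelta

# Assumption: a message joins the current block when it is at most
# 30 minutes after the previously received message.
MAX_GAP = timedelta(minutes=30)


def assign_blocks(timestamps):
    """Group in-order timestamps into blocks separated by gaps > MAX_GAP."""
    blocks = []
    last = None
    for ts in timestamps:
        # Start a new block for the first message or after a long gap.
        if last is None or ts - last > MAX_GAP:
            blocks.append([])
        blocks[-1].append(ts)
        last = ts
    return blocks


if __name__ == "__main__":
    # The example from above: 5:00 PM to 5:30 PM every 5 minutes, then 6:20 PM.
    # The calendar date is arbitrary.
    start = datetime(2024, 1, 1, 17, 0)
    items = [start + timedelta(minutes=5 * i) for i in range(7)]
    items.append(datetime(2024, 1, 1, 18, 20))
    for block in assign_blocks(items):
        print([t.strftime("%H:%M") for t in block])
```

This only works if the "last received message" really is the most recent one
by timestamp, which is exactly what breaks with out-of-order or concurrent
processing.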

My concern is about messages that arrive out of order, or are processed
concurrently.

Saving and reading them with Consistency=ALL would be bad for performance,
and I've had issues where queries have failed due to timeouts with those
settings (and timeouts can't be increased on a per query basis).

Would it be better to use Redis, or another database, as a helper /
companion to C*?

Or perhaps all messages should just be stored first, and then ~30 minutes
later, a job is run which gets all messages from the last 30 minutes, sorts
them by time, and then groups them into blocks of time?
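
A rough sketch of what such a job could do once it has fetched the messages,
again in Python and under the same 30 minute gap assumption. Here messages are
hypothetical (timestamp, payload) tuples, and group_batch is a made-up name.
How the rows are read from C* is left out.

```python
from datetime import datetime, timedelta


def group_batch(messages, max_gap=timedelta(minutes=30)):
    """Sort (timestamp, payload) tuples by time, then split on gaps > max_gap."""
    ordered = sorted(messages, key=lambda m: m[0])
    blocks = []
    for ts, payload in ordered:
        # Compare against the timestamp of the last message in the open block.
        if not blocks or ts - blocks[-1][-1][0] > max_gap:
            blocks.append([])
        blocks[-1].append((ts, payload))
    return blocks
```

Because the job sorts before grouping, arrival order no longer matters. What
this sketch does not handle is a block that is still open when the job runs
and continues into the next run.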
