I have a use case where a stream of time series data is coming in. Each item in the stream has a timestamp of when it was sent, and covers the activity that happened within a 5-minute timespan.
I need to group the items together into 30-minute blocks of time. For example, say I receive the following items: 5:00 PM, 5:05 PM, 5:10 PM ... 5:30 PM, 6:20 PM. I need to group the messages from 5:00 PM to 5:30 PM into one block, and put the 6:20 PM message into another block.

This seems simple enough to do: for each message, I look up the last received message. If it was within 30 minutes, the message goes into the current block. Otherwise, a new block is started.

My concern is about messages that arrive out of order or are processed concurrently. Saving and reading them with Consistency=ALL would be bad for performance, and I've had queries fail due to timeouts with those settings (and timeouts can't be increased on a per-query basis).

Would it be better to use Redis, or another database, as a helper / companion to C*? Or perhaps all messages should just be stored first, and then ~30 minutes later a job is run which gets all messages from the last 30 minutes, sorts them by time, and then groups them into blocks of time?
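
To make the store-first, batch-later idea concrete, here is a minimal Python sketch of just the grouping step. It assumes the timestamps for the window have already been read into memory as `datetime` objects. The names `group_into_blocks` and `MAX_GAP` are made up for illustration, and no C* or Redis calls are shown.

```python
from datetime import datetime, timedelta

# Assumption: "within 30 minutes" means a gap of at most 30 minutes
# between one message and the next one in timestamp order.
MAX_GAP = timedelta(minutes=30)


def group_into_blocks(timestamps, max_gap=MAX_GAP):
    """Group timestamps into blocks, starting a new block whenever the
    gap to the previous timestamp (in sorted order) exceeds max_gap."""
    blocks = []
    # Sorting first means arrival order no longer matters.
    for ts in sorted(timestamps):
        if blocks and ts - blocks[-1][-1] <= max_gap:
            blocks[-1].append(ts)   # close enough: same block
        else:
            blocks.append([ts])     # first item, or gap too large: new block
    return blocks


if __name__ == "__main__":
    base = datetime(2020, 1, 1, 17, 0)  # 5:00 PM on an arbitrary day
    # Deliberately out of order: 5:00 ... 5:30 PM plus 6:20 PM.
    offsets = [30, 0, 10, 5, 80, 20, 15, 25]
    received = [base + timedelta(minutes=m) for m in offsets]
    for block in group_into_blocks(received):
        print([t.strftime("%I:%M %p") for t in block])
```

Because the job sorts before grouping, out-of-order arrival within the window does not change the result. One thing this sketch does not handle is a block that straddles two job runs; the first block of a run would probably need to be compared against the last block of the previous run.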