Hi all,

We use Cassandra as the primary operational database in our data platform. One of our use cases is storing the latest user profile information, which is sourced from an upstream system. Importantly, each update delivers the entire row, not just the columns that changed.
The access pattern is the following:
- writes: infrequent (~5000 upserts every 5 minutes) "overwrites" of entire rows
- reads: more frequent (30K reads every 10 seconds) joins by id

The table is wide (180 fields) and relatively sparse (the 350M-user table occupies 200 GB of disk space).

It looks to me like we could somehow leverage the fact that each mutation of a row encapsulates the current state of a user. That is, at the storage level, maintain SSTables in "historical order" and, during the read phase, return the *first* occurrence of the row found when going from the most recent SSTables to older ones. Maybe the same guarantees can be provided with the DateTiered compaction strategy, but I'm not sure it's quite what we need, because there is no strict correlation between updates and reads.

Thanks,
Andrii
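
P.S. To make the idea concrete, here is a rough sketch in Python of the read path I have in mind. It is only a toy model, not how Cassandra actually reads data: `ToySSTable`, `seq` and `read_latest` are hypothetical names I made up for illustration, and it assumes every write carries the full row, so the newest copy found is always the complete current state.

```python
class ToySSTable:
    """Toy stand-in for an SSTable: an immutable map of user id -> full row.

    `seq` is a hypothetical flush sequence number; a higher value means newer data.
    """

    def __init__(self, seq, rows):
        self.seq = seq
        self.rows = dict(rows)


def read_latest(sstables, user_id):
    """Scan tables newest-first and stop at the first one that has the row.

    Returns (row, tables_checked). Stopping early is only valid because each
    write is assumed to contain the entire row, so nothing needs to be merged
    from older tables.
    """
    checked = 0
    for table in sorted(sstables, key=lambda t: t.seq, reverse=True):
        checked += 1
        row = table.rows.get(user_id)
        if row is not None:
            return row, checked
    return None, checked


# Example data: u1 was written twice, the newer copy lives in table 2.
sstables = [
    ToySSTable(1, {"u1": {"name": "Ann", "city": "Kyiv"}, "u2": {"name": "Bob"}}),
    ToySSTable(2, {"u1": {"name": "Ann", "city": "Lviv"}}),
    ToySSTable(3, {"u3": {"name": "Cy"}}),
]

row, checked = read_latest(sstables, "u1")
print(row, checked)
```

In this model the lookup for `u1` never touches table 1, which is the saving I am hoping for. Rows that were written long ago still require walking through all the newer tables first.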