Hi all,

We use Cassandra as the primary operational database in our data platform.
One of our use cases is storing the latest user profile information, which
is sourced from an upstream system. Importantly, each update gives us the
entire row, not just the columns that changed.

The access pattern is the following:
- writes: infrequent (~5000 upserts every 5 minutes) "overwrites" of
entire rows
- reads: more frequent (30K reads every 10 seconds) joins by id

The table itself is wide (180 fields) and relatively sparse: the 350M-user
table occupies 200 GB of disk space.
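
For a sense of scale, here is a quick back-of-the-envelope calculation from
the numbers above. It assumes 1 GB = 10^9 bytes and that the 200 GB figure
covers the whole table; the variable names are just for illustration.

```python
# Back-of-the-envelope rates derived from the figures quoted above.
# Assumption: 1 GB = 10**9 bytes; 200 GB is taken as the whole table's footprint.
UPSERTS_PER_WINDOW = 5_000
WRITE_WINDOW_SECONDS = 5 * 60
READS_PER_WINDOW = 30_000
READ_WINDOW_SECONDS = 10
USERS = 350_000_000
DISK_BYTES = 200 * 10**9

writes_per_second = UPSERTS_PER_WINDOW / WRITE_WINDOW_SECONDS
reads_per_second = READS_PER_WINDOW / READ_WINDOW_SECONDS
bytes_per_user = DISK_BYTES / USERS

print(f"writes/s: {writes_per_second:.1f}")
print(f"reads/s:  {reads_per_second:.0f}")
print(f"bytes per user on disk: {bytes_per_user:.0f}")
```

That works out to roughly 17 writes/s against 3,000 reads/s, and about 570
bytes per user on disk, so the workload is heavily read-dominated.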

It seems to me that we could somehow leverage the fact that each mutation
of a row encapsulates the current state of the user. That is, at the storage
level, keep SSTables in "historical order" and, during the read phase,
return the *first* occurrence of the row when scanning from the most recent
SSTable to the oldest.
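
To illustrate the idea, here is a toy sketch in Python. It is not how
Cassandra's read path actually works; the "SSTables" are plain dicts and
read_latest is a made-up name, used only to show the "first hit wins" lookup.

```python
from typing import Dict, List, Optional

Row = Dict[str, object]

def read_latest(sstables_newest_first: List[Dict[int, Row]],
                user_id: int) -> Optional[Row]:
    """Return the first row found when scanning from newest to oldest.

    Hypothetical model: each "SSTable" is a dict of id -> full row, and
    every write overwrites the entire row, so the first hit is already the
    current state and older tables never need to be merged in.
    """
    for sstable in sstables_newest_first:
        row = sstable.get(user_id)
        if row is not None:
            return row  # stop here: no column-level merge with older data
    return None  # the id is not present in any table

# Newest table first; user 2 was overwritten after the older flush.
sstables = [
    {2: {"name": "bob", "city": "Kyiv"}},
    {1: {"name": "alice", "city": "Lviv"},
     2: {"name": "bob", "city": "Odesa"}},
]

print(read_latest(sstables, 2))
print(read_latest(sstables, 1))
```

The point is that the lookup for user 2 stops at the newest table and never
touches the older copy.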

Maybe the same guarantees can be provided by the DateTiered compaction strategy,
but I'm not sure it's quite what we need, because there is no strict
correlation between updates and reads.

Thanks,
Andrii
