We have a use case where we store event data for a given system and only
want to retain the last N values.  Keeping extra values around for a while
is fine, as long as it isn’t too long, but we must never have fewer than N.
We can't use TTLs to delete the data because we can't be sure how frequently
events will arrive, so we could end up losing everything.  Is there a
built-in mechanism to accomplish this, or a known pattern we can follow?
The events will be read and written at a high frequency, so the solution
has to perform well and hold up under stress.



We’ve experimented with a schema that has N distinct columns, one value in
each, but found that overwrites seem to perform much worse than wide rows.
The use case we tested only required storing the most recent value:



CREATE TABLE eventvalue_overwrite (
    system_name text,
    event_name text,
    event_time timestamp,
    event_value blob,
    PRIMARY KEY (system_name, event_name));



CREATE TABLE eventvalue_widerow (
    system_name text,
    event_name text,
    event_time timestamp,
    event_value blob,
    PRIMARY KEY ((system_name, event_name), event_time))
    WITH CLUSTERING ORDER BY (event_time DESC);
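
For reference, this is roughly how we would read the last N values from the
wide-row table.  The key values and the LIMIT of 10 are placeholders.
Because of the DESC clustering order, rows within a partition come back
newest first, so LIMIT N returns the most recent N values.

```cql
-- Placeholder keys; LIMIT 10 stands in for N.
-- Rows are returned newest first because of CLUSTERING ORDER BY (event_time DESC).
SELECT event_time, event_value
FROM eventvalue_widerow
WHERE system_name = 'sys1' AND event_name = 'cpu'
LIMIT 10;
```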



We tested both against the DataStax AMI on EC2 with 6 nodes, replication
factor 3, write consistency of two, default settings, and a write-only
workload.  We got 190K writes/s for the wide-row schema and 150K writes/s
for the overwrite schema.  Thinking through the write path, it seems the
performance should be pretty similar, with probably smaller sstables for
the overwrite schema.  Can anyone explain the big difference?



The wide-row solution is more complex in that it requires a separate cleanup
thread to delete the extra values.  If that’s the path we have to follow,
we’re thinking we’d add a bucket of some sort so that we can delete an
entire partition at a time after copying some values forward, on the
assumption that deleting a whole partition is much better than deleting a
slice of one.  Is that true?  Also, is there any difference between setting
a really short TTL and doing a delete?
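
To make the bucketing idea concrete, here is a rough sketch of what we have
in mind.  The table eventvalue_bucketed, its bucket column, and all literal
values are hypothetical.  The cleanup thread would copy the newest values
forward into the current bucket in application code, then drop the old
bucket with a single partition delete.  For comparison, the sketch also
shows a slice delete on one partition and a write using a TTL instead of an
explicit delete.  CQL range deletes on clustering columns require Cassandra
3.0 or later.

```cql
-- Hypothetical bucketed variant of eventvalue_widerow; bucket is part of the partition key.
CREATE TABLE eventvalue_bucketed (
    system_name text,
    event_name text,
    bucket int,
    event_time timestamp,
    event_value blob,
    PRIMARY KEY ((system_name, event_name, bucket), event_time))
    WITH CLUSTERING ORDER BY (event_time DESC);

-- After the application re-inserts the newest N rows from bucket 41 into bucket 42,
-- drop the entire old partition with one partition-level delete.
DELETE FROM eventvalue_bucketed
WHERE system_name = 'sys1' AND event_name = 'cpu' AND bucket = 41;

-- Alternative being compared: delete only a slice of one partition
-- (range delete on a clustering column; Cassandra 3.0+).
DELETE FROM eventvalue_bucketed
WHERE system_name = 'sys1' AND event_name = 'cpu' AND bucket = 42
  AND event_time < '2014-06-01 00:00:00+0000';

-- Alternative being compared: let a short TTL (60 s here, a placeholder) expire the value.
INSERT INTO eventvalue_bucketed (system_name, event_name, bucket, event_time, event_value)
VALUES ('sys1', 'cpu', 42, '2014-06-01 00:00:00+0000', 0x00)
USING TTL 60;
```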



I know there are a lot of questions in there but we’ve been going back and
forth on this for a while and I’d really appreciate any help you could give.



Thanks,

John
