[jira] [Commented] (CASSANDRA-9779) Append-only optimization

Robert Stupp (JIRA) Sun, 12 Jul 2015 03:19:41 -0700

    [ https://issues.apache.org/jira/browse/CASSANDRA-9779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14623732#comment-14623732 ]


Robert Stupp commented on CASSANDRA-9779:
-----------------------------------------

IMO it would be logical to disallow {{UPDATE}} for {{WITH INSERTS ONLY}} tables 
(and that's what {{with INSERTs only}} says).

Would {{WITH INSERTS ONLY}} also mean restricting such tables to primary keys 
without a clustering key?
Maybe I didn't completely get it. My concern is that one partition can still be 
split over the memtable and multiple sstables, which would conflict with the 
compaction and read-path optimizations. For example, a table with 
{{PRIMARY KEY ( (year, month, day), hour, minute, second)}} that receives several 
million INSERTs per day will likely end up with multiple sstables per day. I mean, 
I'm a bit afraid that partitions would get too small, with all the consequences 
that brings (too many queries, not being able to insert from different clients 
for the same day).

If such a {{WITH INSERTS ONLY}} table has no clustering key, even more 
optimizations might be possible: the key-cache key would not need the sstable 
reference, which could move into the value instead. We could then do the 
key-cache lookup first and skip the bloom-filter lookup on a hit.
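A minimal Java sketch of that key-cache idea, assuming each partition is written exactly once and therefore lives in exactly one sstable. All class names here ({{SSTableStub}}, {{CachedPosition}}, {{InsertOnlyKeyCache}}) are hypothetical and are not Cassandra's actual key-cache API; the bloom filter is simulated by an exact index lookup so the number of filter checks can be counted.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical stand-in for one sstable: a simulated bloom filter plus a partition index.
final class SSTableStub {
    final int generation;
    final Map<String, Long> index;   // partition key -> data file offset
    int bloomFilterChecks = 0;       // how often the (simulated) bloom filter was consulted

    SSTableStub(int generation, Map<String, Long> index) {
        this.generation = generation;
        this.index = index;
    }

    // Simulated bloom filter: exact, so it never reports a false positive.
    boolean mightContain(String key) {
        bloomFilterChecks++;
        return index.containsKey(key);
    }
}

// The sstable reference lives in the cached VALUE, not in the cache key.
final class CachedPosition {
    final SSTableStub sstable;
    final long offset;

    CachedPosition(SSTableStub sstable, long offset) {
        this.sstable = sstable;
        this.offset = offset;
    }
}

final class InsertOnlyKeyCache {
    // Cache key is the partition key alone (assumption: insert-only, no clustering key,
    // so at most one sstable can ever hold a given partition).
    private final Map<String, CachedPosition> cache = new HashMap<>();

    Optional<CachedPosition> locate(String key, List<SSTableStub> sstables) {
        CachedPosition hit = cache.get(key);
        if (hit != null) {
            return Optional.of(hit);   // cache hit: no bloom-filter lookups at all
        }
        for (SSTableStub s : sstables) {
            if (s.mightContain(key)) {
                Long offset = s.index.get(key);
                if (offset != null) {
                    CachedPosition pos = new CachedPosition(s, offset);
                    cache.put(key, pos);   // remember which sstable holds the partition
                    return Optional.of(pos);
                }
            }
        }
        return Optional.empty();
    }
}
```

On a miss, the sketch consults each sstable's filter as usual; on a hit it returns the cached sstable and offset directly. This only works because no second copy of the partition can appear in a newer sstable, which is exactly what the insert-only, no-clustering-key restriction would guarantee.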

> Append-only optimization
> ------------------------
>
>                 Key: CASSANDRA-9779
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9779
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: API, Core
>            Reporter: Jonathan Ellis
>             Fix For: 3.x
>
>
> Many common workloads are append-only: that is, they insert new rows but do 
> not update existing ones.  However, Cassandra has no way to infer this and so 
> it must treat all tables as if they may experience updates in the future.
> If we added syntax to tell Cassandra about this ({{WITH INSERTS ONLY}} for 
> instance) then we could do a number of optimizations:
> - Compaction would only need to worry about defragmenting partitions, not 
> rows.  We could default to DTCS or similar.
> - CollationController could stop scanning sstables as soon as it finds a 
> matching row.
> - Most importantly, materialized views wouldn't need to worry about deleting 
> prior values, which would eliminate the majority of the MV overhead
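To illustrate the CollationController point, here is a minimal Java sketch contrasting a general read, which must merge column fragments from every memtable/sstable, with an insert-only read that stops at the first source containing the row. The classes ({{RowSource}}, {{ReadOutcome}}, {{CollationSketch}}) are hypothetical and are not Cassandra's read path; the sketch assumes sources are ordered newest-first and ignores tombstones and timestamps.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a memtable or sstable: row key -> (column -> value).
final class RowSource {
    final Map<String, Map<String, String>> rows;

    RowSource(Map<String, Map<String, String>> rows) {
        this.rows = rows;
    }
}

final class ReadOutcome {
    final Map<String, String> row;   // empty if the row was not found
    final int sourcesScanned;        // how many sources the read had to touch

    ReadOutcome(Map<String, String> row, int sourcesScanned) {
        this.row = row;
        this.sourcesScanned = sourcesScanned;
    }
}

final class CollationSketch {
    // General table: a row may be spread over several sources (partial updates),
    // so every source is consulted and, per column, the newest value wins.
    static ReadOutcome readMerging(String key, List<RowSource> newestFirst) {
        Map<String, String> merged = new HashMap<>();
        int scanned = 0;
        for (RowSource s : newestFirst) {
            scanned++;
            Map<String, String> fragment = s.rows.get(key);
            if (fragment != null) {
                for (Map.Entry<String, String> e : fragment.entrySet()) {
                    merged.putIfAbsent(e.getKey(), e.getValue()); // newest-first: first seen wins
                }
            }
        }
        return new ReadOutcome(merged, scanned);
    }

    // Insert-only table: a row is written once and never modified, so the first
    // source that contains it holds the complete row and the scan can stop there.
    static ReadOutcome readInsertOnly(String key, List<RowSource> newestFirst) {
        int scanned = 0;
        for (RowSource s : newestFirst) {
            scanned++;
            Map<String, String> row = s.rows.get(key);
            if (row != null) {
                return new ReadOutcome(new HashMap<>(row), scanned);
            }
        }
        return new ReadOutcome(new HashMap<>(), scanned);
    }
}
```

Both reads return the same row when the insert-only assumption holds; the difference is that the merging read always touches every source, while the insert-only read can stop early.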



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
