[ 
https://issues.apache.org/jira/browse/CASSANDRA-4565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13453838#comment-13453838
 ] 

Sylvain Lebresne commented on CASSANDRA-4565:
---------------------------------------------

bq. Do you think expired ttl columns should be replaced with tombstones at 
memtable flush?

No, I'm even pretty sure it would be a bad idea. Currently the code does two
iterations over a row to flush it: first it computes the row serialized size
(to write that at the beginning of the row), then it actually writes it. We
should *not* transform expired columns to tombstones during the 2nd iteration
because it would screw up the serialized size computation. And the first
iteration is ill-suited too, because doing that transformation in the
serializedSize() method would be a big hack. So we would need to do an
iteration just for that purpose, and given that having expired columns during
flush is a corner case, it would cost more than it would give us.
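
To make the size problem concrete, here is a minimal toy sketch in Python (not
Cassandra's actual code or on-disk format). The Column class, the byte layout
in encode(), and the flush_row() helper are all hypothetical names invented
for illustration. It assumes a tombstone serializes to a different number of
bytes than the live column it replaces.

```python
import struct
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Column:
    name: bytes
    value: bytes
    timestamp: int
    expires_at: Optional[int] = None  # None means the column has no TTL


def encode(col: Column) -> bytes:
    # Hypothetical layout: name length, name, timestamp, value length, value.
    return (struct.pack(">H", len(col.name)) + col.name
            + struct.pack(">q", col.timestamp)
            + struct.pack(">I", len(col.value)) + col.value)


def to_tombstone(col: Column) -> Column:
    # Toy tombstone: keeps the name and timestamp, drops the value.
    return Column(col.name, b"", col.timestamp)


def serialized_size(columns: List[Column]) -> int:
    # Pass 1: size of the row as it currently stands in the memtable.
    return sum(len(encode(c)) for c in columns)


def flush_row(columns: List[Column], now: int,
              convert_expired_on_write: bool = False) -> Tuple[int, bytes]:
    declared = serialized_size(columns)  # written at the start of the row
    body = b""
    for col in columns:  # Pass 2: actually write the columns
        expired = col.expires_at is not None and col.expires_at <= now
        if convert_expired_on_write and expired:
            col = to_tombstone(col)  # changes the bytes after sizing
        body += encode(col)
    return declared, body


row = [Column(b"a", b"x" * 10, 1, expires_at=5), Column(b"b", b"yyy", 2)]
ok_declared, ok_body = flush_row(row, now=10)
bad_declared, bad_body = flush_row(row, now=10, convert_expired_on_write=True)
```

In this sketch the declared size matches the written bytes only when the
second pass leaves the columns untouched. Converting during the write makes
the declared size disagree with what is actually on disk.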

If we remove the row serialized size (and column count) in the sstable format 
(which we may at some point), then we can revisit as it will be trivial then.
                
> TTL columns with older then gcgrace do not need to flush
> --------------------------------------------------------
>
>                 Key: CASSANDRA-4565
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-4565
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Edward Capriolo
>            Assignee: Aleksey Yeschenko
>             Fix For: 1.3
>
>         Attachments: cassandra-4565.patch.1.txt
>
>
> With memcache, many people are willing to sacrifice durability for
> performance. Cassandra has a TimeToLive feature that can be used in caching
> scenarios with low values for gc_grace_seconds. However, from a code dive it
> seems that Cassandra will always write TTL columns to disk, even those that
> are beyond gc_grace_seconds. If a use case has very large memtables, a small
> TTL, and a small gc_grace, it is possible that flushing these columns to disk
> can be skipped entirely in some scenarios.
