[jira] [Commented] (CASSANDRA-2503) Eagerly re-write data at read time ("superseding")

Sylvain Lebresne (Commented) (JIRA) Mon, 24 Oct 2011 02:54:05 -0700

    [ https://issues.apache.org/jira/browse/CASSANDRA-2503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13133942#comment-13133942 ]


Sylvain Lebresne commented on CASSANDRA-2503:
---------------------------------------------

On the technical side:
* we probably should skip the commit log (by using Table.apply(rm, false) 
directly).
* what is the reason for limiting this to SizeTieredCompaction?
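A minimal sketch of the read-time rewrite is below. It uses a toy in-memory model, not Cassandra's real classes: `SupersedeOnRead`, `Cell`, `write`, `applyToMemtable`, `flush` and the threshold of 2 are all hypothetical. `applyToMemtable` stands in for the `Table.apply(rm, false)` call suggested above, meaning the rewrite is applied without a commit log append. This is presumably safe because the rewritten data is already durable in the older sstables. The sketch does not model the CASSANDRA-2498 timestamp check that would let later reads skip the older files.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SupersedeOnRead {
    // Toy cell: a value plus its write timestamp; the highest timestamp wins.
    record Cell(String value, long timestamp) {}

    // HYPOTHETICAL threshold: supersede a row if a read touched more sstables than this.
    static final int SUPERSEDE_THRESHOLD = 2;

    // Immutable toy "sstables"; each maps a row key to its cell.
    final List<Map<String, Cell>> sstables = new ArrayList<>();
    // Memtable that receives both normal and superseding writes.
    final Map<String, Cell> memtable = new HashMap<>();
    // Counts commit log appends; superseding writes deliberately skip the log.
    int commitLogAppends = 0;

    // Normal write path: log first, then apply to the memtable.
    void write(String key, Cell cell) {
        commitLogAppends++;
        applyToMemtable(key, cell);
    }

    // HYPOTHETICAL stand-in for Table.apply(rm, false): no commit log append.
    // Merges by timestamp so a rewrite never clobbers a newer cell.
    void applyToMemtable(String key, Cell cell) {
        memtable.merge(key, cell, (a, b) -> a.timestamp() >= b.timestamp() ? a : b);
    }

    Cell read(String key) {
        Cell best = memtable.get(key);
        int touched = 0; // number of sstables that contained the key
        for (Map<String, Cell> sstable : sstables) {
            Cell c = sstable.get(key);
            if (c == null) continue;
            touched++;
            if (best == null || c.timestamp() > best.timestamp()) best = c;
        }
        if (best != null && touched > SUPERSEDE_THRESHOLD) {
            // Eagerly rewrite ("supersede") the merged result; the next flush
            // puts it in a single new sstable.
            applyToMemtable(key, best);
        }
        return best;
    }

    // Flush the memtable into a new immutable sstable.
    void flush() {
        sstables.add(Map.copyOf(memtable));
        memtable.clear();
    }

    public static void main(String[] args) {
        SupersedeOnRead store = new SupersedeOnRead();
        for (long ts = 1; ts <= 3; ts++) {
            store.write("k", new Cell("v" + ts, ts));
            store.flush(); // "k" ends up spread over three sstables
        }
        int logged = store.commitLogAppends;
        Cell result = store.read("k"); // touches 3 > 2 sstables, so it supersedes
        System.out.println(result.value() + " " + store.memtable.containsKey("k")
                + " " + (store.commitLogAppends == logged));
    }
}
```

After three write/flush cycles the key lives in three sstables. The read therefore crosses the threshold, and the winning cell is copied into the memtable while the commit log counter stays unchanged.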

On the idea itself, I won't hide that I'm less than enthusiastic. It feels to 
me like the wrong fix for the 'compaction is behind' problem. This will 
basically be triggered when compaction is behind, but it only solves the 
problem temporarily, and does so by adding more pressure on compaction. I'd 
really like us to benchmark/evaluate this before adding it, because I fear 
there are scenarios where it will do more harm than good.
                
> Eagerly re-write data at read time ("superseding")
> --------------------------------------------------
>
>                 Key: CASSANDRA-2503
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2503
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Stu Hood
>            Assignee: Jonathan Ellis
>              Labels: compaction, performance
>             Fix For: 1.0.1
>
>         Attachments: 2503.txt
>
>
> Once CASSANDRA-2498 is implemented, it will be possible to implement an 
> optimization to eagerly rewrite ("supersede") data at read time. If a 
> successful read needed to hit more than a certain threshold of sstables, we 
> can eagerly rewrite it in a new sstable, and 2498 will allow only that file 
> to be accessed. This basic approach would improve read performance 
> considerably, but would cause a lot of duplicate data to be written, and 
> would make compaction's work more necessary.
> Augmenting the basic idea: if, when we superseded data in a file, we marked 
> it as superseded somehow, the next compaction that touched that file could 
> remove the data. Since our file format is immutable, the values that a 
> particular sstable superseded could be recorded in a component of that 
> sstable. If we always supersede at the "block" level (as defined by 
> CASSANDRA-674 or CASSANDRA-47), then the list of superseded blocks could be 
> represented using a generation number and a bitmap of block numbers. Since 
> 2498 would already allow for sstables to be eliminated due to timestamps, 
> this information would probably only be used at compaction time (by loading 
> all superseding information in the system for the sstables that are being 
> compacted).
> Initially described in a comment on CASSANDRA-1608: 
> [1608|https://issues.apache.org/jira/secure/EditComment!default.jspa?id=12477095&commentId=12920353].
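The superseded-block bookkeeping described in the issue can be sketched as follows. The sketch assumes fixed block numbers (as CASSANDRA-674 or CASSANDRA-47 might define them) and integer sstable generations. `SupersededBlocks`, `markSuperseded`, `load` and `liveBlocks` are hypothetical names, not Cassandra APIs. Each superseding sstable keeps a map from source generation to a `java.util.BitSet` of block numbers. At compaction time those maps are unioned, and any block whose bit is set can be dropped.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SupersededBlocks {
    // HYPOTHETICAL per-sstable component: source generation -> superseded block numbers.
    final Map<Integer, BitSet> superseded = new HashMap<>();

    // Record that this sstable superseded block `block` of sstable `generation`.
    void markSuperseded(int generation, int block) {
        superseded.computeIfAbsent(generation, g -> new BitSet()).set(block);
    }

    // Compaction time: union the superseding info held by all sstables in the system.
    static Map<Integer, BitSet> load(List<SupersededBlocks> components) {
        Map<Integer, BitSet> all = new HashMap<>();
        for (SupersededBlocks c : components) {
            c.superseded.forEach((gen, bits) ->
                    all.computeIfAbsent(gen, g -> new BitSet()).or(bits));
        }
        return all;
    }

    // Blocks of sstable `generation` that compaction must still copy.
    static BitSet liveBlocks(Map<Integer, BitSet> all, int generation, int blockCount) {
        BitSet live = new BitSet(blockCount);
        live.set(0, blockCount);            // start with every block live
        BitSet dead = all.get(generation);
        if (dead != null) live.andNot(dead); // drop the superseded ones
        return live;
    }

    public static void main(String[] args) {
        SupersededBlocks newer = new SupersededBlocks();
        newer.markSuperseded(3, 1);
        newer.markSuperseded(3, 4);
        newer.markSuperseded(5, 0);
        Map<Integer, BitSet> all = load(List.of(newer));
        System.out.println(liveBlocks(all, 3, 6)); // {0, 2, 3, 5}
        System.out.println(liveBlocks(all, 5, 2)); // {1}
    }
}
```

A bitmap keeps the component small: one bit per block of the superseded file, keyed by its generation number, which matches the representation proposed in the issue.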
