[ https://issues.apache.org/jira/browse/CASSANDRA-2503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jonathan Ellis updated CASSANDRA-2503:
--------------------------------------
Summary: Eagerly re-write data at read time ("superseding / defragmenting")
(was: Eagerly re-write data at read time ("superseding"))
> Eagerly re-write data at read time ("superseding / defragmenting")
> ------------------------------------------------------------------
>
> Key: CASSANDRA-2503
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2503
> Project: Cassandra
> Issue Type: New Feature
> Components: Core
> Reporter: Stu Hood
> Assignee: Jonathan Ellis
> Labels: compaction, performance
> Fix For: 1.1.0
>
> Attachments: 2503-v2.txt, 2503-v3.txt, 2503.txt
>
>
> Once CASSANDRA-2498 is implemented, it will be possible to add an
> optimization that eagerly rewrites ("supersedes") data at read time. If a
> successful read needs to hit more than a certain threshold of sstables, we
> can eagerly rewrite the data into a new sstable, and 2498 will allow only
> that file to be accessed on subsequent reads. This basic approach would
> improve read performance considerably, but it would write a lot of duplicate
> data and make compaction's work even more necessary.
> To augment the basic idea: if, when we superseded data in a file, we somehow
> marked it as superseded, the next compaction that touched that file could
> remove the data. Since our file format is immutable, the values that a
> particular sstable superseded could be recorded in a component of that
> sstable. If we always supersede at the "block" level (as defined by
> CASSANDRA-674 or CASSANDRA-47), then the list of superseded blocks could be
> represented using a generation number and a bitmap of block numbers. Since
> 2498 would already allow sstables to be eliminated based on timestamps,
> this information would probably be used only at compaction time (by loading
> all superseding information in the system for the sstables being
> compacted).
> Initially described in a comment on
> [1608|https://issues.apache.org/jira/secure/EditComment!default.jspa?id=12477095&commentId=12920353].
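The description above combines two things: a read-time threshold that triggers a rewrite, and a per-sstable record of superseded blocks that compaction consumes. Below is a minimal Python sketch of that bookkeeping. It uses a hypothetical in-memory model, not Cassandra's actual classes, and every name in it (`Store`, `SSTable`, `add_sstable`, `threshold`, `supersedes`) is an illustrative assumption.

The sketch also assumes the following:

- A block is a dict mapping row key to `(timestamp, value)`.
- A read that touches more than `threshold` sstables rewrites every block it hit into a new sstable. For brevity, the output is a single block.
- The new sstable records, for each older generation, a bitmap of the block numbers it superseded.
- Reads skip superseded blocks. This is only a crude stand-in for the timestamp-based elimination CASSANDRA-2498 would provide.
- Compaction drops superseded blocks instead of merging them, and carries forward any claims on sstables left out of the compaction.

```python
from dataclasses import dataclass, field

# Hypothetical model: a block maps row key -> (timestamp, value).
# Real sstables are sorted on-disk files; this only mirrors the bookkeeping.


@dataclass
class SSTable:
    generation: int
    blocks: list
    # Superseding component: older generation -> bitmap of block numbers
    # whose entire contents this sstable rewrote.
    supersedes: dict = field(default_factory=dict)


def newest(a, b):
    """Reconcile two versions of a row: the higher timestamp wins."""
    return a if a[0] >= b[0] else b


def merge_into(merged, block):
    """Fold every row of `block` into `merged`, keeping the newest version."""
    for key, cell in block.items():
        merged[key] = newest(merged[key], cell) if key in merged else cell


class Store:
    def __init__(self, threshold):
        self.threshold = threshold  # assumed knob: max sstables a read may touch
        self.sstables = []
        self.next_gen = 1

    def add_sstable(self, blocks, supersedes=None):
        sst = SSTable(self.next_gen, blocks, supersedes or {})
        self.next_gen += 1
        self.sstables.append(sst)
        return sst

    def superseded(self):
        """Union of the superseding components of all live sstables."""
        out = {}
        for sst in self.sstables:
            for gen, bits in sst.supersedes.items():
                out[gen] = out.get(gen, 0) | bits
        return out

    def read(self, key):
        # Assumed stand-in for 2498: skip blocks a live sstable has superseded.
        skip = self.superseded()
        hits = [(sst, i) for sst in self.sstables
                for i, block in enumerate(sst.blocks)
                if key in block and not (skip.get(sst.generation, 0) >> i) & 1]
        if not hits:
            return None
        if len({sst.generation for sst, _ in hits}) > self.threshold:
            # Supersede whole blocks so each can later be dropped as a unit.
            merged, bitmaps = {}, {}
            for sst, i in hits:
                merge_into(merged, sst.blocks[i])
                bitmaps[sst.generation] = bitmaps.get(sst.generation, 0) | (1 << i)
            self.add_sstable([merged], bitmaps)  # one output block, for brevity
        cell = hits[0][0].blocks[hits[0][1]][key]
        for sst, i in hits[1:]:
            cell = newest(cell, sst.blocks[i][key])
        return cell[1]

    def compact(self, generations):
        skip = self.superseded()
        inputs = [s for s in self.sstables if s.generation in generations]
        merged, carried = {}, {}
        for sst in inputs:
            for i, block in enumerate(sst.blocks):
                if not (skip.get(sst.generation, 0) >> i) & 1:
                    merge_into(merged, block)  # superseded blocks are dropped
            for gen, bits in sst.supersedes.items():
                if gen not in generations:  # keep claims on sstables left behind
                    carried[gen] = carried.get(gen, 0) | bits
        self.sstables = [s for s in self.sstables if s.generation not in generations]
        return self.add_sstable([merged], carried)
```

Here is a worked example with `threshold=2`:

1. Three sstables each hold a version of key `k`.
2. The first read of `k` writes a fourth sstable whose superseding component is `{1: 1, 2: 1, 3: 1}`, meaning block 0 of generations 1 through 3.
3. A later read of `k` touches only the new file.
4. Compacting generations 1 and 2 keeps only their non-superseded block.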