[jira] [Commented] (CASSANDRA-1608) Redesigned Compaction

Benjamin Coverston (JIRA) Fri, 24 Jun 2011 11:10:10 -0700

    [ 
https://issues.apache.org/jira/browse/CASSANDRA-1608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13054592#comment-13054592
 ]


Benjamin Coverston commented on CASSANDRA-1608:
-----------------------------------------------

It won't, because you can't do a simple binary search for a range. It's really 
a problem of intersection rather than matching, and comparators alone don't 
solve the problem of "give me all the intersecting ranges for this set" without 
having to compare every range for intersection.

Nearly every interval intersection algorithm depends on tree traversal. While 
many of the existing collections are based on binary or red-black trees, they 
don't expose the methods necessary for traversal. Only the comparator used to 
build the tree is exposed, and the collections only offer either "iterate over 
everything" or "search for the thing I want".
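
To make the traversal point concrete, here is a minimal sketch of an interval 
tree that answers "which ranges intersect [lo, hi]" without comparing every 
range. It is an illustration only, not code from the attached patches: the 
names IntervalTreeSketch, Range, and intersecting are hypothetical, ranges are 
assumed to be closed intervals over long values, and the tree is built once 
from a fixed list. Each node records the largest end point in its subtree, 
which is what lets the search skip whole subtrees.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical sketch: a static interval tree over closed long ranges. */
class IntervalTreeSketch {
    /** A closed range [start, end]; the name is illustrative only. */
    static final class Range {
        final long start, end;
        Range(long start, long end) { this.start = start; this.end = end; }
        boolean intersects(long lo, long hi) { return start <= hi && lo <= end; }
        @Override public String toString() { return "[" + start + "," + end + "]"; }
    }

    private static final class Node {
        final Range range;
        final Node left, right;
        final long maxEnd; // largest end point anywhere in this subtree

        Node(Range range, Node left, Node right) {
            this.range = range;
            this.left = left;
            this.right = right;
            long m = range.end;
            if (left != null) m = Math.max(m, left.maxEnd);
            if (right != null) m = Math.max(m, right.maxEnd);
            this.maxEnd = m;
        }
    }

    private final Node root;

    IntervalTreeSketch(List<Range> ranges) {
        // Sort by start so the tree is a balanced binary search tree on start.
        List<Range> sorted = new ArrayList<>(ranges);
        sorted.sort(Comparator.comparingLong((Range r) -> r.start));
        root = build(sorted, 0, sorted.size());
    }

    private static Node build(List<Range> sorted, int from, int to) {
        if (from >= to) return null;
        int mid = (from + to) >>> 1;
        return new Node(sorted.get(mid),
                        build(sorted, from, mid),
                        build(sorted, mid + 1, to));
    }

    /** Returns every stored range that intersects [lo, hi], ordered by start. */
    List<Range> intersecting(long lo, long hi) {
        List<Range> out = new ArrayList<>();
        collect(root, lo, hi, out);
        return out;
    }

    private static void collect(Node n, long lo, long hi, List<Range> out) {
        // Prune: nothing in this subtree ends at or after lo.
        if (n == null || n.maxEnd < lo) return;
        collect(n.left, lo, hi, out);
        if (n.range.intersects(lo, hi)) out.add(n.range);
        // Prune: everything to the right starts at or after n.range.start.
        if (n.range.start <= hi) collect(n.right, lo, hi, out);
    }
}
```

The pruning in collect needs access to the node structure (children and the 
per-subtree maximum), which is exactly what a comparator-only collection API 
does not provide.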

> Redesigned Compaction
> ---------------------
>
>                 Key: CASSANDRA-1608
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1608
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Chris Goffinet
>            Assignee: Benjamin Coverston
>         Attachments: 0001-leveldb-style-compaction.patch, 1608-v2.txt, 
> 1608-v3.txt, 1608-v4.txt, 1608-v5.txt
>
>
> After seeing the I/O issues in CASSANDRA-1470, I've been doing some more 
> thinking on this subject that I wanted to lay out.
> I propose we redo the concept of how compaction works in Cassandra. At the 
> moment, compaction is kicked off based on a write access pattern, not read 
> access pattern. In most cases, you want the opposite. You want to be able to 
> track how well each SSTable is performing in the system. If we were to keep 
> statistics in-memory of each SSTable, prioritize them based on most accessed, 
> and bloom filter hit/miss ratios, we could intelligently group sstables that 
> are being read most often and schedule them for compaction. We could also 
> schedule lower priority maintenance on SSTables that are not often accessed.
> I also propose we limit the size of each SSTable to a fixed size, which gives 
> us the ability to better utilize our bloom filters in a predictable manner. 
> At the moment, after a certain size, the bloom filters become less reliable. 
> This would also allow us to group data most accessed. Currently the size of 
> an SSTable can grow to a point where large portions of the data might not 
> actually be accessed as often.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
