[ https://issues.apache.org/jira/browse/CASSANDRA-436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12755104#action_12755104 ]

Jonathan Ellis commented on CASSANDRA-436:
------------------------------------------

04
    Replace PriorityQueue mess with a CompactionIterator that efficiently yields
    compacted Rows from a set of sstables by feeding CollationIterator into a
    ReducingIterator transform.  ("Efficiently" means we never deserialize data
    until it is needed, so the number of sstables that can be compacted at once
    is virtually unlimited, and if only one sstable contains a given key, that
    row data will be copied over without an intermediate de/serialize step.)
    This is a very natural fit for the compaction algorithm and almost entirely
    gets rid of duplicated code between doFileCompaction and doAntiCompaction.
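
A minimal Java sketch of the idea in 04, under stated assumptions: sstables are modeled as in-memory sorted lists of hypothetical `RawRow(key, bytes)` records, a row body is just comma-separated column names, and "merging" is a set union. Collation here uses a `java.util.PriorityQueue` of per-source heads as a stand-in for CollationIterator, and equal-key runs are reduced inline rather than through a separate ReducingIterator; none of this is the patch's actual code. It illustrates the two properties described above: each sstable contributes only one still-serialized row at a time, and a key found in a single sstable is copied through as raw bytes without being deserialized. (Requires Java 16+ for records; a real implementation would stream output instead of building a list.)

```java
import java.nio.charset.StandardCharsets;
import java.util.*;

public class CompactionSketch {
    // Hypothetical on-disk row: key plus serialized body (comma-separated column names).
    record RawRow(String key, byte[] data) {}

    // Head of one sorted source plus the rest of that source.
    record Head(RawRow row, Iterator<RawRow> rest) {}

    static int deserializations = 0;

    // Stand-in for real row deserialization; counts how often it happens.
    static SortedSet<String> deserialize(byte[] data) {
        deserializations++;
        String s = new String(data, StandardCharsets.UTF_8);
        return s.isEmpty() ? new TreeSet<>() : new TreeSet<>(Arrays.asList(s.split(",")));
    }

    static byte[] serialize(SortedSet<String> columns) {
        return String.join(",", columns).getBytes(StandardCharsets.UTF_8);
    }

    // Collate sorted sources by key, then reduce each run of equal keys to one row.
    static List<RawRow> compact(List<List<RawRow>> sstables) {
        PriorityQueue<Head> queue =
            new PriorityQueue<>(Comparator.comparing((Head h) -> h.row().key()));
        for (List<RawRow> sstable : sstables) {
            Iterator<RawRow> it = sstable.iterator();
            if (it.hasNext()) queue.add(new Head(it.next(), it));
        }
        List<RawRow> out = new ArrayList<>();
        while (!queue.isEmpty()) {
            // Collation step: pull every source's row for the smallest key.
            String key = queue.peek().row().key();
            List<RawRow> same = new ArrayList<>();
            while (!queue.isEmpty() && queue.peek().row().key().equals(key)) {
                Head h = queue.poll();
                same.add(h.row());
                if (h.rest().hasNext()) queue.add(new Head(h.rest().next(), h.rest()));
            }
            // Reduce step: a key present in only one sstable is copied as raw bytes.
            if (same.size() == 1) {
                out.add(same.get(0));
            } else {
                SortedSet<String> merged = new TreeSet<>();
                for (RawRow r : same) merged.addAll(deserialize(r.data()));
                out.add(new RawRow(key, serialize(merged)));
            }
        }
        return out;
    }

    static RawRow row(String key, String columns) {
        return new RawRow(key, columns.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        List<List<RawRow>> sstables = List.of(
            List.of(row("a", "x"), row("c", "y")),
            List.of(row("b", "z"), row("c", "w")));
        for (RawRow r : compact(sstables)) {
            System.out.println(r.key() + " -> " + new String(r.data(), StandardCharsets.UTF_8));
        }
        System.out.println("deserializations: " + deserializations);
    }
}
```

Running `main` prints `a -> x`, `b -> z`, `c -> w,y`, then `deserializations: 2`: only key `c`, which appears in both sstables, is ever deserialized.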

03
    allow ReducingIterator to reduce from one type to a different one
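
Patch 03 lets ReducingIterator produce a different type than it consumes. A hedged Java sketch of that shape follows; the class is written for illustration, and its hook names (`reduce`, `getReduced`, `isEqual`) are assumptions, not necessarily the patch's actual signatures. It collapses each run of equal consecutive `T1` inputs into one `T2` output; the demo reduces `String` keys to `Integer` run lengths, so input and output types differ.

```java
import java.util.*;

// Hypothetical sketch: collapses each run of "equal" consecutive T1 values into one T2.
abstract class ReducingIterator<T1, T2> implements Iterator<T2> {
    private final Iterator<T1> source;
    private T1 pending;          // first element of the next run, already read from source
    private boolean hasPending;

    ReducingIterator(Iterator<T1> source) {
        this.source = source;
        if (source.hasNext()) {
            pending = source.next();
            hasPending = true;
        }
    }

    // Fold one input element into the current run.
    protected abstract void reduce(T1 current);

    // Return the result for the finished run and reset internal state.
    protected abstract T2 getReduced();

    // Whether two consecutive inputs belong to the same run.
    protected boolean isEqual(T1 a, T1 b) {
        return a.equals(b);
    }

    @Override
    public boolean hasNext() {
        return hasPending;
    }

    @Override
    public T2 next() {
        if (!hasPending) throw new NoSuchElementException();
        T1 first = pending;
        hasPending = false;
        reduce(first);
        while (source.hasNext()) {
            T1 candidate = source.next();
            if (isEqual(first, candidate)) {
                reduce(candidate);
            } else {
                pending = candidate;   // start of the next run
                hasPending = true;
                break;
            }
        }
        return getReduced();
    }
}

public class ReducingDemo {
    // T1 = String key, T2 = Integer run length.
    static List<Integer> runLengths(List<String> sortedKeys) {
        Iterator<Integer> it = new ReducingIterator<String, Integer>(sortedKeys.iterator()) {
            private int count = 0;
            @Override protected void reduce(String current) { count++; }
            @Override protected Integer getReduced() { int c = count; count = 0; return c; }
        };
        List<Integer> out = new ArrayList<>();
        it.forEachRemaining(out::add);
        return out;
    }

    public static void main(String[] args) {
        System.out.println(runLengths(List.of("a", "a", "b", "c", "c", "c"))); // [2, 1, 3]
    }
}
```

In a compaction setting, `T1` would be one sstable's fragment of a row and `T2` the merged row for that key.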

02
    copy FileStruct to SSTableScanner and remove cruft.  Migrate getKeyRange to
    new scanner class.
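
Patch 02 moves getKeyRange onto the new scanner class. One hypothetical way a scanner-based key-range lookup could look is sketched below. The in-memory stand-in named after `SSTableScanner`, its `seekTo` method, the `getKeyRange(scanner, startWith, stopAt, maxResults)` signature, and treating `stopAt` as inclusive are all assumptions for illustration, not the code in the patch. The scanner seeks to the first key >= `startWith` with a binary search, then walks forward until it passes `stopAt` or has collected `maxResults` keys.

```java
import java.util.*;

// Hypothetical in-memory stand-in for SSTableScanner over one sstable's sorted keys.
class SSTableScanner implements Iterator<String> {
    private final List<String> sortedKeys;
    private int position = 0;

    SSTableScanner(List<String> sortedKeys) {
        this.sortedKeys = sortedKeys;
    }

    // Assumed API: position the scanner at the first key >= target.
    void seekTo(String target) {
        int i = Collections.binarySearch(sortedKeys, target);
        position = i >= 0 ? i : -(i + 1); // insertion point when target is absent
    }

    @Override
    public boolean hasNext() {
        return position < sortedKeys.size();
    }

    @Override
    public String next() {
        if (!hasNext()) throw new NoSuchElementException();
        return sortedKeys.get(position++);
    }
}

public class KeyRangeDemo {
    // Keys in [startWith, stopAt] (stopAt inclusive by assumption), at most maxResults.
    static List<String> getKeyRange(SSTableScanner scanner, String startWith, String stopAt, int maxResults) {
        List<String> keys = new ArrayList<>();
        scanner.seekTo(startWith);
        while (keys.size() < maxResults && scanner.hasNext()) {
            String key = scanner.next();
            if (key.compareTo(stopAt) > 0) break; // walked past the end of the range
            keys.add(key);
        }
        return keys;
    }

    public static void main(String[] args) {
        SSTableScanner scanner = new SSTableScanner(List.of("apple", "banana", "cherry", "date", "fig"));
        System.out.println(getKeyRange(scanner, "b", "d", 10)); // [banana, cherry]
        System.out.println(getKeyRange(scanner, "b", "d", 1));  // [banana]
    }
}
```

`"date"` is excluded from the first range because it sorts after `"d"`. Since `getKeyRange` re-seeks on every call, the same scanner can be reused across queries.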

01
    minor cleanup


> OOM during major compaction on many (hundreds) of sstables
> ----------------------------------------------------------
>
>                 Key: CASSANDRA-436
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-436
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Jonathan Ellis
>            Assignee: Jonathan Ellis
>             Fix For: 0.5
>
>         Attachments: 0001-CASSANDRA-436-minor-fixes.txt, 
> 0002-copy-FileStruct-to-SSTableScanner-and-remove-cruft.-M.txt, 
> 0003-allow-ReducingIterator-to-reduce-from-one-type-to-a-di.txt, 
> 0004-Replace-PriorityQueue-mess-with-a-CompactionIterator-t.txt
>
>
> Compaction deserializes rows before they are needed, one per sstable.  If we
> only deserialized on demand, the current algorithm would be fine on nearly
> arbitrarily large numbers of sstables.  (This is only important because it
> is useful to disable compactions during bulk load, which lets hundreds of
> sstables accumulate.)

-- 
This message is automatically generated by JIRA.