Allow taking advantage of multiple cores while compacting a single CF
---------------------------------------------------------------------

                 Key: CASSANDRA-2901
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2901
             Project: Cassandra
          Issue Type: Improvement
          Components: Core
            Reporter: Jonathan Ellis
            Priority: Minor


Moved from CASSANDRA-1876:

There are five stages: read, deserialize, merge, serialize, and write. We 
probably want to keep doing read+deserialize together, and likewise 
serialize+write; otherwise we waste a lot of time copying to and from buffers.

So, what I would suggest is: one thread per input sstable doing read + 
deserialize (a row at a time); one thread merging corresponding rows from each 
input sstable; one thread doing serialize + write of the output. This should 
give us between 2x and 3x speedup (depending on how much we save by doing the 
merge on a separate thread from the write).
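
Here is a minimal sketch of that pipeline, using toy in-memory "sstables" and 
plain key:value strings as a stand-in for rows; none of the names below are 
Cassandra's actual classes, this is just the thread/queue shape being proposed:

{code}
import java.util.*;
import java.util.concurrent.*;

public class PipelinedCompaction {
    // Sentinel marking the end of an input sstable or of the merged stream.
    private static final String EOF = "\u0000EOF";

    public static void main(String[] args) throws InterruptedException {
        // Toy inputs: each "sstable" is a key-sorted list of key:value rows.
        List<List<String>> sstables = List.of(
                List.of("a:1", "b:1", "d:1"),
                List.of("a:2", "c:2", "d:2"));

        // One capacity-1 queue per input sstable: each reader can work at most
        // one row ahead of the merge stage, which bounds memory at roughly 2x.
        List<BlockingQueue<String>> queues = new ArrayList<>();
        for (List<String> sstable : sstables) {
            BlockingQueue<String> q = new ArrayBlockingQueue<>(1);
            queues.add(q);
            // Reader thread: read + deserialize, one row at a time.
            new Thread(() -> {
                try {
                    for (String row : sstable) q.put(row);
                    q.put(EOF);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }).start();
        }

        BlockingQueue<String> merged = new ArrayBlockingQueue<>(1);
        // Merge thread: combine corresponding rows from each input sstable.
        new Thread(() -> {
            try {
                int n = queues.size();
                String[] heads = new String[n];
                for (int i = 0; i < n; i++) heads[i] = queues.get(i).take();
                while (true) {
                    String minKey = null; // smallest key among live heads
                    for (String h : heads) {
                        if (EOF.equals(h)) continue;
                        String k = h.substring(0, h.indexOf(':'));
                        if (minKey == null || k.compareTo(minKey) < 0) minKey = k;
                    }
                    if (minKey == null) break; // all inputs exhausted
                    // Merge every head row that shares the smallest key.
                    List<String> versions = new ArrayList<>();
                    for (int i = 0; i < n; i++) {
                        if (!EOF.equals(heads[i])
                                && heads[i].startsWith(minKey + ":")) {
                            versions.add(heads[i]);
                            heads[i] = queues.get(i).take(); // refill from reader
                        }
                    }
                    merged.put(minKey + "=" + versions);
                }
                merged.put(EOF);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }).start();

        // Writer stage (here the main thread): serialize + write the output.
        for (String row = merged.take(); !EOF.equals(row); row = merged.take())
            System.out.println(row);
    }
}
{code}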

This will require roughly 2x the memory, to allow the reader threads to work 
ahead of the merge stage. (That is, for each input sstable you will have up to 
one row in a queue waiting to be merged, plus the row the reader thread is 
working on next.) That seems quite reasonable on the memory front.
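
In the sketch above, that bound falls out of the capacity-1 queues: each input 
sstable contributes at most one queued row plus one row in flight in its 
reader, i.e. roughly double what the serial code holds per input.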

Multithreaded compaction should be either on or off. It doesn't make sense to 
try to do things halfway (by doing the reads with a threadpool whose size you 
can grow or shrink, for instance): compaction threads are still tuned to low 
priority by default, so the impact on the rest of the system won't be very 
different. Nor do we expect to have so many input sstables that we lose a lot 
to context switching between reader threads. (If this is a concern, we already 
have a tunable to limit the number of sstables merged at a time in a single 
CF.)
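
As a sketch, the all-or-nothing behavior reduces to a single boolean dispatch 
with deliberately nothing in between; the names below are purely illustrative, 
not a committed config interface:

{code}
class CompactionDispatch {
    // Hypothetical global switch; the real option name is not settled here.
    static final boolean MULTITHREADED_COMPACTION = true;

    static Runnable compactionTask(Runnable parallelPipeline,
                                   Runnable serialPath) {
        // Either the full pipeline runs, or the existing serial code does;
        // there is intentionally no resizable reader threadpool in between.
        return MULTITHREADED_COMPACTION ? parallelPipeline : serialPath;
    }
}
{code}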

IMO it's acceptable to punt completely on rows that are larger than memory, and 
fall back to the old non-parallel code there. I don't see any sane way to 
parallelize large-row compactions.
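
A one-line guard captures the proposed fallback, assuming some per-row size 
estimate and an in-memory limit (both names hypothetical):

{code}
class LargeRowGuard {
    // Rows too large to hold in memory keep the old non-parallel code path.
    static boolean useParallelCompaction(long estimatedRowSizeBytes,
                                         long inMemoryLimitBytes) {
        return estimatedRowSizeBytes <= inMemoryLimitBytes;
    }
}
{code}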
