[ 
https://issues.apache.org/jira/browse/CASSANDRA-2901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13073477#comment-13073477
 ] 

Sylvain Lebresne commented on CASSANDRA-2901:
---------------------------------------------

Comments:
* PCI.Reducer.getCompactedRow unwraps NotifyingSSTableIterators, so their 
close() method won't be called; see the NotifyingIterator sketch after this 
list (as a side note, it doesn't seem like we ever call close() on the 
SSTableIdentityIterator).
* The MergeTask executor has a bounded queue (and a bounded number of 
threads), so tasks can be rejected. If we want submitters to block when the 
queue is full and all threads are busy, we need to reuse the trick from 
DebuggableThreadPoolExecutor (see the blocking-handler sketch after this 
list).
* Deserializer uses a queue of size 1 to buffer one row while it deserializes 
the next one. However, we already queue up rows in the MergeTask executor, so 
it seems it would be simple to use a direct handoff here (see the 
SynchronousQueue sketch after this list). It would make it easier to reason 
about how many rows are in memory at any given time, for instance.
* More generally, the memory blow-up is (potentially) much more than the 2x 
(compared to single-threaded) in the description of this ticket. I think that 
right now we may have:
  ** 1 for the row being deserialized
  ** 1 for the row in the Deserializer queue
  ** nbAvailProcessor's worth of rows in the MergeTask executor queue (each 
MergeTask can contain up to 'InMemoryCompactionLimit' worth of data)
  ** 1 for the row being merged
  Note that if we really want to get to the (roughly) 2x of the description of 
this ticket, we need direct hand-off for both the Deserializer queue *and* the 
merge executor. I would be fine queuing a few tasks in the merge executor, 
though, if that helps with throughput, but I'm not even sure it will.
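  As a worked example (hypothetical numbers): with 8 available processors and 
an in-memory compaction limit of 64MB, the accounting above allows up to 
(1 + 1 + 8 + 1) * 64MB = 704MB in flight, versus ~128MB for a true 2x.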
* MergeTask calls removeDeleted and removeOldShards on the compacted cf, but 
they are also called in the constructor of PreCompactedRow a little bit later 
(we should probably remove the occurrence in PreCompactedRow, since the call 
in MergeTask happens while we are still multi-threaded).
* In PCI.Reducer.getCompactedRow, in the case where inMemory == false, it 
seems we use the SSTI even for rows that were already read by the 
Deserializer; we should use the already-deserialized row instead to avoid 
deserializing twice.
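
To illustrate the first comment, a minimal sketch (illustrative names, not 
the actual Cassandra classes) of why unwrapping a delegating iterator loses 
its close() side effects:

{code}
import java.io.Closeable;
import java.io.IOException;

// If a caller pulls 'wrapped' out of this class and closes it directly,
// notifyCompacted() is never invoked.
class NotifyingIterator implements Closeable
{
    final Closeable wrapped; // e.g. the underlying SSTableIdentityIterator

    NotifyingIterator(Closeable wrapped)
    {
        this.wrapped = wrapped;
    }

    public void close() throws IOException
    {
        wrapped.close();
        notifyCompacted(); // side effect lost when callers unwrap
    }

    void notifyCompacted()
    {
        // tell listeners the row was fully read/compacted
    }
}
{code}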
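
A sketch of the DebuggableThreadPoolExecutor trick mentioned above (my 
reconstruction of the idea, not the actual code): a RejectedExecutionHandler 
that blocks the submitter until the bounded queue has room.

{code}
import java.util.concurrent.*;

public class BlockingSubmitSketch
{
    public static void main(String[] args)
    {
        int nbProcessors = Runtime.getRuntime().availableProcessors();

        // Instead of throwing RejectedExecutionException, park the
        // submitting thread until the queue frees up.
        RejectedExecutionHandler blockWhenFull = new RejectedExecutionHandler()
        {
            public void rejectedExecution(Runnable task, ThreadPoolExecutor executor)
            {
                try
                {
                    executor.getQueue().put(task);
                }
                catch (InterruptedException e)
                {
                    throw new RejectedExecutionException(e);
                }
            }
        };

        // core == max, so putting directly on the queue is safe here.
        ThreadPoolExecutor mergeExecutor = new ThreadPoolExecutor(
            nbProcessors, nbProcessors, 60, TimeUnit.SECONDS,
            new ArrayBlockingQueue<Runnable>(nbProcessors),
            blockWhenFull);

        for (int i = 0; i < 100; i++)
        {
            mergeExecutor.execute(new Runnable()
            {
                public void run() { /* merge one row */ }
            });
        }
        mergeExecutor.shutdown();
    }
}
{code}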
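
And a sketch of what I mean by direct handoff for the Deserializer (again 
illustrative, with rows stubbed out as strings): a SynchronousQueue has no 
capacity, so put() blocks until the consumer takes the row, and the 
deserializer can never run more than one row ahead.

{code}
import java.util.concurrent.*;

public class DirectHandoffSketch
{
    public static void main(String[] args) throws InterruptedException
    {
        final BlockingQueue<String> handoff = new SynchronousQueue<String>();

        Thread deserializer = new Thread(new Runnable()
        {
            public void run()
            {
                try
                {
                    for (int i = 0; i < 3; i++)
                        handoff.put("row-" + i); // blocks until taken
                }
                catch (InterruptedException e)
                {
                    Thread.currentThread().interrupt();
                }
            }
        });
        deserializer.start();

        for (int i = 0; i < 3; i++)
            System.out.println("merging " + handoff.take());
        deserializer.join();
    }
}
{code}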

Nitpicks:
* In the CompactionIterable (and PCI), we create one 
Comparator<IColumnIterator> each time instead of having a private static final 
one (as was the case prior to this patch). Granted, we don't create compaction 
tasks quickly enough for it to really matter, but it seems like a good habit 
to be nice to the GC :) (see the comparator sketch after this list).
* This is not due to this patch, but there is a "race" when updating 
bytesRead, such that a user could temporarily see a bytesRead of 0 in the 
middle of a big compaction (and bytesRead should probably be volatile, since 
it isn't read by the same thread that writes it; see the bytesRead sketch 
after this list).
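
A sketch of the comparator nitpick (RowIterator stands in for 
IColumnIterator):

{code}
import java.util.Comparator;

public class CompactionIterableSketch
{
    interface RowIterator
    {
        String getKey();
    }

    // Allocated once for the class rather than once per compaction.
    private static final Comparator<RowIterator> COMPARATOR = new Comparator<RowIterator>()
    {
        public int compare(RowIterator i1, RowIterator i2)
        {
            return i1.getKey().compareTo(i2.getKey());
        }
    };
}
{code}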
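
And for bytesRead, a sketch of publishing the value with a single volatile 
write so no transient 0 is ever visible (field and method names are mine, not 
the patch's):

{code}
public class ProgressSketch
{
    // volatile: written by the compaction thread, read by a status thread.
    private volatile long bytesRead;

    public long getBytesRead()
    {
        return bytesRead;
    }

    void updateBytesRead(long[] bytesPerSource)
    {
        long total = 0;
        for (long b : bytesPerSource)
            total += b;      // accumulate locally first...
        bytesRead = total;   // ...then publish once, never dipping to 0
    }
}
{code}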

> Allow taking advantage of multiple cores while compacting a single CF
> ---------------------------------------------------------------------
>
>                 Key: CASSANDRA-2901
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2901
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Jonathan Ellis
>            Priority: Minor
>             Fix For: 0.8.3
>
>         Attachments: 2901-v2.txt, 2901.patch
>
>
> Moved from CASSANDRA-1876:
> There are five stages: read, deserialize, merge, serialize, and write. We 
> probably want to continue doing read+deserialize and serialize+write 
> together, or you waste a lot of time copying to/from buffers.
> So, what I would suggest is: one thread per input sstable doing read + 
> deserialize (a row at a time). A thread pool (one per core?) merging 
> corresponding rows from each input sstable. One thread doing serialize + 
> writing the output (this has to wait for the merge threads to complete 
> in-order, obviously). This should take us from being CPU bound on SSDs (since 
> only one core is compacting) to being I/O bound.
> This will require roughly 2x the memory, to allow the reader threads to work 
> ahead of the merge stage. (I.e. for each input sstable you will have up to 
> one row in a queue waiting to be merged, and the reader thread working on the 
> next.) Seems quite reasonable on that front.  You'll also want a small queue 
> size for the serialize-merged-rows executor.
> Multithreaded compaction should be either on or off. It doesn't make sense to 
> try to do things halfway (by doing the reads with a
> threadpool whose size you can grow/shrink, for instance): we still have 
> compaction threads tuned to low priority, by default, so the impact on the 
> rest of the system won't be very different. Nor do we expect to have so many 
> input sstables that we lose a lot in context switching between reader threads.
> IMO it's acceptable to punt completely on rows that are larger than memory, 
> and fall back to the old non-parallel code there. I don't see any sane way to 
> parallelize large-row compactions.
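
For reference, a minimal sketch of the pipeline shape the description above 
proposes (all names hypothetical, rows stubbed out as strings, and every 
sstable assumed to contain every row, which real compaction does not 
guarantee):

{code}
import java.util.*;
import java.util.concurrent.*;

public class ParallelCompactionSketch
{
    public static void main(String[] args) throws Exception
    {
        final int sstables = 4, rows = 10;
        int cores = Runtime.getRuntime().availableProcessors();

        // Stage 1: one reader per input sstable, direct handoff downstream.
        List<BlockingQueue<String>> rowQueues = new ArrayList<BlockingQueue<String>>();
        ExecutorService readers = Executors.newFixedThreadPool(sstables);
        for (int i = 0; i < sstables; i++)
        {
            final BlockingQueue<String> q = new SynchronousQueue<String>();
            rowQueues.add(q);
            final int id = i;
            readers.execute(new Runnable()
            {
                public void run()
                {
                    try
                    {
                        for (int r = 0; r < rows; r++)
                            q.put("sstable" + id + ":row" + r); // read + deserialize
                    }
                    catch (InterruptedException e)
                    {
                        Thread.currentThread().interrupt();
                    }
                }
            });
        }

        // Stage 2: merge corresponding rows on a pool (one task per output
        // row). A real version would bound this queue, per the comments above.
        ExecutorService mergers = Executors.newFixedThreadPool(cores);
        List<Future<String>> merged = new ArrayList<Future<String>>();
        for (int r = 0; r < rows; r++)
        {
            final List<String> inputs = new ArrayList<String>();
            for (BlockingQueue<String> q : rowQueues)
                inputs.add(q.take());
            merged.add(mergers.submit(new Callable<String>()
            {
                public String call() { return "merged" + inputs; } // merge
            }));
        }

        // Stage 3: a single writer waits on the futures in order, keeping
        // the output sorted.
        for (Future<String> f : merged)
            System.out.println("write " + f.get()); // serialize + write

        readers.shutdown();
        mergers.shutdown();
    }
}
{code}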

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira