On Fri, Nov 26, 2010 at 10:49 AM, Peter Schuller <peter.schul...@infidyne.com> wrote:
>> Making compaction parallel isn't a priority because the problem is
>> almost always the opposite: how do we spread it out over a longer
>> period of time instead of sharp spikes of activity that hurt
>> read/write latency. I'd be very surprised if latency would be
>> acceptable if you did have parallel compaction. In other words, your
>> real problem is you need more capacity for your workload.
>
> Do you expect this to be true even with the I/O situation improved
> (i.e., under conditions where the additional I/O is not a problem)? It
> seems counter-intuitive to me that single-core compaction would make a
> huge impact on latency when compaction is CPU bound on an 8+ core
> system under moderate load (even taking into account cache
> coherency/NUMA etc).
>
> --
> / Peter Schuller
Carlos,

I wanted to mention a specific technique I used to solve a situation I ran into. We had a large influx of data that pushed the limits of our hardware, and as stated above the true answer was more capacity. In the meantime, though, a single node failed several large compactions; after failing 2 or 3 big compactions we ended up with ~1000 SSTables for one column family. That became a chicken-and-egg situation: reads were slow because there were so many SSTables (plus extra data like tombstones), yet compaction was brutally slow because of the read/write traffic.

My solution was to create a side-by-side install on the same box, using different data directories and different ports (/var/lib/cassandra/compact, 9168, etc.). I moved the data to the new install, started it up, and ran nodetool compact on the new instance. Since this instance saw no read or write traffic, I was surprised to see the machine at 400% CPU (out of a possible 1600%) with very little io-wait. Compacting 600 GB of small SSTables took about 4 days. (When the SSTables are larger I have compacted 400 GB in 4 hours on the same hardware.) Afterward I moved the data files back in place and brought the node back into the cluster. A rough sketch of the steps is at the end of this mail.

I have lived on both sides of the fence, sometimes wanting long slow compactions and sometimes breakneck fast ones, and I believe there is room for other compaction models. I am interested, for example, in systems that can optimize the case with multiple data directories. My experiment suggests that a major compaction cannot fully utilize the hardware in specific conditions, although knowing which models to use where, and how to automatically select the optimal strategy, are interesting questions.
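For anyone who wants to try the same trick, here is a minimal sketch of the procedure. The install paths, the keyspace name "MyKeyspace", and the exact ports are made up for illustration, and the config file differs by version (storage-conf.xml on 0.6, cassandra.yaml on 0.7), so adjust everything to your own layout:

    # Sketch only: /opt paths, "MyKeyspace", and port numbers are examples.

    # 1. Make a second install with its own data and commitlog directories.
    cp -a /opt/apache-cassandra /opt/cassandra-compact
    mkdir -p /var/lib/cassandra/compact/data/MyKeyspace \
             /var/lib/cassandra/compact/commitlog
    # Edit the copy's config (storage-conf.xml on 0.6, cassandra.yaml on 0.7)
    # so the data/commitlog directories point at the paths above, and give it
    # storage/Thrift/JMX ports that differ from the live node's so the two
    # instances cannot collide.

    # 2. Move the bloated column family's SSTables to the side install.
    mv /var/lib/cassandra/data/MyKeyspace/* \
       /var/lib/cassandra/compact/data/MyKeyspace/

    # 3. Start the side instance and major-compact at full speed; it sees
    #    no read or write traffic. -p is the JMX port you gave the copy
    #    (9168 in my example above).
    /opt/cassandra-compact/bin/cassandra
    /opt/cassandra-compact/bin/nodetool -h localhost -p 9168 compact

    # 4. When it finishes, stop the side instance, move the (now few, large)
    #    SSTables back into the real node's data directory, and restart it.

The separate ports are just for isolation: the compacting instance serves no clients, so it can use all the CPU and disk bandwidth it wants without hurting the live node.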