Thanks Robert, I really appreciate your help!

I'm still unsure why Cassandra 2.1 seems to perform much better in that same
scenario (even with the same compaction threshold and number of compactors),
but I guess we'll revisit this when we decide to upgrade to 2.1 in
production.

On Dec 3, 2014 6:33 PM, "Robert Coli" <rc...@eventbrite.com> wrote:
>
> On Tue, Dec 2, 2014 at 5:01 PM, Gianluca Borello <gianl...@draios.com> wrote:
>>
>> We mainly store time series-like data, where each data point is a binary
>> blob of 5-20KB. We use wide rows, and try to put in the same row all the
>> data that we usually need in a single query (but not more than that). As a
>> result, our application logic is very simple (since we have to do just one
>> query to read the data on average) and read/write response times are very
>> satisfactory. This is a cfhistograms and a cfstats of our heaviest CF:
>
>
> 100mb is not HYOOOGE but is around the size where large rows can cause
> heap pressure.
>
> You seem to be unclear on the implications of pending compactions,
> however.
>
> Briefly, pending compactions indicate that you have more SSTables than
> you "should". As compaction both merges row versions and reduces the number
> of SSTables, a high number of pending compactions causes problems
> associated with both having too many row versions ("fragmentation") and a
> large number of SSTables (per-SSTable overhead like bloom filters and index
> samples, held on heap or in other memory depending on version). In your
> case, it seems the problem is probably just the compaction throttle being
> too low.
>
> My conjecture is that, given your normal data size and read/write
> workload, you are relatively close to "GC pre-fail" when compaction is
> working. When it stops working, you relatively quickly get into a state
> where you exhaust heap because you have too many SSTables.
>
> =Rob
> http://twitter.com/rcolidba
> PS - Given 30GB of RAM on the machine, you could consider investigating
> "large-heap" configurations; rbranson from Instagram has some slides out
> there on the topic. What you pay is longer stop-the-world GCs, in other
> words latency if you happen to be talking to a replica node when it pauses.
>
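
A rough way to picture the per-SSTable overhead Robert describes is to
estimate bloom filter memory as the SSTable count grows. The sketch below is
only an illustration, not a measurement of this cluster: it uses the textbook
sizing formula for an optimally sized bloom filter, an assumed false-positive
chance of 0.01, and a hypothetical figure of one million row keys per SSTable.
It also assumes worst-case fragmentation, where every SSTable holds a
fragment of every row, and it ignores index samples entirely. Whether this
memory lives on the JVM heap depends on the Cassandra version.

```python
import math


def bloom_filter_bytes(keys: int, fp_chance: float) -> float:
    """Bytes for an optimally sized bloom filter.

    Textbook formula: m = -n * ln(p) / (ln 2)^2 bits, for n keys and
    false-positive probability p.
    """
    bits = -keys * math.log(fp_chance) / (math.log(2) ** 2)
    return bits / 8


def total_filter_bytes(keys_per_sstable: int, sstables: int,
                       fp_chance: float = 0.01) -> float:
    """Total filter memory when each SSTable carries its own filter.

    Assumption: every SSTable contains keys_per_sstable keys, i.e. rows are
    fragmented across all SSTables (worst case).
    """
    return sstables * bloom_filter_bytes(keys_per_sstable, fp_chance)


# Hypothetical numbers, chosen only to show the linear growth.
KEYS_PER_SSTABLE = 1_000_000

for sstables in (4, 40, 400):
    mib = total_filter_bytes(KEYS_PER_SSTABLE, sstables) / 2**20
    print(f"{sstables:>4} SSTables -> ~{mib:.1f} MiB of bloom filters")
```

The point of the sketch is the linear growth: under these assumptions, ten
times as many SSTables means ten times the filter memory, which is why a
stalled compaction can push a node that is already close to its limit over it.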
