[
https://issues.apache.org/jira/browse/CASSANDRA-9640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14600078#comment-14600078
]
Constance Eustace edited comment on CASSANDRA-9640 at 6/24/15 8:52 PM:
-----------------------------------------------------------------------
Attached syslog.zip.
The destabilization starts around the end of logfile _0040 and continues into
the next logfile.
I suspect that multiple huge/wide partitions are being resolved in parallel,
and that may be filling the heap; a smaller set of large wide rows (1-10 rows)
doesn't seem to bother it.
I'd guess we had 20-40 rows of 5-10 GB each when this went teetering down.
entity_etljob is the processing table that holds the ultra-huge rows.
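For the record, one quick way to check the huge-partition theory is the
per-table "Compacted partition maximum bytes" figure that nodetool cfstats
reports. The Python sketch below is only an illustration, not part of this
ticket: the keyspace name is made up, and it assumes the 2.1-era cfstats
labels ("Table:", "Compacted partition maximum bytes"); older releases label
these lines differently. It just shells out to nodetool and flags any table
whose largest compacted partition crosses a threshold.

#!/usr/bin/env python
# Illustrative sketch only: run `nodetool cfstats <keyspace>` and flag tables
# whose largest compacted partition exceeds a threshold. Assumes the 2.1-era
# output labels ("Table:", "Compacted partition maximum bytes"); older
# releases label these lines differently.
import re
import subprocess

THRESHOLD_BYTES = 1 * 1024 ** 3  # flag anything with a partition over ~1 GB


def find_huge_partitions(keyspace):
    out = subprocess.check_output(['nodetool', 'cfstats', keyspace])
    table = None
    for line in out.decode('utf-8', 'replace').splitlines():
        line = line.strip()
        if line.startswith('Table:'):
            table = line.split(':', 1)[1].strip()
        m = re.match(r'Compacted partition maximum bytes:\s*(\d+)', line)
        if m and table:
            max_bytes = int(m.group(1))
            if max_bytes > THRESHOLD_BYTES:
                print('%s.%s: max compacted partition %.1f GB'
                      % (keyspace, table, max_bytes / float(1024 ** 3)))


if __name__ == '__main__':
    find_huge_partitions('reports')  # hypothetical keyspace name

Running nodetool cfhistograms against the same table would show the
partition-size distribution rather than just the maximum.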
was (Author: cowardlydragon):
Attached syslog.zip.
The destabilization starts around the end of logfile _0040 and continues into
the next logfile.
I suspect that multiple huge/wide partitions are being resolved in parallel,
and that may be filling the heap; a smaller set of large wide rows (1-10 rows)
doesn't seem to bother it.
I'd guess we had 20-30 days of rows...
entity_etljob is the processing table that holds the ultra-huge rows.
> Nodetool repair of very wide, large rows causes GC pressure and
> destabilization
> -------------------------------------------------------------------------------
>
> Key: CASSANDRA-9640
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9640
> Project: Cassandra
> Issue Type: Bug
> Environment: AWS, ~8GB heap
> Reporter: Constance Eustace
> Assignee: Yuki Morishita
> Priority: Minor
> Fix For: 2.1.x
>
> Attachments: syslog.zip
>
>
> We've noticed our nodes becoming unstable with large, unrecoverable Old Gen
> GCs until OOM.
> This appears to happen around the time of repair, and the specific cause seems
> to be one of our report computation tables that involves possibly very wide
> rows with 10GB of data in them. This is an RF 3 table in a four-node cluster.
> We truncate this table occasionally, and we also had disabled this computation
> report for a while and noticed better node stability.
> I wish I had more specifics. We are switching to an RF 1 table and will do
> more proactive truncation of the table.
> When things calm down, we will attempt to replicate the issue and watch GC
> and other logs.
> Any suggestion for things to look for/enable tracing on would be welcome.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)