[
https://issues.apache.org/jira/browse/CASSANDRA-7361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jacek Furmankiewicz updated CASSANDRA-7361:
-------------------------------------------
Attachment: leaks_report.png
Leaks report with suspect #1
> Cassandra locks up in full GC when you assign the entire heap to row cache
> --------------------------------------------------------------------------
>
> Key: CASSANDRA-7361
> URL: https://issues.apache.org/jira/browse/CASSANDRA-7361
> Project: Cassandra
> Issue Type: Bug
> Components: Core
> Environment: Ubuntu, RedHat, JDK 1.7
> Reporter: Jacek Furmankiewicz
> Priority: Minor
> Attachments: leaks_report.png
>
>
> We have a long-running batch load process which runs for many hours: a massive
> amount of writes in large mutation batches (we increase the Thrift frame size
> to 45 MB).
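> (For reference, the frame-size change above presumably corresponds to raising
> thrift_framed_transport_size_in_mb in cassandra.yaml; a minimal sketch, with
> everything else left at its default:)
> {code}
> # cassandra.yaml (sketch; assumed to be the relevant 2.0-era property)
> # Raise the framed transport size so 45 MB mutation batches fit in one frame
> thrift_framed_transport_size_in_mb: 45
> {code}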
> Everything goes well, but after about 3 hours of processing everything locks
> up. We start getting NoHostsAvailable exceptions on the Java application side
> (with Astyanax as our driver), and eventually socket timeouts.
> Looking at Cassandra, we can see that it is using nearly the full 8 GB of heap
> and is unable to free it. It spends most of its time in full GC, but the amount
> of used memory does not go down.
> Here is a long jstat sample showing this over an extended time period:
> http://aep.appspot.com/display/NqqEagzGRLO_pCP2q8hZtitnuVU/
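> (A sample like the one linked is typically gathered by polling the JVM's GC
> counters, e.g. something along the lines of the command below; the 5 second
> interval is just an illustrative assumption.)
> {code}
> # Poll GC utilization of the Cassandra JVM every 5 seconds (sketch)
> jstat -gcutil <cassandra_pid> 5000
> {code}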
> This continues even after we shut down our app. Nothing is connected to
> Cassandra any more, yet it is still stuck in full GC and cannot free up
> memory.
> Running nodetool tpstats shows that nothing is pending and all seems OK:
> {quote}
> Pool Name                    Active   Pending      Completed   Blocked  All time blocked
> ReadStage                         0         0       69555935         0                 0
> RequestResponseStage              0         0              0         0                 0
> MutationStage                     0         0       73123690         0                 0
> ReadRepairStage                   0         0              0         0                 0
> ReplicateOnWriteStage             0         0              0         0                 0
> GossipStage                       0         0              0         0                 0
> CacheCleanupExecutor              0         0              0         0                 0
> MigrationStage                    0         0             46         0                 0
> MemoryMeter                       0         0           1125         0                 0
> FlushWriter                       0         0            824         0                30
> ValidationExecutor                0         0              0         0                 0
> InternalResponseStage             0         0             23         0                 0
> AntiEntropyStage                  0         0              0         0                 0
> MemtablePostFlusher               0         0           1783         0                 0
> MiscStage                         0         0              0         0                 0
> PendingRangeCalculator            0         0              1         0                 0
> CompactionExecutor                0         0          74330         0                 0
> commitlog_archiver                0         0              0         0                 0
> HintedHandoff                     0         0              0         0                 0
>
> Message type           Dropped
> RANGE_SLICE                  0
> READ_REPAIR                  0
> PAGED_RANGE                  0
> BINARY                       0
> READ                       585
> MUTATION                 75775
> _TRACE                       0
> REQUEST_RESPONSE             0
> COUNTER_MUTATION             0
> {quote}
> We had this happen on two separate boxes, one running 2.0.6 and the other
> 2.0.8.
> Right now this is a total blocker for us: we are unable to process the
> customer data and have to abort in the middle of a large processing run.
> This is a new customer, so we did not have a chance to see if this occurred
> with 1.1 or 1.2 in the past (we moved to 2.0 recently).
> The Cassandra process is still running; please let us know if there is
> anything else we could run to give you more insight.
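> (For context on the issue title: the configuration it describes would be a row
> cache sized at roughly the whole JVM heap; a sketch with assumed values, given
> the 8 GB heap mentioned above, might look like the following.)
> {code}
> # cassandra-env.sh (sketch)
> MAX_HEAP_SIZE="8G"
>
> # cassandra.yaml (sketch): row cache sized at essentially the entire heap
> row_cache_size_in_mb: 8192
> {code}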
--
This message was sent by Atlassian JIRA
(v6.2#6252)