[
https://issues.apache.org/jira/browse/CASSANDRA-7361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14020551#comment-14020551
]
Jacek Furmankiewicz commented on CASSANDRA-7361:
------------------------------------------------
It may be a daily job, depending on the customer.
Running the MAT reports now; the early report is:
{quote}
Class Name | Objects | Shallow Heap | Retained Heap
-------------------------------------------------------------------------------------------------------------------------
byte[] | 32,532,347 | 1,696,254,872 |
java.util.concurrent.ConcurrentHashMap$HashEntry | 32,519,002 | 1,040,608,064 |
com.googlecode.concurrentlinkedhashmap.ConcurrentLinkedHashMap$Node | 32,509,885 | 1,040,316,320 |
org.apache.cassandra.cache.RefCountedMemory | 32,509,885 | 1,040,316,320 |
com.googlecode.concurrentlinkedhashmap.ConcurrentLinkedHashMap$WeightedValue | 32,509,885 | 780,237,240 |
org.apache.cassandra.cache.RowCacheKey | 32,509,885 | 780,237,240 |
java.util.concurrent.atomic.AtomicInteger | 32,510,569 | 520,169,104 |
java.util.concurrent.ConcurrentHashMap$HashEntry[] | 376 | 268,516,880 |
java.nio.HeapByteBuffer | 530,765 | 25,476,720 |
edu.stanford.ppl.concurrent.CopyOnWriteManager$COWEpoch | 142,440 | 13,674,240 |
edu.stanford.ppl.concurrent.SnapTreeMap$Node | 202,121 | 9,701,808 |
edu.stanford.ppl.concurrent.SnapTreeMap$RootHolder | 201,252 | 9,660,096 |
org.apache.cassandra.db.ExpiringColumn | 194,148 | 7,765,920 |
java.util.concurrent.ConcurrentSkipListMap$Node | 206,891 | 4,965,384 |
java.lang.Long | 202,557 | 4,861,368 |
java.util.concurrent.atomic.AtomicReference | 285,233 | 4,563,728 |
edu.stanford.ppl.concurrent.CopyOnWriteManager$Latch | 142,440 | 4,558,080 |
edu.stanford.ppl.concurrent.SnapTreeMap | 142,440 | 4,558,080 |
org.apache.cassandra.db.DecoratedKey | 142,576 | 3,421,824 |
org.apache.cassandra.db.AtomicSortedColumns | 142,440 | 3,418,560 |
org.apache.cassandra.db.AtomicSortedColumns$Holder | 142,440 | 3,418,560 |
java.util.concurrent.ConcurrentSkipListMap$Index | 102,632 | 2,463,168 |
org.apache.cassandra.dht.LongToken | 143,072 | 2,289,152 |
edu.stanford.ppl.concurrent.SnapTreeMap$COWMgr | 142,440 | 2,279,040 |
char[] | 16,076 | 1,790,576 |
java.lang.Double | 67,527 | 1,620,648 |
java.util.AbstractMap$2 | 58,799 | 940,784 |
long[] | 1,604 | 594,552 |
java.lang.String | 15,872 | 380,928 |
java.lang.Object[] | 1,665 | 337,984 |
-------------------------------------------------------------------------------------------------------------------------
{quote}
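Seven of the classes at the top of that histogram (byte[], ConcurrentHashMap$HashEntry, ConcurrentLinkedHashMap$Node, RefCountedMemory, WeightedValue, RowCacheKey, AtomicInteger) each show roughly 32.5 million instances, which suggests one of each object per row cache entry. Summing their shallow heap together with the ConcurrentHashMap bucket arrays is my own back-of-the-envelope arithmetic, not part of the MAT report, but it comes to about 7.2 GB (~6.7 GiB):
{code:java}
// Back-of-the-envelope sum of the shallow heap figures reported above for the
// classes that appear to belong to the row cache (the byte[] instances are
// presumably the cached row payloads; this is my own arithmetic, not MAT output).
public class RowCacheHeapEstimate {
    public static void main(String[] args) {
        long[] shallowHeapBytes = {
            1_696_254_872L, // byte[]
            1_040_608_064L, // java.util.concurrent.ConcurrentHashMap$HashEntry
            1_040_316_320L, // ConcurrentLinkedHashMap$Node
            1_040_316_320L, // org.apache.cassandra.cache.RefCountedMemory
              780_237_240L, // ConcurrentLinkedHashMap$WeightedValue
              780_237_240L, // org.apache.cassandra.cache.RowCacheKey
              520_169_104L, // java.util.concurrent.atomic.AtomicInteger
              268_516_880L  // java.util.concurrent.ConcurrentHashMap$HashEntry[]
        };
        long total = 0;
        for (long bytes : shallowHeapBytes) {
            total += bytes;
        }
        // Prints 7,166,656,040 bytes, i.e. roughly 6.7 GiB.
        System.out.printf("row-cache-related shallow heap: %,d bytes (%.1f GiB)%n",
                total, total / (1024.0 * 1024 * 1024));
    }
}
{code}
That is most of the 8 GB heap pinned by cache entries, which lines up with the full-GC behaviour described in the issue below.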
> Cassandra locks up in full GC when you assign the entire heap to row cache
> --------------------------------------------------------------------------
>
> Key: CASSANDRA-7361
> URL: https://issues.apache.org/jira/browse/CASSANDRA-7361
> Project: Cassandra
> Issue Type: Bug
> Components: Core
> Environment: Ubuntu, RedHat, JDK 1.7
> Reporter: Jacek Furmankiewicz
> Priority: Minor
>
> We have a long-running batch load process which runs for many hours, doing a
> massive amount of writes in large mutation batches (we increase the thrift
> frame size to 45 MB).
> Everything goes well, but after about 3 hours of processing everything locks
> up. We start getting NoHostsAvailable exceptions on the Java application side
> (with Astyanax as our driver), and eventually socket timeouts.
> Looking at Cassandra, we can see that it is using nearly the full 8 GB of heap
> and is unable to free it. It spends most of its time in full GC, but the
> amount of used memory does not go down.
> Here is a long sample from jstat showing this over an extended time period,
> e.g.
> http://aep.appspot.com/display/NqqEagzGRLO_pCP2q8hZtitnuVU/
> This continues even after we shut down our app. Nothing is connected to
> Cassandra any more, yet it is still stuck in full GC and cannot free up
> memory (see the sketch after this description).
> Running nodetool tpstats shows that nothing is pending; all seems OK:
> {quote}
> Pool Name               Active   Pending      Completed   Blocked  All time blocked
> ReadStage                    0         0       69555935         0                 0
> RequestResponseStage         0         0              0         0                 0
> MutationStage                0         0       73123690         0                 0
> ReadRepairStage              0         0              0         0                 0
> ReplicateOnWriteStage        0         0              0         0                 0
> GossipStage                  0         0              0         0                 0
> CacheCleanupExecutor         0         0              0         0                 0
> MigrationStage               0         0             46         0                 0
> MemoryMeter                  0         0           1125         0                 0
> FlushWriter                  0         0            824         0                30
> ValidationExecutor           0         0              0         0                 0
> InternalResponseStage        0         0             23         0                 0
> AntiEntropyStage             0         0              0         0                 0
> MemtablePostFlusher          0         0           1783         0                 0
> MiscStage                    0         0              0         0                 0
> PendingRangeCalculator       0         0              1         0                 0
> CompactionExecutor           0         0          74330         0                 0
> commitlog_archiver           0         0              0         0                 0
> HintedHandoff                0         0              0         0                 0
>
> Message type           Dropped
> RANGE_SLICE                  0
> READ_REPAIR                  0
> PAGED_RANGE                  0
> BINARY                       0
> READ                       585
> MUTATION                 75775
> _TRACE                       0
> REQUEST_RESPONSE             0
> COUNTER_MUTATION             0
> {quote}
> We had this happen on 2 separate boxes, one with 2.0.6, the other with 2.0.8.
> Right now this is a total blocker for us. We are unable to process the
> customer data and have to abort in the middle of a large processing run.
> This is a new customer, so we did not have a chance to see if this occurred
> with 1.1 or 1.2 in the past (we moved to 2.0 recently).
> We still have the Cassandra process running; please let us know if there is
> anything else we could run to give you more insight.
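On the point above that memory is not freed even after the application is shut down: if these objects belong to the row cache, they remain strongly reachable from the cache map until they are evicted or invalidated, so a full GC cannot reclaim them no matter how idle the node is. A minimal, self-contained illustration (my own sketch built on a plain LinkedHashMap with made-up sizes, not Cassandra code):
{code:java}
import java.util.LinkedHashMap;
import java.util.Map;

// Illustration only: a bounded LRU-style cache keeps a strong reference to every
// cached value, so a full GC cannot release that memory while the entries remain
// in the map, regardless of whether any clients are connected.
public class PinnedCacheSketch {
    private static final int MAX_ENTRIES = 4_096;        // made-up capacity for the sketch
    private static final int ROW_SIZE_BYTES = 32 * 1024; // made-up "row" size (~128 MiB total)

    public static void main(String[] args) {
        Map<Integer, byte[]> cache = new LinkedHashMap<Integer, byte[]>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Integer, byte[]> eldest) {
                return size() > MAX_ENTRIES; // evict only once the cap is exceeded
            }
        };
        for (int i = 0; i < MAX_ENTRIES; i++) {
            cache.put(i, new byte[ROW_SIZE_BYTES]);
        }
        System.gc(); // the cached byte[]s are still reachable, so they survive the GC
        long used = Runtime.getRuntime().totalMemory() - Runtime.getRuntime().freeMemory();
        System.out.printf("heap still in use after GC: ~%,d bytes, %d entries cached%n",
                used, cache.size());
        cache.clear(); // only dropping (or shrinking) the cache lets the GC reclaim the memory
    }
}
{code}
If that is what is happening here, the relevant knob is row_cache_size_in_mb in cassandra.yaml (together with the per-table caching option); keeping it well below the heap size, or disabling the row cache for this bulk-load workload, should avoid pinning most of the heap.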
--
This message was sent by Atlassian JIRA
(v6.2#6252)