[ https://issues.apache.org/jira/browse/CASSANDRA-7361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14020551#comment-14020551 ]

Jacek Furmankiewicz commented on CASSANDRA-7361:
------------------------------------------------

It may be a daily job, depending on the customer.

Running the MAT reports now; the early report is:

{quote}
Class Name                                                                    |    Objects |  Shallow Heap | Retained Heap
--------------------------------------------------------------------------------------------------------------------------
byte[]                                                                        | 32,532,347 | 1,696,254,872 |
java.util.concurrent.ConcurrentHashMap$HashEntry                              | 32,519,002 | 1,040,608,064 |
com.googlecode.concurrentlinkedhashmap.ConcurrentLinkedHashMap$Node           | 32,509,885 | 1,040,316,320 |
org.apache.cassandra.cache.RefCountedMemory                                   | 32,509,885 | 1,040,316,320 |
com.googlecode.concurrentlinkedhashmap.ConcurrentLinkedHashMap$WeightedValue  | 32,509,885 |   780,237,240 |
org.apache.cassandra.cache.RowCacheKey                                        | 32,509,885 |   780,237,240 |
java.util.concurrent.atomic.AtomicInteger                                     | 32,510,569 |   520,169,104 |
java.util.concurrent.ConcurrentHashMap$HashEntry[]                            |        376 |   268,516,880 |
java.nio.HeapByteBuffer                                                       |    530,765 |    25,476,720 |
edu.stanford.ppl.concurrent.CopyOnWriteManager$COWEpoch                       |    142,440 |    13,674,240 |
edu.stanford.ppl.concurrent.SnapTreeMap$Node                                  |    202,121 |     9,701,808 |
edu.stanford.ppl.concurrent.SnapTreeMap$RootHolder                            |    201,252 |     9,660,096 |
org.apache.cassandra.db.ExpiringColumn                                        |    194,148 |     7,765,920 |
java.util.concurrent.ConcurrentSkipListMap$Node                               |    206,891 |     4,965,384 |
java.lang.Long                                                                |    202,557 |     4,861,368 |
java.util.concurrent.atomic.AtomicReference                                   |    285,233 |     4,563,728 |
edu.stanford.ppl.concurrent.CopyOnWriteManager$Latch                          |    142,440 |     4,558,080 |
edu.stanford.ppl.concurrent.SnapTreeMap                                       |    142,440 |     4,558,080 |
org.apache.cassandra.db.DecoratedKey                                          |    142,576 |     3,421,824 |
org.apache.cassandra.db.AtomicSortedColumns                                   |    142,440 |     3,418,560 |
org.apache.cassandra.db.AtomicSortedColumns$Holder                            |    142,440 |     3,418,560 |
java.util.concurrent.ConcurrentSkipListMap$Index                              |    102,632 |     2,463,168 |
org.apache.cassandra.dht.LongToken                                            |    143,072 |     2,289,152 |
edu.stanford.ppl.concurrent.SnapTreeMap$COWMgr                                |    142,440 |     2,279,040 |
char[]                                                                        |     16,076 |     1,790,576 |
java.lang.Double                                                              |     67,527 |     1,620,648 |
java.util.AbstractMap$2                                                       |     58,799 |       940,784 |
long[]                                                                        |      1,604 |       594,552 |
java.lang.String                                                              |     15,872 |       380,928 |
java.lang.Object[]                                                            |      1,665 |       337,984 |
--------------------------------------------------------------------------------------------------------------------------
{quote}
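
For context, a quick back-of-the-envelope tally of those figures (the snippet below is illustrative arithmetic only, not Cassandra code; all counts and per-object sizes are copied from the histogram above):

{quote}
// Rough tally of the on-heap row-cache bookkeeping implied by the MAT
// histogram above; nothing here is measured independently.
public class RowCacheOverheadEstimate {
    public static void main(String[] args) {
        long entries = 32_509_885L;   // RowCacheKey instances in the report
        long perEntry = 32            // ConcurrentHashMap$HashEntry
                      + 32            // ConcurrentLinkedHashMap$Node
                      + 32            // RefCountedMemory
                      + 24            // ConcurrentLinkedHashMap$WeightedValue
                      + 24            // RowCacheKey
                      + 16;           // AtomicInteger
        long bookkeeping = entries * perEntry;   // per-entry overhead
        long payloads    = 1_696_254_872L;       // byte[] shallow heap (cached row data)
        long buckets     =   268_516_880L;       // ConcurrentHashMap$HashEntry[] arrays
        System.out.printf("bookkeeping: %,d bytes%n", bookkeeping);
        System.out.printf("total:       %,d bytes%n", bookkeeping + payloads + buckets);
        // Prints roughly 5.2 GB and 7.2 GB respectively, i.e. the ~32.5M cached
        // rows and their payloads account for most of the 8 GB heap by themselves.
    }
}
{quote}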

> Cassandra locks up in full GC when you assign the entire heap to row cache
> --------------------------------------------------------------------------
>
>                 Key: CASSANDRA-7361
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7361
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>         Environment: Ubuntu, RedHat, JDK 1.7
>            Reporter: Jacek Furmankiewicz
>            Priority: Minor
>
> We have a long-running batch load process which runs for many hours:
> a massive amount of writes in large mutation batches (we increase the Thrift
> frame size to 45 MB).
> Everything goes well, but after about 3 hrs of processing everything locks
> up. We start getting NoHostsAvailable exceptions on the Java application
> side (with Astyanax as our driver), and eventually socket timeouts.
> Looking at Cassandra, we can see that it is using nearly the full 8 GB of
> heap and is unable to free it. It spends most of its time in full GC, but
> the amount of used memory does not go down.
> Here is a long sample from jstat showing this over an extended time period,
> e.g. http://aep.appspot.com/display/NqqEagzGRLO_pCP2q8hZtitnuVU/
> (a typical jstat invocation is sketched after this report).
> This continues even after we shut down our app. Nothing is connected to 
> Cassandra any more, yet it is still stuck in full GC and cannot free up 
> memory.
> Running nodetool tpstats shows that nothing is pending; all seems OK:
> {quote}
> Pool Name                    Active   Pending      Completed   Blocked  All time blocked
> ReadStage                         0         0       69555935         0                 0
> RequestResponseStage              0         0              0         0                 0
> MutationStage                     0         0       73123690         0                 0
> ReadRepairStage                   0         0              0         0                 0
> ReplicateOnWriteStage             0         0              0         0                 0
> GossipStage                       0         0              0         0                 0
> CacheCleanupExecutor              0         0              0         0                 0
> MigrationStage                    0         0             46         0                 0
> MemoryMeter                       0         0           1125         0                 0
> FlushWriter                       0         0            824         0                30
> ValidationExecutor                0         0              0         0                 0
> InternalResponseStage             0         0             23         0                 0
> AntiEntropyStage                  0         0              0         0                 0
> MemtablePostFlusher               0         0           1783         0                 0
> MiscStage                         0         0              0         0                 0
> PendingRangeCalculator            0         0              1         0                 0
> CompactionExecutor                0         0          74330         0                 0
> commitlog_archiver                0         0              0         0                 0
> HintedHandoff                     0         0              0         0                 0
>
> Message type           Dropped
> RANGE_SLICE                  0
> READ_REPAIR                  0
> PAGED_RANGE                  0
> BINARY                       0
> READ                       585
> MUTATION                 75775
> _TRACE                       0
> REQUEST_RESPONSE             0
> COUNTER_MUTATION             0
> {quote}
> We had this happen on 2 separate boxes, one with 2.0.6, the other with 2.0.8.
> Right now this is a total blocker for us. We are unable to process the
> customer data and have to abort in the middle of a large processing run.
> This is a new customer, so we did not have a chance to see if this occurred 
> with 1.1 or 1.2 in the past (we moved to 2.0 recently).
> We still have the Cassandra process running; please let us know if there is
> anything else we could run to give you more insight.
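
For reference, the jstat trace linked in the description would typically be collected with an invocation along these lines (the process id and the one-second interval are placeholders, not taken from the report):

{quote}
jstat -gcutil <cassandra-pid> 1000
{quote}

With -gcutil, the columns to watch are O (old-generation occupancy), FGC (full-GC count) and FGCT (full-GC time); a heap that stays full while full GCs repeat, as described above, shows up as O pinned near 100% with FGC climbing continuously.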



--
This message was sent by Atlassian JIRA
(v6.2#6252)
