[jira] [Commented] (CASSANDRA-7361) Cassandra locks up in full GC when you assign the entire heap to row cache

Jacek Furmankiewicz (JIRA) Sat, 07 Jun 2014 10:36:17 -0700

    [ 
https://issues.apache.org/jira/browse/CASSANDRA-7361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14020889#comment-14020889
 ]


Jacek Furmankiewicz commented on CASSANDRA-7361:
------------------------------------------------

OK, I understand.

Basically, the gist of the issue is that the Cassandra off-heap row cache is 
not really "off-heap" in the way most Java developers would understand it, 
especially if they had been already exposed to things like Apache DirectMemory 
or MapDB, etc.

In essence, the row cache for all intents and purposes is still limited by the 
size of the Java heap and should never exceed it (I suspect this applies to the 
combined size of the row and key cache).

So, I presume if the heap size is X, then the combined size of the row/key 
cache should never exceed let's say X / 2 (for safety).

That is what we will try and evaluate what will be the performance impact when 
we are dealing with larger data sets that cannot fit into a small 4GB row cache.

P.S. Out of curiosity, have you ever tried large heaps (e.g. 32GB / 64 GB) with 
the G1 collector? Would that approach (without JNA installed) be a viable 
experiment?

Or have you already tested a combination of large heaps and G1 and concluded it 
is not beneficial for Cassandra?

Thanks
Jacek

> Cassandra locks up in full GC when you assign the entire heap to row cache
> --------------------------------------------------------------------------
>
>                 Key: CASSANDRA-7361
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7361
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>         Environment: Ubuntu, RedHat, JDK 1.7
>            Reporter: Jacek Furmankiewicz
>            Priority: Minor
>         Attachments: histogram.png, leaks_report.png, top_consumers.png
>
>
> We have a long running batch load process, which runs for many hours.
> Massive amount of writes, in large mutation batches (we increase the thrift 
> frame size to 45 MB).
> Everything goes well, but after about 3 hrs of processing everything locks 
> up. We start getting NoHostsAvailable exceptions on the Java application side 
> (with Astyanax as our driver), eventually socket timeouts.
> Looking at Cassandra, we can see that it is using nearly the full 8GB of heap 
> and unable to free it. It spends most of its time in full GC, but the amount 
> of memory does not go down.
> Here is a long sample from jstat to show this over an extended time period
> e.g.
> http://aep.appspot.com/display/NqqEagzGRLO_pCP2q8hZtitnuVU/
> This continues even after we shut down our app. Nothing is connected to 
> Cassandra any more, yet it is still stuck in full GC and cannot free up 
> memory.
> Running nodetool tpstats shows that nothing is pending, all seems OK:
> {quote}
> Pool Name                    Active   Pending      Completed   Blocked  All 
> time blocked
> ReadStage                         0         0       69555935         0        
>          0
> RequestResponseStage              0         0              0         0        
>          0
> MutationStage                     0         0       73123690         0        
>          0
> ReadRepairStage                   0         0              0         0        
>          0
> ReplicateOnWriteStage             0         0              0         0        
>          0
> GossipStage                       0         0              0         0        
>          0
> CacheCleanupExecutor              0         0              0         0        
>          0
> MigrationStage                    0         0             46         0        
>          0
> MemoryMeter                       0         0           1125         0        
>          0
> FlushWriter                       0         0            824         0        
>         30
> ValidationExecutor                0         0              0         0        
>          0
> InternalResponseStage             0         0             23         0        
>          0
> AntiEntropyStage                  0         0              0         0        
>          0
> MemtablePostFlusher               0         0           1783         0        
>          0
> MiscStage                         0         0              0         0        
>          0
> PendingRangeCalculator            0         0              1         0        
>          0
> CompactionExecutor                0         0          74330         0        
>          0
> commitlog_archiver                0         0              0         0        
>          0
> HintedHandoff                     0         0              0         0        
>          0
> Message type           Dropped
> RANGE_SLICE                  0
> READ_REPAIR                  0
> PAGED_RANGE                  0
> BINARY                       0
> READ                       585
> MUTATION                 75775
> _TRACE                       0
> REQUEST_RESPONSE             0
> COUNTER_MUTATION             0
> {quote}
> We had this happen on 2 separate boxes, one with 2.0.6, the other with 2.0.8.
> Right now this is a total blocker for us. We are unable to process the 
> customer data and have to abort in the middle of large processing.
> This is a new customer, so we did not have a chance to see if this occurred 
> with 1.1 or 1.2 in the past (we moved to 2.0 recently).
> We have the Cassandra process still running, pls let us know if there is 
> anything else we could run to give you more insight.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (CASSANDRA-7361) Cassandra locks up in full GC when you assign the entire heap to row cache

Reply via email to