[
https://issues.apache.org/jira/browse/CASSANDRA-8295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14207647#comment-14207647
]
Jonathan Ellis edited comment on CASSANDRA-8295 at 11/12/14 4:05 AM:
---------------------------------------------------------------------
I had a look at the log and I see a ton of DON'T DO THIS red flags.
In the order I noticed them:
bq. WARN [Native-Transport-Requests:4666] 2014-11-07 23:26:58,501
BatchStatement.java (line 226) Batch of prepared statements for
[mfgprod.test_results_new7] is of size 90086, exceeding specified threshold of
65536 by 24550.
It looks like you're trying to use Batch as a bulk load optimization. *Don't
do that.*
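The right tool here is many concurrent single-partition writes, not one giant batch. A minimal sketch with the DataStax Java driver, assuming hypothetical columns on the mfgprod.test_results_new7 table from the log (the concurrency cap of 128 is arbitrary):
{code:java}
import com.datastax.driver.core.*;
import java.util.ArrayList;
import java.util.List;

public class BulkLoad {
    public static void main(String[] args) {
        Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
        Session session = cluster.connect("mfgprod");

        // Prepare once, bind per row. Column names are illustrative only.
        PreparedStatement insert = session.prepare(
            "INSERT INTO test_results_new7 (serial, attr, value) VALUES (?, ?, ?)");

        List<ResultSetFuture> inFlight = new ArrayList<>();
        for (int i = 0; i < 100000; i++) {
            inFlight.add(session.executeAsync(insert.bind("sn" + i, "area", "SYSFA")));
            // Bound the number of in-flight requests so the cluster is never swamped.
            if (inFlight.size() >= 128) {
                for (ResultSetFuture f : inFlight) f.getUninterruptibly();
                inFlight.clear();
            }
        }
        for (ResultSetFuture f : inFlight) f.getUninterruptibly();
        cluster.close();
    }
}
{code}
A logged batch spanning many partitions also writes every mutation to the batchlog first, which is exactly the memtable pressure showing up below.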
I also see a lot of these:
bq. INFO [ScheduledTasks:1] 2014-11-07 23:46:26,984 MessagingService.java
(line 875) 43 MUTATION messages dropped in last 5000ms
*Don't ignore TimedOutExceptions!* They mean you're overloading the server and
need to back off.
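A minimal sketch of that kind of client-side backpressure, again assuming the Java driver (the exception is the driver's; the retry budget and delays are invented for illustration):
{code:java}
import com.datastax.driver.core.BoundStatement;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.exceptions.WriteTimeoutException;

public class BackoffWrites {
    // Retry a write with exponential backoff instead of hammering an overloaded node.
    static void writeWithBackoff(Session session, BoundStatement stmt)
            throws InterruptedException {
        long delayMs = 100;
        for (int attempt = 0; attempt < 5; attempt++) {
            try {
                session.execute(stmt);
                return;
            } catch (WriteTimeoutException e) {
                // The coordinator timed out waiting on replicas: ease off.
                Thread.sleep(delayMs);
                delayMs *= 2; // back off harder each time
            }
        }
        throw new RuntimeException("gave up after repeated write timeouts");
    }
}
{code}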
StatusLogger says that about 10% of memtable space is being used by batchlog
(duh) and over 40% by hints (because it needs to hint the dropped mutations).
Even with those, C* shouldn't OOM. So I checked the yaml and saw that all the
timeouts are set to 5x the default. That means that C* has to keep 5x as many
ops in memory before it can start load shedding. *Don't do that, either.*
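For reference, the stock 2.0-era values these were multiplied up from (from the default cassandra.yaml; verify against your exact version):
{noformat}
read_request_timeout_in_ms: 5000
range_request_timeout_in_ms: 10000
write_request_timeout_in_ms: 2000
counter_write_request_timeout_in_ms: 5000
request_timeout_in_ms: 10000
{noformat}
At 5x, a coordinator sits on every stuck mutation five times longer before dropping it, so the backlog that would normally be shed accumulates on the heap instead.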
Finally, Cassandra is flushing as fast as it can, but with batchlog + hints
basically tripling the write traffic, it can't keep up with a single flush
writer. Remember, the default is one flush writer per data directory. Cassandra
doesn't know how many spindles that directory represents, so it's conservative;
with raid, *the rule of thumb is 1 per disk*. Cut that in half for raid10.
Better yet, use jbod and let Cassandra parallelize across the raw disks.
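In cassandra.yaml terms, that's one of these two shapes (disk counts and paths here are purely illustrative):
{noformat}
# raid: tell Cassandra how many spindles sit behind the one data directory,
# e.g. 6-disk raid0 -> 6, 6-disk raid10 -> 3
memtable_flush_writers: 6

# jbod: one directory per raw disk; the default of one flush writer
# per data directory then parallelizes flushes across all of them
data_file_directories:
    - /mnt/disk1/cassandra/data
    - /mnt/disk2/cassandra/data
    - /mnt/disk3/cassandra/data
{noformat}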
> Cassandra runs OOM @ java.util.concurrent.ConcurrentSkipListMap$HeadIndex
> -------------------------------------------------------------------------
>
> Key: CASSANDRA-8295
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8295
> Project: Cassandra
> Issue Type: Bug
> Components: Core
> Environment: DSE 4.5.3 Cassandra 2.0.11.82
> Reporter: Jose Martinez Poblete
> Attachments: alln01-ats-cas3.cassandra.yaml, output.tgz, system.tgz,
> system.tgz.1, system.tgz.2, system.tgz.3
>
>
> Customer runs a 3-node cluster.
> Their dataset is less than 1 TB, and during data load one of the nodes enters
> a GC death spiral:
> {noformat}
> INFO [ScheduledTasks:1] 2014-11-07 23:31:08,094 GCInspector.java (line 116)
> GC for ConcurrentMarkSweep: 3348 ms for 2 collections, 1658268944 used; max
> is 8375238656
> INFO [ScheduledTasks:1] 2014-11-07 23:40:58,486 GCInspector.java (line 116)
> GC for ParNew: 442 ms for 2 collections, 6079570032 used; max is 8375238656
> INFO [ScheduledTasks:1] 2014-11-07 23:40:58,487 GCInspector.java (line 116)
> GC for ConcurrentMarkSweep: 7351 ms for 2 collections, 6084678280 used; max
> is 8375238656
> INFO [ScheduledTasks:1] 2014-11-07 23:41:01,836 GCInspector.java (line 116)
> GC for ConcurrentMarkSweep: 603 ms for 1 collections, 7132546096 used; max is
> 8375238656
> INFO [ScheduledTasks:1] 2014-11-07 23:41:09,626 GCInspector.java (line 116)
> GC for ConcurrentMarkSweep: 761 ms for 1 collections, 7286946984 used; max is
> 8375238656
> INFO [ScheduledTasks:1] 2014-11-07 23:41:15,265 GCInspector.java (line 116)
> GC for ConcurrentMarkSweep: 703 ms for 1 collections, 7251213520 used; max is
> 8375238656
> INFO [ScheduledTasks:1] 2014-11-07 23:41:25,027 GCInspector.java (line 116)
> GC for ConcurrentMarkSweep: 1205 ms for 1 collections, 6507586104 used; max
> is 8375238656
> INFO [ScheduledTasks:1] 2014-11-07 23:41:41,374 GCInspector.java (line 116)
> GC for ConcurrentMarkSweep: 13835 ms for 3 collections, 6514187192 used; max
> is 8375238656
> INFO [ScheduledTasks:1] 2014-11-07 23:41:54,137 GCInspector.java (line 116)
> GC for ConcurrentMarkSweep: 6834 ms for 2 collections, 6521656200 used; max
> is 8375238656
> ...
> INFO [ScheduledTasks:1] 2014-11-08 12:13:11,086 GCInspector.java (line 116)
> GC for ConcurrentMarkSweep: 43967 ms for 2 collections, 8368777672 used; max
> is 8375238656
> INFO [ScheduledTasks:1] 2014-11-08 12:14:14,151 GCInspector.java (line 116)
> GC for ConcurrentMarkSweep: 63968 ms for 3 collections, 8369623824 used; max
> is 8375238656
> INFO [ScheduledTasks:1] 2014-11-08 12:14:55,643 GCInspector.java (line 116)
> GC for ConcurrentMarkSweep: 41307 ms for 2 collections, 8370115376 used; max
> is 8375238656
> INFO [ScheduledTasks:1] 2014-11-08 12:20:06,197 GCInspector.java (line 116)
> GC for ConcurrentMarkSweep: 309634 ms for 15 collections, 8374994928 used;
> max is 8375238656
> INFO [ScheduledTasks:1] 2014-11-08 13:07:33,617 GCInspector.java (line 116)
> GC for ConcurrentMarkSweep: 2681100 ms for 143 collections, 8347631560 used;
> max is 8375238656
> {noformat}
> Their application waits 1 minute before retrying when a timeout is returned.
> This is what we find in their heap dumps:
> {noformat}
> Class Name                                                                  | Shallow Heap | Retained Heap | Percentage
> ------------------------------------------------------------------------------------------------------------------------
> org.apache.cassandra.db.Memtable @ 0x773f52f80                              |           72 | 8,086,073,504 |     96.66%
> |- java.util.concurrent.ConcurrentSkipListMap @ 0x724508fe8                 |           48 | 8,086,073,320 |     96.66%
> |  |- java.util.concurrent.ConcurrentSkipListMap$HeadIndex @ 0x64f9219a0    |           32 | 8,086,073,256 |     96.66%
> |  |  |- java.util.concurrent.ConcurrentSkipListMap$Node @ 0x614b081a8      |           24 |    16,230,976 |      0.19%
> |  |  |- java.util.concurrent.ConcurrentSkipListMap$Node @ 0x7da171948      |           24 |     4,922,288 |      0.06%
> |  |  |- java.util.concurrent.ConcurrentSkipListMap$Node @ 0x7f4518a80      |           24 |     4,405,496 |      0.05%
> |  |  |- java.util.concurrent.ConcurrentSkipListMap$Node @ 0x611d69d10      |           24 |     3,737,672 |      0.04%
> |  |  |- java.util.concurrent.ConcurrentSkipListMap$Node @ 0x71cd2fae8      |           24 |     2,921,048 |      0.03%
> |  |  |- java.util.concurrent.ConcurrentSkipListMap$HeadIndex @ 0x728faed50 |           32 |     2,012,592 |      0.02%
> |  |  |- java.util.concurrent.ConcurrentSkipListMap$Node @ 0x6387eb950      |           24 |     1,641,696 |      0.02%
> |  |  |- java.util.concurrent.ConcurrentSkipListMap$Node @ 0x727f474f0      |           24 |     1,328,936 |      0.02%
> |  |  |- java.util.concurrent.ConcurrentSkipListMap$Node @ 0x70d7a02b0      |           24 |     1,050,624 |      0.01%
> |  |  |- byte[1048576] @ 0x7d87873d8  .........8.........CS.l`...attributes...slot..............attributes...runtime......A..<x.........C.......attributes...procgid.87.....CS.`....attributes...bflush.00.....CV......attributes...username........uV....server.f1432541.........8...server......A..<...| 1,048,592 | 1,048,592 | 0.01%
> |  |  |- byte[1048576] @ 0x60ab7b920  .....7...p...attributes...tottime....../..%....area......0.......attributes...lineid.56.....7.i.....attributes...tottime.156258924.....0B)\....container.4...../.......server....,PTXCALsdihqprod1\sdihqprod1...../.......machine.fxcdom1.....7.i.....attributes...| 1,048,592 | 1,048,592 | 0.01%
> |  |  |- byte[1048576] @ 0x609fb54f8  .....E.......attributes...lineid.901137423.....E.......attributes...testr1.1413.....E.......attributes...testr2.M393B1K70QB0-YK02014-01-03 06:46:31.....E.......attributes...tenum1name.EFSTLOOP.....CV......attributes...numunits.1.....E.......attributes...pa...| 1,048,592 | 1,048,592 | 0.01%
> |  |  |- byte[1048576] @ 0x60a0b5508  .....E.z.....area......?.......attributes...labelnum.SYSFA.....0"U.....attributes...testr1name.D75165799...../..^....attributes...crc.Hexload_Bootloader.....E.TR....machine......0.......attributes...bflush....../..&....attributes...majline....../._.P...att...| 1,048,592 | 1,048,592 | 0.01%
> |  |  |- byte[1048576] @ 0x7d8f5e2b8  ......B9.....machine.solfr5.......L.....attributes...runtime.146.............attributes...tottime.109.......t.h...attributes...bmap.0......B9.....uuttype.VIP2-40=.......L.....attributes...cpptimeid.2006-04-11 10:53:48.............attributes...partnum.73-91...| 1,048,592 | 1,048,592 | 0.01%
> |  |  |- byte[1048576] @ 0x7d905e2c8  .....E.|.x...attributes...runtime.310.....E.P.0...attributes...partnum.15-13637-02.....E./<....area.SYSFA.....E./<....passfail.S.....E.|.x...attributes...testr1.1413.....E.P.0...attributes...partnum2.15-13637-02.....E./<....container....T.....E./<....attri...| 1,048,592 | 1,048,592 | 0.01%
> |  |  |- byte[1048576] @ 0x7d915e2d8  ...../..l........../..l....server.sdihqprod1\sdihqprod1...../..l....machine.fxcdom1...../..l....uuttype.73-12304-03...../..l....area.PASTE...../..l....passfail.P...../..l....container........../..l....attributes...majline.0...../..l....attributes...subslot...| 1,048,592 | 1,048,592 | 0.01%
> |  |  |- byte[1048576] @ 0x7d925e2e8  .............attributes...tottime.42........KH...attributes...testtime.........3....uuttype.0.............attributes...runtime..............attributes...procgid.73-9341-021417817........3....area.PASTE.............attributes...test.PASSED........3....passf...| 1,048,592 | 1,048,592 | 0.01%
> |  |  |- byte[1048576] @ 0x7be7473f0  .....=..Jh.........=...(...attributes...runtime.f6f1298f-830f-47f4-b1dd-1adb07b99ff9653.....=..*(...attributes...numunits.1.....=..Jh...server......=.._....attributes...testr3name.RCDN9HQPROD1\RCDN9HQPROD1CPPVersion:3.6.2803.0.....=..*(...attributes...test...| 1,048,592 | 1,048,592 | 0.01%
> |  |  |- byte[1048576] @ 0x7be847800  .........0...passfail.P........(...attributes...bflush.0.......w.....attributes...tottime.1161.....A.b.(...area.SYSVF.......kH....attributes...test..............attributes...runtime.PASSED.....A.zl....attributes...procgid.2.........0...container.............| 1,048,592 | 1,048,592 | 0.01%
> |  |  |- byte[1048576] @ 0x7be949070  .....=...0...attributes...pcid......=..>....attributes...cpptimeid.6cc40f78-9525-4488-909f-2247d9537cf82013-04-04 19:24:23.....=.Z.....attributes...runtime.0.....=...0...attributes...testr3name.CPPVersion:3.6.2803.0.....=.............=...0...attributes...p...| 1,048,592 | 1,048,592 | 0.01%
> |  |  |- byte[1048576] @ 0x7bea4a8e0  .....>{A0....uuttype.FJZPROD1\FJZPROD1.....>z.Mp...attributes...pcid......Ct..(...attributes...lineid......=..n8...container......=.oE..........4B......machine.F2049802CBLSTB-4044066-K9fxhmcekit2.....=.p(h...attributes...proctime......>z..`...machine.........| 1,048,592 | 1,048,592 | 0.01%
> |  |  |- byte[1048576] @ 0x7beb4a8f0  .....A../....attributes...partnum2......B'......attributes...username....D.....B'.L....area.f1303257.....A...P...area.74-8071-01F118190965553.....A.......server.PCBDLSYSPM.......$.....attributes...runtime......>{r+....attributes...cpptimeid......B.B....co...| 1,048,592 | 1,048,592 | 0.01%
> |  |  |- byte[1048576] @ 0x7bec4a900  .....=..;x...attributes...username.tczpawe73-100074-01.....=..;x...attributes...slot.0.....=.......area.ASSY.....=..;x...attributes...lineid......=.......passfail.0P.....=.......container..........=..;x...attributes...numunits.1.....=.......attributes...pa...| 1,048,592 | 1,048,592 | 0.01%
> |  |  |- byte[1048576] @ 0x7bed55ff0  .....=oC/....machine......"..Q....uuttype......"...`...attributes...parentsernum.73-8479-02FCZ133171DPfxcestgfqa1....."`..x...attributes...test......=l.2p...attributes...tenum3.8242009070919300730FOC13283D6A....."..Q....area......"...(...attributes...bflus...| 1,048,592 | 1,048,592 | 0.01%
> |  |  |- byte[1048576] @ 0x61cf45088  .....CSaL....server.FXCPROD1\FXCPROD1.....CW......passfail.P.....CSr.....attributes...runtime.50.....CSaL....machine.foxchict217.....CW......container..........CSaL....uuttype......CW......attributes...username.73-13315-03xzhang.....CSr.....attributes...te...| 1,048,592 | 1,048,592 | 0.01%
> |  |  '- Total: 25 of 166,289 entries; 166,264 more
> |  |- java.util.concurrent.ConcurrentSkipListMap$EntrySet @ 0x72541dc58     |           16 |            16 |      0.00%
> |  '- Total: 2 entries
> |- org.github.jamm.MemoryMeter @ 0x72541db50                                |           24 |            40 |      0.00%
> |- java.util.concurrent.atomic.AtomicLong @ 0x72541db68                     |           24 |            24 |      0.00%
> |- java.util.concurrent.atomic.AtomicLong @ 0x72541db80                     |           24 |            24 |      0.00%
> |- java.util.concurrent.atomic.AtomicLong @ 0x72541db38                     |           24 |            24 |      0.00%
> '- Total: 5 entries
> ------------------------------------------------------------------------------------------------------------------------
> {noformat}
> They are using the defaults in cassandra.yaml, which means memtables should
> not use that much heap. Setting the following has been of no use:
> {noformat}
> memtable_total_space_in_mb: 2000
> memtable_flush_queue_size: 1
> {noformat}