    [ https://issues.apache.org/jira/browse/CASSANDRA-8295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jose Martinez Poblete updated CASSANDRA-8295:
---------------------------------------------
    Comment: was deleted

(was: More info from MAT

{noformat}
Class Name                                              |     Objects |  Shallow Heap
java.nio.HeapByteBuffer                                 |  73,845,620 | 3,544,589,760
edu.stanford.ppl.concurrent.SnapTreeMap$Node            |  34,614,044 | 1,661,474,112
byte[]                                                  |   3,969,475 | 1,510,362,528
org.apache.cassandra.db.Column                          |  34,614,043 | 1,107,649,376
edu.stanford.ppl.concurrent.CopyOnWriteManager$COWEpoch |     411,924 |    39,544,704
java.nio.ByteBuffer[]                                   |     823,848 |    30,913,568
long[]                                                  |     411,924 |    22,819,304
edu.stanford.ppl.concurrent.SnapTreeMap$RootHolder      |     411,924 |    19,772,352
org.apache.cassandra.db.RangeTombstoneList              |     411,924 |    16,476,960
int[]                                                   |     411,924 |    15,456,784
edu.stanford.ppl.concurrent.CopyOnWriteManager$Latch    |     411,924 |    13,181,568
edu.stanford.ppl.concurrent.SnapTreeMap                 |     411,924 |    13,181,568
java.util.concurrent.atomic.AtomicReference             |     823,848 |    13,181,568
java.util.concurrent.ConcurrentSkipListMap$Node         |     411,929 |     9,886,296
org.apache.cassandra.db.DecoratedKey                    |     411,928 |     9,886,272
java.lang.Long                                          |     411,928 |     9,886,272
org.apache.cassandra.db.AtomicSortedColumns             |     411,924 |     9,886,176
org.apache.cassandra.db.AtomicSortedColumns$Holder      |     411,924 |     9,886,176
org.apache.cassandra.db.DeletionInfo                    |     411,924 |     9,886,176
org.apache.cassandra.dht.LongToken                      |     411,928 |     6,590,848
edu.stanford.ppl.concurrent.SnapTreeMap$COWMgr          |     411,924 |     6,590,784
java.util.concurrent.ConcurrentSkipListMap$Index        |     207,065 |     4,969,560
java.util.concurrent.ConcurrentSkipListMap$HeadIndex    |          16 |           512
org.apache.cassandra.db.DeletedColumn                   |           1 |            32

Total: 24 entries                                       | 155,076,837 | 8,086,073,256
{noformat})
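
A quick arithmetic check on the histogram above, as a rough sketch: it divides shallow heap by object count for the four largest classes and compares each against the "Total" row. The figures are copied from the MAT output; the helper names (HISTOGRAM, bytes_per_object, share_of_total) are made up for illustration. Reading byte[] as the actual cell payload and the other three classes as per-column bookkeeping is an assumption, not something MAT reports.

```python
# Figures copied from the MAT histogram above: class -> (objects, shallow heap in bytes).
HISTOGRAM = {
    "java.nio.HeapByteBuffer": (73_845_620, 3_544_589_760),
    "edu.stanford.ppl.concurrent.SnapTreeMap$Node": (34_614_044, 1_661_474_112),
    "byte[]": (3_969_475, 1_510_362_528),
    "org.apache.cassandra.db.Column": (34_614_043, 1_107_649_376),
}
TOTAL_SHALLOW = 8_086_073_256  # shallow heap from the "Total: 24 entries" row


def bytes_per_object(name):
    """Average shallow size of one instance of the given class."""
    objects, shallow = HISTOGRAM[name]
    return shallow / objects


def share_of_total(name):
    """Fraction of the histogram's total shallow heap held by the given class."""
    return HISTOGRAM[name][1] / TOTAL_SHALLOW


if __name__ == "__main__":
    for name in HISTOGRAM:
        print(f"{name}: {bytes_per_object(name):.1f} B/object, "
              f"{share_of_total(name):.1%} of total")
```

Under that reading, HeapByteBuffer, SnapTreeMap$Node and Column together hold roughly 78% of the shallow heap, against roughly 19% for byte[].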

> Cassandra runs OOM @ java.util.concurrent.ConcurrentSkipListMap$HeadIndex
> -------------------------------------------------------------------------
>
>                 Key: CASSANDRA-8295
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8295
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>         Environment: DSE 4.5.3 Cassandra 2.0.11.82
>            Reporter: Jose Martinez Poblete
>         Attachments: alln01-ats-cas3.cassandra.yaml, output.tgz, system.tgz, 
> system.tgz.1, system.tgz.2, system.tgz.3
>
>
> The customer runs a 3-node cluster.
> Their dataset is less than 1 TB and, during data load, one of the nodes enters a
> GC death spiral:
> {noformat}
>  INFO [ScheduledTasks:1] 2014-11-07 23:31:08,094 GCInspector.java (line 116) 
> GC for ConcurrentMarkSweep: 3348 ms for 2 collections, 1658268944 used; max 
> is 8375238656
>  INFO [ScheduledTasks:1] 2014-11-07 23:40:58,486 GCInspector.java (line 116) 
> GC for ParNew: 442 ms for 2 collections, 6079570032 used; max is 8375238656
>  INFO [ScheduledTasks:1] 2014-11-07 23:40:58,487 GCInspector.java (line 116) 
> GC for ConcurrentMarkSweep: 7351 ms for 2 collections, 6084678280 used; max 
> is 8375238656
>  INFO [ScheduledTasks:1] 2014-11-07 23:41:01,836 GCInspector.java (line 116) 
> GC for ConcurrentMarkSweep: 603 ms for 1 collections, 7132546096 used; max is 
> 8375238656
>  INFO [ScheduledTasks:1] 2014-11-07 23:41:09,626 GCInspector.java (line 116) 
> GC for ConcurrentMarkSweep: 761 ms for 1 collections, 7286946984 used; max is 
> 8375238656
>  INFO [ScheduledTasks:1] 2014-11-07 23:41:15,265 GCInspector.java (line 116) 
> GC for ConcurrentMarkSweep: 703 ms for 1 collections, 7251213520 used; max is 
> 8375238656
>  INFO [ScheduledTasks:1] 2014-11-07 23:41:25,027 GCInspector.java (line 116) 
> GC for ConcurrentMarkSweep: 1205 ms for 1 collections, 6507586104 used; max 
> is 8375238656
>  INFO [ScheduledTasks:1] 2014-11-07 23:41:41,374 GCInspector.java (line 116) 
> GC for ConcurrentMarkSweep: 13835 ms for 3 collections, 6514187192 used; max 
> is 8375238656
>  INFO [ScheduledTasks:1] 2014-11-07 23:41:54,137 GCInspector.java (line 116) 
> GC for ConcurrentMarkSweep: 6834 ms for 2 collections, 6521656200 used; max 
> is 8375238656
> ...
>  INFO [ScheduledTasks:1] 2014-11-08 12:13:11,086 GCInspector.java (line 116) 
> GC for ConcurrentMarkSweep: 43967 ms for 2 collections, 8368777672 used; max 
> is 8375238656
>  INFO [ScheduledTasks:1] 2014-11-08 12:14:14,151 GCInspector.java (line 116) 
> GC for ConcurrentMarkSweep: 63968 ms for 3 collections, 8369623824 used; max 
> is 8375238656
>  INFO [ScheduledTasks:1] 2014-11-08 12:14:55,643 GCInspector.java (line 116) 
> GC for ConcurrentMarkSweep: 41307 ms for 2 collections, 8370115376 used; max 
> is 8375238656
>  INFO [ScheduledTasks:1] 2014-11-08 12:20:06,197 GCInspector.java (line 116) 
> GC for ConcurrentMarkSweep: 309634 ms for 15 collections, 8374994928 used; 
> max is 8375238656
>  INFO [ScheduledTasks:1] 2014-11-08 13:07:33,617 GCInspector.java (line 116) 
> GC for ConcurrentMarkSweep: 2681100 ms for 143 collections, 8347631560 used; 
> max is 8375238656
> {noformat} 
> Their application waits 1 minute before retrying when a timeout is returned.
> This is what we find in their heap dumps:
> {noformat}
> Class Name                                                                  | Shallow Heap | Retained Heap | Percentage
> -----------------------------------------------------------------------------------------------------------------------
> org.apache.cassandra.db.Memtable @ 0x773f52f80                              |           72 | 8,086,073,504 |     96.66%
> |- java.util.concurrent.ConcurrentSkipListMap @ 0x724508fe8                 |           48 | 8,086,073,320 |     96.66%
> |  |- java.util.concurrent.ConcurrentSkipListMap$HeadIndex @ 0x64f9219a0    |           32 | 8,086,073,256 |     96.66%
> |  |  |- java.util.concurrent.ConcurrentSkipListMap$Node @ 0x614b081a8      |           24 |    16,230,976 |      0.19%
> |  |  |- java.util.concurrent.ConcurrentSkipListMap$Node @ 0x7da171948      |           24 |     4,922,288 |      0.06%
> |  |  |- java.util.concurrent.ConcurrentSkipListMap$Node @ 0x7f4518a80      |           24 |     4,405,496 |      0.05%
> |  |  |- java.util.concurrent.ConcurrentSkipListMap$Node @ 0x611d69d10      |           24 |     3,737,672 |      0.04%
> |  |  |- java.util.concurrent.ConcurrentSkipListMap$Node @ 0x71cd2fae8      |           24 |     2,921,048 |      0.03%
> |  |  |- java.util.concurrent.ConcurrentSkipListMap$HeadIndex @ 0x728faed50 |           32 |     2,012,592 |      0.02%
> |  |  |- java.util.concurrent.ConcurrentSkipListMap$Node @ 0x6387eb950      |           24 |     1,641,696 |      0.02%
> |  |  |- java.util.concurrent.ConcurrentSkipListMap$Node @ 0x727f474f0      |           24 |     1,328,936 |      0.02%
> |  |  |- java.util.concurrent.ConcurrentSkipListMap$Node @ 0x70d7a02b0      |           24 |     1,050,624 |      0.01%
> |  |  |- byte[1048576] @ 0x7d87873d8  
> .........8.........CS.l`...attributes...slot..............attributes...runtime......A..<x.........C.......attributes...procgid.87.....CS.`....attributes...bflush.00.....CV......attributes...username........uV....server.f1432541.........8...server......A..<...|
>     1,048,592 |     1,048,592 |      0.01%
> |  |  |- byte[1048576] @ 0x60ab7b920  
> .....7...p...attributes...tottime....../..%....area......0.......attributes...lineid.56.....7.i.....attributes...tottime.156258924.....0B)\....container.4...../.......server....,PTXCALsdihqprod1\sdihqprod1...../.......machine.fxcdom1.....7.i.....attributes...|
>     1,048,592 |     1,048,592 |      0.01%
> |  |  |- byte[1048576] @ 0x609fb54f8  
> .....E.......attributes...lineid.901137423.....E.......attributes...testr1.1413.....E.......attributes...testr2.M393B1K70QB0-YK02014-01-03
>  
> 06:46:31.....E.......attributes...tenum1name.EFSTLOOP.....CV......attributes...numunits.1.....E.......attributes...pa...|
>     1,048,592 |     1,048,592 |      0.01%
> |  |  |- byte[1048576] @ 0x60a0b5508  
> .....E.z.....area......?.......attributes...labelnum.SYSFA.....0"U.....attributes...testr1name.D75165799...../..^....attributes...crc.Hexload_Bootloader.....E.TR....machine......0.......attributes...bflush....../..&....attributes...majline....../._.P...att...|
>     1,048,592 |     1,048,592 |      0.01%
> |  |  |- byte[1048576] @ 0x7d8f5e2b8  
> ......B9.....machine.solfr5.......L.....attributes...runtime.146.............attributes...tottime.109.......t.h...attributes...bmap.0......B9.....uuttype.VIP2-40=.......L.....attributes...cpptimeid.2006-04-11
>  10:53:48.............attributes...partnum.73-91...|    1,048,592 |     
> 1,048,592 |      0.01%
> |  |  |- byte[1048576] @ 0x7d905e2c8  
> .....E.|.x...attributes...runtime.310.....E.P.0...attributes...partnum.15-13637-02.....E./<....area.SYSFA.....E./<....passfail.S.....E.|.x...attributes...testr1.1413.....E.P.0...attributes...partnum2.15-13637-02.....E./<....container....T.....E./<....attri...|
>     1,048,592 |     1,048,592 |      0.01%
> |  |  |- byte[1048576] @ 0x7d915e2d8  
> ...../..l........../..l....server.sdihqprod1\sdihqprod1...../..l....machine.fxcdom1...../..l....uuttype.73-12304-03...../..l....area.PASTE...../..l....passfail.P...../..l....container........../..l....attributes...majline.0...../..l....attributes...subslot...|
>     1,048,592 |     1,048,592 |      0.01%
> |  |  |- byte[1048576] @ 0x7d925e2e8  
> .............attributes...tottime.42........KH...attributes...testtime.........3....uuttype.0.............attributes...runtime..............attributes...procgid.73-9341-021417817........3....area.PASTE.............attributes...test.PASSED........3....passf...|
>     1,048,592 |     1,048,592 |      0.01%
> |  |  |- byte[1048576] @ 0x7be7473f0  
> .....=..Jh.........=...(...attributes...runtime.f6f1298f-830f-47f4-b1dd-1adb07b99ff9653.....=..*(...attributes...numunits.1.....=..Jh...server......=.._....attributes...testr3name.RCDN9HQPROD1\RCDN9HQPROD1CPPVersion:3.6.2803.0.....=..*(...attributes...test...|
>     1,048,592 |     1,048,592 |      0.01%
> |  |  |- byte[1048576] @ 0x7be847800  .........0...passfail.P........ 
> (...attributes...bflush.0.......w.....attributes...tottime.1161.....A.b.(...area.SYSVF.......kH....attributes...test..............attributes...runtime.PASSED.....A.zl....attributes...procgid.2.........0...container.............|
>     1,048,592 |     1,048,592 |      0.01%
> |  |  |- byte[1048576] @ 0x7be949070  
> .....=...0...attributes...pcid......=..>....attributes...cpptimeid.6cc40f78-9525-4488-909f-2247d9537cf82013-04-04
>  
> 19:24:23.....=.Z.....attributes...runtime.0.....=...0...attributes...testr3name.CPPVersion:3.6.2803.0.....=.............=...0...attributes...p...|
>     1,048,592 |     1,048,592 |      0.01%
> |  |  |- byte[1048576] @ 0x7bea4a8e0  
> .....>{A0....uuttype.FJZPROD1\FJZPROD1.....>z.Mp...attributes...pcid......Ct..(...attributes...lineid......=..n8...container......=.oE..........4B......machine.F2049802CBLSTB-4044066-K9fxhmcekit2.....=.p(h...attributes...proctime......>z..`...machine.........|
>     1,048,592 |     1,048,592 |      0.01%
> |  |  |- byte[1048576] @ 0x7beb4a8f0  
> .....A../....attributes...partnum2......B'......attributes...username....D.....B'.L....area.f1303257.....A...P...area.74-8071-01F118190965553.....A.......server.PCBDLSYSPM.......$.....attributes...runtime......>{r+....attributes...cpptimeid......B.B.
>  ...co...|    1,048,592 |     1,048,592 |      0.01%
> |  |  |- byte[1048576] @ 0x7bec4a900  
> .....=..;x...attributes...username.tczpawe73-100074-01.....=..;x...attributes...slot.0.....=.......area.ASSY.....=..;x...attributes...lineid......=.......passfail.0P.....=.......container..........=..;x...attributes...numunits.1.....=.......attributes...pa...|
>     1,048,592 |     1,048,592 |      0.01%
> |  |  |- byte[1048576] @ 0x7bed55ff0  
> .....=oC/....machine......"..Q....uuttype......"...`...attributes...parentsernum.73-8479-02FCZ133171DPfxcestgfqa1....."`..x...attributes...test......=l.2p...attributes...tenum3.8242009070919300730FOC13283D6A....."..Q....area......"...(...attributes...bflus...|
>     1,048,592 |     1,048,592 |      0.01%
> |  |  |- byte[1048576] @ 0x61cf45088  
> .....CSaL....server.FXCPROD1\FXCPROD1.....CW......passfail.P.....CSr.....attributes...runtime.50.....CSaL....machine.foxchict217.....CW......container..........CSaL....uuttype......CW......attributes...username.73-13315-03xzhang.....CSr.....attributes...te...|
>     1,048,592 |     1,048,592 |      0.01%
> |  |  '- Total: 25 of 166,289 entries; 166,264 more                         |              |               |
> |  |- java.util.concurrent.ConcurrentSkipListMap$EntrySet @ 0x72541dc58     |           16 |            16 |      0.00%
> |  '- Total: 2 entries                                                      |              |               |
> |- org.github.jamm.MemoryMeter @ 0x72541db50                                |           24 |            40 |      0.00%
> |- java.util.concurrent.atomic.AtomicLong @ 0x72541db68                     |           24 |            24 |      0.00%
> |- java.util.concurrent.atomic.AtomicLong @ 0x72541db80                     |           24 |            24 |      0.00%
> |- java.util.concurrent.atomic.AtomicLong @ 0x72541db38                     |           24 |            24 |      0.00%
> '- Total: 5 entries                                                         |              |               |
> -----------------------------------------------------------------------------------------------------------------------
> {noformat}
> They are using the defaults in cassandra.yaml, which means sstables should not
> use that much heap. Setting the following has been of no use:
> {noformat}
> memtable_total_space_in_mb: 2000
> memtable_flush_queue_size: 1
> {noformat}
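
To put numbers on the GCInspector output quoted in the description, here is a minimal sketch that parses lines of that shape and reports heap occupancy (used / max) plus the share of wall-clock time between consecutive entries that was reported as collection time. The names GC_LINE, parse_gc_line, summarize and SAMPLE are hypothetical. The regex only targets the line format shown in this ticket, and treating each entry's "ms" figure as time spent since the previous entry is an assumption. For ConcurrentMarkSweep the reported time is not all stop-the-world, so the share is only a rough indicator.

```python
import re
from datetime import datetime

# Matches GCInspector lines in the shape quoted in this ticket (unwrapped to one line each).
GC_LINE = re.compile(
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) GCInspector\.java \(line \d+\) "
    r"GC for (?P<collector>\w+): (?P<ms>\d+) ms for (?P<count>\d+) collections, "
    r"(?P<used>\d+) used; max is (?P<max>\d+)"
)

# Last two entries from the log in the description, joined back onto single lines.
SAMPLE = [
    " INFO [ScheduledTasks:1] 2014-11-08 12:20:06,197 GCInspector.java (line 116) "
    "GC for ConcurrentMarkSweep: 309634 ms for 15 collections, 8374994928 used; max is 8375238656",
    " INFO [ScheduledTasks:1] 2014-11-08 13:07:33,617 GCInspector.java (line 116) "
    "GC for ConcurrentMarkSweep: 2681100 ms for 143 collections, 8347631560 used; max is 8375238656",
]


def parse_gc_line(line):
    """Return the fields of one GCInspector line, or None if the line does not match."""
    m = GC_LINE.search(line)
    if m is None:
        return None
    return {
        "time": datetime.strptime(m.group("ts"), "%Y-%m-%d %H:%M:%S,%f"),
        "collector": m.group("collector"),
        "gc_ms": int(m.group("ms")),
        "collections": int(m.group("count")),
        "used": int(m.group("used")),
        "max": int(m.group("max")),
    }


def summarize(lines):
    """Compute heap occupancy and the GC share of elapsed time for each parsed entry."""
    rows = []
    prev = None
    for event in filter(None, map(parse_gc_line, lines)):
        gc_share = None  # undefined for the first entry: no earlier timestamp to compare with
        if prev is not None:
            elapsed_ms = (event["time"] - prev["time"]).total_seconds() * 1000
            if elapsed_ms > 0:
                gc_share = event["gc_ms"] / elapsed_ms
        rows.append({
            "time": event["time"],
            "collector": event["collector"],
            "occupancy": event["used"] / event["max"],
            "gc_share": gc_share,
        })
        prev = event
    return rows


if __name__ == "__main__":
    for row in summarize(SAMPLE):
        share = "n/a" if row["gc_share"] is None else f"{row['gc_share']:.1%}"
        print(f"{row['time']} {row['collector']}: heap {row['occupancy']:.2%} full, GC share {share}")
```

For the two sample entries, the heap is over 99% full in both, and about 94% of the roughly 47 minutes between them is reported as CMS time.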



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
