[ https://issues.apache.org/jira/browse/CASSANDRA-14239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16433463#comment-16433463 ]
Jürgen Albersdorfer edited comment on CASSANDRA-14239 at 4/11/18 6:30 AM:
--------------------------------------------------------------------------

I had to join a new node again. Giving it 72 GB of heap again caused an OOM. I have a GC log this time.

For me, this smells strongly like a memory leak. Throw the attached [^gc.log.0.current.zip] against [http://gceasy.io|http://gceasy.io/] and you will immediately see what I mean.

This node has a fast 1 TB SSD. I didn't change
# memtable_flush_writers: 2
and also left
# memtable_heap_space_in_mb: 1048
# memtable_offheap_space_in_mb: 1048
commented out, so both default to 25% of the heap.

I cannot see any I/O pressure on the system during the whole bootstrap process (the wai column stays at 0 throughout):
{code:java}
-dsk/total- ---system-- ----total-cpu-usage---- --io/total-
 read  writ| int   csw |usr sys idl wai hiq siq| read  writ
 200k 4458k| 23k   11k | 59   1  40   0   0   0| 31.7  14.0
    0 3123B|1509   214 |  6   0  94   0   0   0|    0  0.50
    0     0|2312   203 |  6   0  94   0   0   0|    0     0
    0  121k|1259   198 |  6   0  94   0   0   0|    0  1.20
    0   37k|1240   184 |  6   0  94   0   0   0|    0  2.20
    0     0|1240   175 |  6   0  94   0   0   0|    0     0
    0     0|1218   153 |  6   0  94   0   0   0|    0     0
    0   21k|1198   141 |  6   0  94   0   0   0|    0  1.40
    0     0|1188   122 |  6   0  94   0   0   0|    0     0
    0     0|1176   121 |  6   0  94   0   0   0|    0     0
    0  307B|1165   120 |  6   0  94   0   0   0|    0  0.40
    0     0|1166   116 |  6   0  94   0   0   0|    0     0
    0     0|1169   114 |  6   0  94   0   0   0|    0     0
  20k 1382B| 20k  1648 | 58   0  42   0   0   0| 1.50  0.50
 248k 5055k| 40k   27k | 96   1   3   0   0   0| 37.1  18.3
 232k 2647k| 35k   29k | 98   1   1   0   0   0| 33.3  7.20
 894k   17M| 80k   83k | 91   4   4   0   0   2|  119  59.8
 304k   19M| 35k  5311 | 95   2   2   0   0   1| 40.4  56.1
 342k   18M| 39k  5805 | 96   2   1   0   0   1| 43.6  56.2
 334k   18M| 34k  5770 | 96   2   2   0   0   0| 42.5  54.2
 290k   19M| 36k  6144 | 96   2   2   0   0   0| 38.0  55.1
 813k   23M| 42k  6870 | 94   2   3   0   0   1|  104  62.3
 360k   18M| 35k  5955 | 96   2   2   0   0   0| 45.8  51.4
 325k   19M| 36k  6081 | 96   2   2   0   0   0| 41.3  52.2
 358k   18M| 36k  6036 | 95   2   3   0   0   0| 45.5  50.7
 344k   19M| 35k  6063 | 96   2   2   0   0   0| 45.5  52.9
 380k   17M| 36k  5980 | 95   2   3   0   0   0| 48.7  46.0
 685k   21M| 39k  6163 | 94   2   4   0   0   1| 87.5  57.8
 632k   18M| 34k  5885 | 95   2   3   0   0   0| 63.8  53.1
 795k   19M| 34k  5634 | 95   2   2   0   0   0| 75.7  53.4
 869k   15M| 40k   13k | 94   2   4   0   0   1| 91.6  47.8
 730k   16M| 54k   30k | 93   2   5   0   0   1| 81.6  48.3
 651k   15M| 61k   40k | 89   3   7   0   0   1| 74.3  47.1
 782k   15M| 78k   76k | 87   4   8   0   0   1| 57.6  41.8
1284k   18M| 67k   47k | 94   3   2   0   0   1|  128  58.6
1279k   19M| 40k  5963 | 96   2   2   0   0   0|  107  56.3
1110k   18M| 38k  5986 | 96   2   2   0   0   0|  114  49.2
1286k   21M| 39k  5773 | 96   2   1   0   0   0|  109  58.0
2701k   21M| 50k  6534 | 91   2   5   0   0   1|  282  68.3
1760k   17M| 40k  5498 | 94   2   3   0   0   1|  234  48.3
1295k   18M| 42k  5610 | 95   2   3   0   0   0|  136  53.1
1315k   19M| 44k  5387 | 96   2   2   0   0   0| 97.4  55.1
 214k 2818k|7171  6043 | 20   0  79   0   0   0| 13.8  7.80
  16k 4864B|1263   200 |  6   0  94   0   0   0| 0.50  0.60
    0     0|1226   166 |  6   0  94   0   0   0|    0     0
    0  449k|1217   162 |  6   0  94   0   0   0|    0  1.80
    0   12k|1213   155 |  6   0  94   0   0   0|    0  0.90
    0     0|1237   170 |  6   0  94   0   0   0|    0     0
 239k     0|1305   278 |  6   0  94   0   0   0| 8.30     0
    0   16k|1202   147 |  6   0  94   0   0   0|    0  1.30
{code}
I will try again with changed settings nevertheless.

> OutOfMemoryError when bootstrapping with less than 100GB RAM
> ------------------------------------------------------------
>
>                 Key: CASSANDRA-14239
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-14239
>             Project: Cassandra
>          Issue Type: Bug
>         Environment: Details of the bootstrapping Node
>  * ProLiant BL460c G7
>  * 56GB RAM
>  * 2x 146GB 10K HDD (One dedicated for Commitlog, one for Data, Hints and
> saved_caches)
>  * CentOS 7.4 on SD-Card
>  * /tmp and /var/log on tmpfs
>  * Oracle JDK 1.8.0_151
>  * Cassandra 3.11.1
> Cluster
>  * 10 existing Nodes (Up and Normal)
>            Reporter: Jürgen Albersdorfer
>            Priority: Major
>         Attachments: Objects-by-class.csv,
> Objects-with-biggest-retained-size.csv, cassandra-env.sh, cassandra.yaml,
> gc.log.0.current.zip, jvm.options, jvm_opts.txt, stack-traces.txt
>
>
> Hi, I face an issue when bootstrapping a node with less than 100GB RAM on
> our 10-node C* 3.11.1 cluster.
> During bootstrap, when I watch the cassandra.log, I observe a growth in JVM
> heap Old Gen which is no longer freed up to any significant degree.
> I know that the JVM collects Old Gen only when really needed. I can see
> collections, but there is always a remainder which seems to grow forever
> without ever getting freed.
> After the node has successfully joined the cluster, I can remove the extra
> RAM I have given it for bootstrapping without any further effect.
> It feels like Cassandra does not forget any of the bytes streamed over the
> network during bootstrapping, which would be a memory leak and a major
> problem, too.
> I was able to produce a HeapDumpOnOutOfMemoryError from a 56GB node (40 GB
> assigned JVM heap).
> YourKit Profiler shows a huge amount of memory allocated
> for org.apache.cassandra.db.Memtable (22 GB),
> org.apache.cassandra.db.rows.BufferCell (19 GB) and java.nio.HeapByteBuffer
> (11 GB).
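
As a rough sanity check on the figures above, the sketch below computes the memtable space that the 25% default mentioned in the comment would imply for the two heap sizes reported (40 GB and 72 GB). The function name is hypothetical, and it assumes the default is exactly one quarter of the configured heap; it is not taken from Cassandra's code. Profiler retained sizes can also overlap (a Memtable may retain BufferCell and HeapByteBuffer instances), so the comparison is only indicative.

```python
def default_memtable_space_mb(heap_gb, fraction=0.25):
    """Memtable space in MB implied by a heap-fraction default.

    Assumption: the default is exactly `fraction` of the configured heap
    (25% as stated in the comment).
    """
    return int(heap_gb * 1024 * fraction)


if __name__ == "__main__":
    # Heap sizes reported in the issue description (40 GB) and the comment (72 GB).
    for heap_gb in (40, 72):
        limit_mb = default_memtable_space_mb(heap_gb)
        print(f"{heap_gb} GB heap -> {limit_mb} MB assumed default memtable space")

    # Memtable size reported by YourKit for the heap dump taken with a 40 GB heap.
    reported_memtable_gb = 22
    ratio = reported_memtable_gb * 1024 / default_memtable_space_mb(40)
    print(f"reported Memtable size / assumed default: {ratio:.1f}")
```

Under these assumptions, the 22 GB reported for Memtable is a little more than twice the 10 GB that a 25% default would allow on a 40 GB heap, which may be worth comparing against the attached heap dump summaries.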