[
https://issues.apache.org/jira/browse/CASSANDRA-14239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16433463#comment-16433463
]
Jürgen Albersdorfer edited comment on CASSANDRA-14239 at 4/11/18 6:29 AM:
--------------------------------------------------------------------------
Had to join a new node again - giving it 72GB of Heap - and it again caused an OOM.
I have a GC Log this time. To me, this smells strongly like a Memory Leak.
Throw the attached [^gc.log.0.current.zip] against
[http://gceasy.io|http://gceasy.io/] and you will immediately see what I mean.
This Node has a fast 1TB SSD. I didn't change
# memtable_flush_writers: 2
and also left
# memtable_heap_space_in_mb: 1048
# memtable_offheap_space_in_mb: 1048
defaulting to 25% of Heap.
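As a rough sketch of what that default works out to on this node: the snippet below assumes the limit is simply one quarter of the configured 72GB heap (the JVM may report a slightly smaller maximum), and `default_memtable_space_mb` is a hypothetical helper written for illustration, not a Cassandra API.

```python
def default_memtable_space_mb(heap_mb: int) -> int:
    """Quarter of the heap in MB (assumed default when the setting is left unset)."""
    return heap_mb // 4


heap_gb = 72  # heap given to the joining node in this comment
limit_mb = default_memtable_space_mb(heap_gb * 1024)
print(f"{heap_gb} GB heap -> {limit_mb} MB (~{limit_mb // 1024} GB) per memtable pool")
```

Under that assumption the on-heap memtable pool alone may grow to roughly 18GB before flushing is forced, with the same limit again for the off-heap pool.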
I cannot see any IO Pressure on the System during the whole bootstrap Process:
{code:java}
-dsk/total- ---system-- ----total-cpu-usage---- --io/total-
read writ| int csw |usr sys idl wai hiq siq| read writ
200k 4458k| 23k 11k| 59 1 40 0 0 0|31.7 14.0
0 3123B|1509 214 | 6 0 94 0 0 0| 0 0.50
0 0 |2312 203 | 6 0 94 0 0 0| 0 0
0 121k|1259 198 | 6 0 94 0 0 0| 0 1.20
0 37k|1240 184 | 6 0 94 0 0 0| 0 2.20
0 0 |1240 175 | 6 0 94 0 0 0| 0 0
0 0 |1218 153 | 6 0 94 0 0 0| 0 0
0 21k|1198 141 | 6 0 94 0 0 0| 0 1.40
0 0 |1188 122 | 6 0 94 0 0 0| 0 0
0 0 |1176 121 | 6 0 94 0 0 0| 0 0
0 307B|1165 120 | 6 0 94 0 0 0| 0 0.40
0 0 |1166 116 | 6 0 94 0 0 0| 0 0
0 0 |1169 114 | 6 0 94 0 0 0| 0 0
20k 1382B| 20k 1648 | 58 0 42 0 0 0|1.50 0.50
248k 5055k| 40k 27k| 96 1 3 0 0 0|37.1 18.3
232k 2647k| 35k 29k| 98 1 1 0 0 0|33.3 7.20
894k 17M| 80k 83k| 91 4 4 0 0 2| 119 59.8
304k 19M| 35k 5311 | 95 2 2 0 0 1|40.4 56.1
342k 18M| 39k 5805 | 96 2 1 0 0 1|43.6 56.2
334k 18M| 34k 5770 | 96 2 2 0 0 0|42.5 54.2
290k 19M| 36k 6144 | 96 2 2 0 0 0|38.0 55.1
813k 23M| 42k 6870 | 94 2 3 0 0 1| 104 62.3
360k 18M| 35k 5955 | 96 2 2 0 0 0|45.8 51.4
325k 19M| 36k 6081 | 96 2 2 0 0 0|41.3 52.2
358k 18M| 36k 6036 | 95 2 3 0 0 0|45.5 50.7
344k 19M| 35k 6063 | 96 2 2 0 0 0|45.5 52.9
380k 17M| 36k 5980 | 95 2 3 0 0 0|48.7 46.0
685k 21M| 39k 6163 | 94 2 4 0 0 1|87.5 57.8
632k 18M| 34k 5885 | 95 2 3 0 0 0|63.8 53.1
795k 19M| 34k 5634 | 95 2 2 0 0 0|75.7 53.4
869k 15M| 40k 13k| 94 2 4 0 0 1|91.6 47.8
730k 16M| 54k 30k| 93 2 5 0 0 1|81.6 48.3
651k 15M| 61k 40k| 89 3 7 0 0 1|74.3 47.1
782k 15M| 78k 76k| 87 4 8 0 0 1|57.6 41.8
1284k 18M| 67k 47k| 94 3 2 0 0 1| 128 58.6
1279k 19M| 40k 5963 | 96 2 2 0 0 0| 107 56.3
1110k 18M| 38k 5986 | 96 2 2 0 0 0| 114 49.2
1286k 21M| 39k 5773 | 96 2 1 0 0 0| 109 58.0
2701k 21M| 50k 6534 | 91 2 5 0 0 1| 282 68.3
1760k 17M| 40k 5498 | 94 2 3 0 0 1| 234 48.3
1295k 18M| 42k 5610 | 95 2 3 0 0 0| 136 53.1
1315k 19M| 44k 5387 | 96 2 2 0 0 0|97.4 55.1
214k 2818k|7171 6043 | 20 0 79 0 0 0|13.8 7.80
16k 4864B|1263 200 | 6 0 94 0 0 0|0.50 0.60
0 0 |1226 166 | 6 0 94 0 0 0| 0 0
0 449k|1217 162 | 6 0 94 0 0 0| 0 1.80
0 12k|1213 155 | 6 0 94 0 0 0| 0 0.90
0 0 |1237 170 | 6 0 94 0 0 0| 0 0
239k 0 |1305 278 | 6 0 94 0 0 0|8.30 0
0 16k|1202 147 | 6 0 94 0 0 0| 0 1.30
{code}
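One way to check the "no IO pressure" reading is to pull the CPU columns out of the dstat rows above. This is only an illustrative sketch: `cpu_fields` is a hypothetical helper, the three sample rows are copied from the table, and it assumes the CPU group is the third `|`-separated field in the order usr sys idl wai hiq siq, as in the header.

```python
CPU_COLUMNS = ["usr", "sys", "idl", "wai", "hiq", "siq"]


def cpu_fields(row: str) -> dict:
    """Split one dstat row on '|' and name the six CPU percentage columns."""
    cpu_group = row.split("|")[2].split()
    return dict(zip(CPU_COLUMNS, map(int, cpu_group)))


# Three rows copied from the dstat output above.
sample = [
    "200k 4458k| 23k 11k| 59 1 40 0 0 0|31.7 14.0",
    "894k 17M| 80k 83k| 91 4 4 0 0 2| 119 59.8",
    "2701k 21M| 50k 6534 | 91 2 5 0 0 1| 282 68.3",
]

rows = [cpu_fields(r) for r in sample]
max_wai = max(r["wai"] for r in rows)  # highest I/O wait percentage seen
max_usr = max(r["usr"] for r in rows)  # highest user CPU percentage seen
print(f"max wai = {max_wai}%, max usr = {max_usr}%")
```

In these rows I/O wait stays at 0% while user CPU is above 90% during the busy phase, which fits a CPU-bound rather than a disk-bound workload.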
I will try again nevertheless.
was (Author: jalbersdorfer):
Had to join a new node again - giving it 72GB of Heap - and it again caused an OOM.
I have a GC Log this time. To me, this smells strongly like a Memory Leak.
Throw the attached [^gc.log.0.current.zip] against
[http://gceasy.io|http://gceasy.io/] and you will immediately see what I mean.
This Node has a fast 1TB SSD, I didn't change
# memtable_flush_writers: 2
and also left
# memtable_heap_space_in_mb: 1048
# memtable_offheap_space_in_mb: 1048
defaulting to 25% of Heap.
I cannot see any IO Pressure on the System during the whole bootstrap Process:
{code:java}
----system---- ---load-avg--- ---procs--- ------memory-usage----- ---paging--
-dsk/total- ---system-- ----total-cpu-usage---- --io/total- -net/total-
time | 1m 5m 15m |run blk new| used buff cach free| in out |
read writ| int csw |usr sys idl wai hiq siq| read writ| recv send
10-04 16:32:07| 118 128 133| 44 8.3 0.8|78.2G 0 15.9G 314M| 0 0 |
200k 4458k| 23k 11k| 59 1 40 0 0 0|31.7 14.0 |3198k 79k
10-04 16:32:17|99.7 124 132|1.0 0 0.9|78.2G 0 15.9G 310M| 0 0 |
0 3123B|1509 214 | 6 0 94 0 0 0| 0 0.50 | 176k 3337B
10-04 16:32:27|84.5 120 130|1.0 0 0.8|78.2G 0 15.9G 315M| 0 0 |
0 0 |2312 203 | 6 0 94 0 0 0| 0 0 | 905k 10k
10-04 16:32:37|71.7 116 129|1.0 0 0.8|78.2G 0 15.9G 316M| 0 0 |
0 121k|1259 198 | 6 0 94 0 0 0| 0 1.20 |1737B 505B
10-04 16:32:47|60.8 112 127|1.0 0 0.8|78.2G 0 15.9G 316M| 0 0 |
0 37k|1240 184 | 6 0 94 0 0 0| 0 2.20 |1450B 308B
10-04 16:32:57|51.6 109 126|1.1 0 0.8|78.2G 0 15.9G 315M| 0 0 |
0 0 |1240 175 | 6 0 94 0 0 0| 0 0 |1541B 308B
10-04 16:33:07|43.8 105 125|1.0 0 0.8|78.2G 0 15.9G 316M| 0 0 |
0 0 |1218 153 | 6 0 94 0 0 0| 0 0 |1791B 593B
10-04 16:33:17|37.2 102 123|1.0 0 0.8|78.2G 0 15.9G 316M| 0 0 |
0 21k|1198 141 | 6 0 94 0 0 0| 0 1.40 |1496B 389B
10-04 16:33:27|31.7 98.5 122|1.0 0 0.8|78.2G 0 15.9G 316M| 0 0 |
0 0 |1188 122 | 6 0 94 0 0 0| 0 0 |1610B 425B
10-04 16:33:37|27.0 95.3 121|1.0 0 0.8|78.2G 0 15.9G 316M| 0 0 |
0 0 |1176 121 | 6 0 94 0 0 0| 0 0 |1723B 313B
10-04 16:33:47|23.0 92.2 119|1.0 0 0.9|78.2G 0 15.9G 317M| 0 0 |
0 307B|1165 120 | 6 0 94 0 0 0| 0 0.40 |1515B 276B
10-04 16:33:57|19.6 89.2 118|1.1 0 0.8|78.2G 0 15.9G 317M| 0 0 |
0 0 |1166 116 | 6 0 94 0 0 0| 0 0 |1543B 384B
10-04 16:34:07|16.7 86.3 117|1.0 0 0.8|78.2G 0 15.9G 317M| 0 0 |
0 0 |1169 114 | 6 0 94 0 0 0| 0 0 |1635B 582B
10-04 16:34:17|15.3 83.7 116| 12 0 1.7|78.2G 0 15.9G 312M| 0 0 |
20k 1382B| 20k 1648 | 58 0 42 0 0 0|1.50 0.50 | 102k 7651B
10-04 16:34:27|29.9 84.5 116| 87 0 5.7|78.2G 0 15.9G 315M| 0 0 |
248k 5055k| 40k 27k| 96 1 3 0 0 0|37.1 18.3 |4296k 424k
10-04 16:34:37|47.9 86.6 116|148 0.3 0.8|78.2G 0 15.9G 309M| 0 0 |
232k 2647k| 35k 29k| 98 1 1 0 0 0|33.3 7.20 |2510k 207k
10-04 16:34:47|44.6 84.6 115| 24 0 1.3|78.2G 0 15.9G 310M| 0 0 |
894k 17M| 80k 83k| 91 4 4 0 0 2| 119 59.8 | 15M 3217k
10-04 16:34:57|41.0 82.5 114| 19 0 1.0|78.2G 0 15.9G 301M| 0 0 |
304k 19M| 35k 5311 | 95 2 2 0 0 1|40.4 56.1 | 17M 146k
10-04 16:35:07|37.9 80.5 113| 21 0 1.1|78.2G 0 15.9G 320M| 0 0 |
342k 18M| 39k 5805 | 96 2 1 0 0 1|43.6 56.2 | 20M 179k
10-04 16:35:17|35.4 78.5 112| 20 0 0.9|78.2G 0 15.9G 315M| 0 0 |
334k 18M| 34k 5770 | 96 2 2 0 0 0|42.5 54.2 | 17M 79k
10-04 16:35:27|33.3 76.7 111| 20 0 1.0|78.2G 0 15.9G 303M| 0 0 |
290k 19M| 36k 6144 | 96 2 2 0 0 0|38.0 55.1 | 19M 83k
10-04 16:35:37|31.0 74.8 110| 18 0 0.8|78.2G 0 15.9G 305M| 0 0 |
813k 23M| 42k 6870 | 94 2 3 0 0 1| 104 62.3 | 23M 90k
10-04 16:35:47|29.5 73.0 109| 21 0 0.8|78.2G 0 15.9G 323M| 0 0 |
360k 18M| 35k 5955 | 96 2 2 0 0 0|45.8 51.4 | 18M 55k
10-04 16:35:57|28.4 71.3 108| 20 0.1 0.8|78.2G 0 15.9G 313M| 0 0 |
325k 19M| 36k 6081 | 96 2 2 0 0 0|41.3 52.2 | 18M 54k
10-04 16:36:07|27.2 69.7 107| 21 0 0.8|78.2G 0 15.9G 304M| 0 0 |
358k 18M| 36k 6036 | 95 2 3 0 0 0|45.5 50.7 | 18M 56k
10-04 16:36:17|26.3 68.1 106| 21 0 0.8|78.2G 0 15.9G 305M| 0 0 |
344k 19M| 35k 6063 | 96 2 2 0 0 0|45.5 52.9 | 18M 58k
10-04 16:36:27|25.5 66.5 105| 19 0.1 0.8|78.2G 0 15.9G 301M| 0 0 |
380k 17M| 36k 5980 | 95 2 3 0 0 0|48.7 46.0 | 17M 56k
10-04 16:36:37|24.4 64.9 105| 19 0 0.8|78.2G 0 15.9G 326M| 0 0 |
685k 21M| 39k 6163 | 94 2 4 0 0 1|87.5 57.8 | 19M 58k
10-04 16:36:47|24.1 63.5 104| 21 0 1.1|78.2G 0 15.9G 315M| 0 0 |
632k 18M| 34k 5885 | 95 2 3 0 0 0|63.8 53.1 | 16M 65k
10-04 16:36:57|23.8 62.2 103| 21 0 0.9|78.2G 0 15.9G 310M| 0 0 |
795k 19M| 34k 5634 | 95 2 2 0 0 0|75.7 53.4 | 16M 64k
10-04 16:37:07|24.1 61.0 102| 23 0 1.1|78.2G 0 15.9G 317M| 0 0 |
869k 15M| 40k 13k| 94 2 4 0 0 1|91.6 47.8 | 16M 282k
10-04 16:37:17|24.6 59.9 101| 21 0 1.0|78.2G 0 15.9G 312M| 0 0 |
730k 16M| 54k 30k| 93 2 5 0 0 1|81.6 48.3 | 14M 257k
10-04 16:37:27|23.4 58.4 100| 18 0 1.2|78.3G 0 15.8G 314M| 0 0 |
651k 15M| 61k 40k| 89 3 7 0 0 1|74.3 47.1 | 17M 331k
10-04 16:37:37|24.4 57.5 99.5| 20 0 0.8|78.3G 0 15.8G 325M| 0 0 |
782k 15M| 78k 76k| 87 4 8 0 0 1|57.6 41.8 | 13M 2531k
10-04 16:37:47|24.1 56.3 98.7| 21 0 1.0|78.3G 0 15.8G 308M| 0 0 |
1284k 18M| 67k 47k| 94 3 2 0 0 1| 128 58.6 | 19M 1835k
10-04 16:37:57|23.6 55.2 97.8| 21 0.1 1.0|78.3G 0 15.8G 318M| 0 0 |
1279k 19M| 40k 5963 | 96 2 2 0 0 0| 107 56.3 | 17M 73k
10-04 16:38:07|23.2 54.0 97.0| 21 0 0.8|78.3G 0 15.8G 301M| 0 0 |
1110k 18M| 38k 5986 | 96 2 2 0 0 0| 114 49.2 | 16M 70k
10-04 16:38:17|22.7 52.9 96.2| 20 0.1 0.8|78.3G 0 15.8G 321M| 0 0 |
1286k 21M| 39k 5773 | 96 2 1 0 0 0| 109 58.0 | 17M 46k
10-04 16:38:27|22.1 51.8 95.4| 18 0 1.5|78.3G 0 15.8G 314M| 0 0 |
2701k 21M| 50k 6534 | 91 2 5 0 0 1| 282 68.3 | 23M 246k
10-04 16:38:37|22.4 50.9 94.6| 20 0 0.8|78.3G 0 15.8G 324M| 0 0 |
1760k 17M| 40k 5498 | 94 2 3 0 0 1| 234 48.3 | 17M 64k
10-04 16:38:47|22.2 49.9 93.8| 21 0.1 0.9|78.3G 0 15.8G 311M| 0 0 |
1295k 18M| 42k 5610 | 95 2 3 0 0 0| 136 53.1 | 17M 55k
10-04 16:38:57|22.1 49.0 93.0| 21 0 0.8|78.3G 0 15.8G 316M| 0 0 |
1315k 19M| 44k 5387 | 96 2 2 0 0 0|97.4 55.1 | 18M 48k
10-04 16:39:07|18.9 47.4 92.1|2.0 0 0.8|78.3G 0 15.8G 308M| 0 0 |
214k 2818k|7171 6043 | 20 0 79 0 0 0|13.8 7.80 |1691k 6620B
10-04 16:39:17|16.1 45.9 91.1|1.0 0 0.9|78.3G 0 15.8G 308M| 0 0 |
16k 4864B|1263 200 | 6 0 94 0 0 0|0.50 0.60 |1912B 547B
10-04 16:39:27|13.8 44.4 90.1|1.0 0 0.8|78.3G 0 15.8G 308M| 0 0 |
0 0 |1226 166 | 6 0 94 0 0 0| 0 0 |1721B 515B
10-04 16:39:37|11.7 42.9 89.2|1.0 0 0.8|78.3G 0 15.8G 309M| 0 0 |
0 449k|1217 162 | 6 0 94 0 0 0| 0 1.80 |1701B 398B
{code}
I will try again nevertheless.
> OutOfMemoryError when bootstrapping with less than 100GB RAM
> ------------------------------------------------------------
>
> Key: CASSANDRA-14239
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14239
> Project: Cassandra
> Issue Type: Bug
> Environment: Details of the bootstrapping Node
> * ProLiant BL460c G7
> * 56GB RAM
> * 2x 146GB 10K HDD (One dedicated for Commitlog, one for Data, Hints and
> saved_caches)
> * CentOS 7.4 on SD-Card
> * /tmp and /var/log on tmpfs
> * Oracle JDK 1.8.0_151
> * Cassandra 3.11.1
> Cluster
> * 10 existing Nodes (Up and Normal)
> Reporter: Jürgen Albersdorfer
> Priority: Major
> Attachments: Objects-by-class.csv,
> Objects-with-biggest-retained-size.csv, cassandra-env.sh, cassandra.yaml,
> gc.log.0.current.zip, jvm.options, jvm_opts.txt, stack-traces.txt
>
>
> Hi, I face an issue when bootstrapping a Node having less than 100GB RAM on
> our 10 Node C* 3.11.1 Cluster.
> During bootstrap, when I watch the cassandra.log, I observe growth in the JVM
> Heap Old Gen which is no longer significantly freed up.
> I know that JVM collects on Old Gen only when really needed. I can see
> collections, but there is always a remainder which seems to grow forever
> without ever getting freed.
> After the Node successfully Joined the Cluster, I can remove the extra RAM I
> have given it for bootstrapping without any further effect.
> It feels as if Cassandra never forgets a single byte streamed over the
> Network during bootstrapping, which would be a memory leak and a major
> problem, too.
> I was able to produce a HeapDumpOnOutOfMemoryError from a 56GB Node (40 GB
> assigned JVM Heap). YourKit Profiler shows a huge amount of Memory allocated
> for org.apache.cassandra.db.Memtable (22 GB),
> org.apache.cassandra.db.rows.BufferCell (19 GB) and java.nio.HeapByteBuffer
> (11 GB).