13301891422 opened a new issue, #2995:
URL: https://github.com/apache/incubator-streampark/issues/2995

   ### Search before asking
   
   - [X] I have searched in the [issues](https://github.com/apache/incubator-streampark/issues?q=is%3Aissue+label%3A%22bug%22) and found no similar issues.
   
   
   ### Java Version
   
   1.8.0_212
   
   ### Scala Version
   
   2.12.x
   
   ### StreamPark Version
   
   2.0.0
   
   ### Flink Version
   
   1.15.4
   
   ### deploy mode
   
   yarn-application
   
   ### What happened
   
   When I submit a Flink on YARN job (yarn-application mode) using StreamPark, the JobManager's memory parameters look like this:
   
   ```
   jobmanager.memory.heap.size          469762048b
   jobmanager.memory.jvm-metaspace.size 268435456b
   jobmanager.memory.jvm-overhead.max   201326592b
   jobmanager.memory.jvm-overhead.min   201326592b
   jobmanager.memory.off-heap.size      134217728b
   jobmanager.memory.process.size       1024mb
   ```
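
   For what it's worth, those four JVM components sum exactly to the configured process size, so the breakdown itself is consistent with Flink's JobManager memory model (heap + off-heap + metaspace + JVM overhead = process size):

   ```bash
   # Sanity check (values copied from the configuration above):
   # heap + off-heap + metaspace + jvm-overhead should equal process.size
   echo $(( (469762048 + 134217728 + 268435456 + 201326592) / 1024 / 1024 ))  # prints 1024 (MiB)
   ```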
   
   After the job has been running for a while (roughly 3 to 20 days), the container running the JobManager is invariably killed by the ResourceManager. I then enabled GC logging for the JobManager and found that the JobManager process performs a young-generation GC roughly every 2 minutes, as follows:
   
   ```log
   2023-08-30T13:56:57.694+0800: [GC (Allocation Failure) [PSYoungGen: 149956K->1673K(150528K)] 315127K->166876K(456704K), 0.0138514 secs] [Times: user=0.54 sys=0.05, real=0.02 secs]
   2023-08-30T13:59:17.558+0800: [GC (Allocation Failure) [PSYoungGen: 150141K->1636K(150528K)] 315344K->166871K(456704K), 0.0285263 secs] [Times: user=1.20 sys=0.11, real=0.03 secs]
   ...
   2023-08-30T14:47:54.412+0800: [GC (Allocation Failure) [PSYoungGen: 148425K->1700K(150016K)] 314796K->168135K(456192K), 0.0258613 secs] [Times: user=0.96 sys=0.06, real=0.03 secs]
   2023-08-30T14:50:12.434+0800: [GC (Allocation Failure) [PSYoungGen: 149138K->1156K(150016K)] 315573K->167607K(456192K), 0.0233593 secs] [Times: user=0.77 sys=0.07, real=0.03 secs]
   ```
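
   For reference, the log above comes from JDK 8; a minimal way to enable such logging for the JobManager is to pass the standard HotSpot flags through Flink's `env.java.opts.jobmanager` option (the log path below, and setting it as a dynamic property at submission time, are illustrative assumptions):

   ```bash
   # JDK 8 HotSpot GC-logging flags; the log path is just an example
   -Denv.java.opts.jobmanager="-XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:/tmp/jm-gc.log"
   ```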
   
   To understand the cause of the JobManager's frequent GCs, I dumped the JobManager's Java heap to a local file and opened it in VisualVM for analysis. char[] occupies the most memory by far, as shown in the following figures:
   
   <img width="1456" alt="Snipaste_Heap_Dump" 
src="https://github.com/apache/incubator-streampark/assets/57441331/35a2eb59-c7fb-4929-b32e-8cb35c1c729b";>
   
   <img width="1464" alt="Snipaste_Heap_Dump_2" 
src="https://github.com/apache/incubator-streampark/assets/57441331/079d45ad-d99f-4ff4-8a35-210b5b8d3f31";>
   
   <img width="1464" alt="image" 
src="https://github.com/apache/incubator-streampark/assets/57441331/2c625ba3-2be0-4777-89f1-fb959f1e413a";>
   
   What could be causing this? If we submit the same job from the command line with `$FLINK_HOME/bin/flink run -t yarn-per-job` (with exactly the same memory parameters as above), it does not generate nearly as many char[] instances, and the JobManager only GCs about once every 40 minutes, which seems normal.
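
   For concreteness, the command-line comparison was along these lines (a sketch; the jar path and main class are placeholders, and the memory setting matches the configuration above):

   ```bash
   # Same memory configuration, submitted directly via the Flink CLI
   $FLINK_HOME/bin/flink run -t yarn-per-job \
     -Djobmanager.memory.process.size=1024mb \
     -c com.example.MyJob /path/to/my-job.jar
   ```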
   
   As for the containers being killed so frequently: we plan to set `jobmanager.memory.enable-jvm-direct-memory-limit = true` to avoid exceeding the memory limit. Does anyone know whether this parameter actually helps prevent the container from being killed for exceeding its memory limit?
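
   In case anyone wants to try the same mitigation, the option can be passed per job as a dynamic property (a sketch; whether it actually prevents the overlimit kill is exactly the open question above):

   ```bash
   # Enforces -XX:MaxDirectMemorySize on the JobManager JVM, so off-heap
   # overuse fails fast inside the JVM instead of growing until YARN
   # kills the container
   -Djobmanager.memory.enable-jvm-direct-memory-limit=true
   ```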
   
   
   ### Error Exception
   
   ```log
   Failing this attempt. Diagnostics: [2023-08-22 08:49:10.443] Container [pid=77475,containerID=container_e08_1683881703260_1165_01_000011] is running 9510912B beyond the 'PHYSICAL' memory limit. Current usage: 1.0 GB of 1 GB physical memory used; 3.2 GB of 2.1 GB virtual memory used. Killing container.
   ```
   
   
   ### Screenshots
   
   _No response_
   
   ### Are you willing to submit PR?
   
   - [ ] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct)
   

