qiaozhanwei opened a new issue #3370: URL: https://github.com/apache/incubator-dolphinscheduler/issues/3370
**Describe the feature**
JVM parameter optimization

**Is your feature request related to a problem? Please describe.**
With a 4G heap and the default young generation size, one Minor GC and two Full GCs occur within 30s of startup:

    S0C      S1C      S0U S1U     EC       EU       OC        OU  MC      MU      CCSC   CCSU   YGC YGCT  FGC FGCT  GCT
    110720.0 110720.0 0.0 48623.7 886080.0 607260.9 3086784.0 0.0 42276.0 41076.2 5676.0 5418.5 1   0.066 2   0.305 0.370

**Describe the solution you'd like**
Reasoning: if a full old generation had triggered the Full GCs, it is unlikely that OU would be zero afterwards. That leaves the metadata area (Metaspace). MU is close to MC (41076.2 of 42276.0), so the size of the metadata area should be adjusted. The metadata area mainly holds class metadata, that is, the descriptions of loaded classes. If the application uses a lot of reflection, this area can be enlarged accordingly. 256MB or 512MB is recommended; considering the memory of users' machines, the smaller value of 128M is used for now:

    -XX:MetaspaceSize=128m -XX:MaxMetaspaceSize=128m

After starting with these options, there is no Full GC:

    S0C      S1C      S0U S1U     EC        EU        OC        OU  MC      MU      CCSC   CCSU   YGC YGCT  FGC FGCT  GCT
    209664.0 209664.0 0.0 51195.2 1677824.0 1584070.1 2097152.0 0.0 43136.0 41916.0 5760.0 5425.1 1   0.071 0   0.000 0.071

With the worker running 20 processes, Linux reports 1.530g of resident memory for the single worker Java process, an average of about 78MB per process:

    PID  USER PR NI VIRT    RES    SHR   S %CPU %MEM TIME+   COMMAND
    7782 root 20 0  13.638g 1.530g 23908 S 0.0  1.2  0:56.66 java

The JVM memory figures at this point show no change.

Example:

--------------------------------------------------------------------------------------------

Start a process every 5s, each with 20 tasks, each task sleeping 10s; t_ds_task_instance shows 40 tasks running

--------------------------------------------------------------------------------------------

4G heap memory

    export DOLPHINSCHEDULER_OPTS="-server -Xms4g -Xmx4g -Xmn2g -XX:MetaspaceSize=128m -XX:MaxMetaspaceSize=128m -Xss512k -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:LargePageSizeInBytes=128m -XX:+UseCMSInitiatingOccupancyOnly -XX:CMSInitiatingOccupancyFraction=70 -XX:+PrintGCDetails -Xloggc:gc.log -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/opt/oom"

The young generation grows by about 2M per second. A minor GC runs about once every 14 minutes. Survivor usage is 40~50M, so no objects enter the old generation through dynamic age judgment, and there is no Full GC.

--------------------------------------------------------------------------------------------

2G heap memory

    export DOLPHINSCHEDULER_OPTS="-server -Xms2g -Xmx2g -Xmn1g -XX:MetaspaceSize=128m -XX:MaxMetaspaceSize=128m -Xss512k -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:LargePageSizeInBytes=128m -XX:+UseCMSInitiatingOccupancyOnly -XX:CMSInitiatingOccupancyFraction=70 -XX:+PrintGCDetails -Xloggc:gc.log -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/opt/oom"

The young generation grows by about 2M per second. A minor GC runs about once every 7 minutes. Survivor usage is 40~50M, so no objects enter the old generation through dynamic age judgment, and there is no Full GC.

For JDBC result sets of 1000 rows (tested against the task instance table, with an average of 4 task instance result sets), a young GC runs about once every 30s.
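The minor GC intervals reported for the 4G and 2G worker heaps are roughly what you get by dividing the eden size by the allocation rate. The sketch below reproduces that arithmetic. It assumes the default `-XX:SurvivorRatio=8` (eden is 8/10 of the young generation, which agrees with the EC value of 1677824.0 KB under `-Xmn2g`), and the function names `eden_mb` and `minor_gc_interval_s` are hypothetical helpers, not part of DolphinScheduler or the JDK.

```python
def eden_mb(young_gen_mb, survivor_ratio=8):
    """Eden size in MB, assuming young gen = eden + 2 survivor spaces
    and eden / survivor = survivor_ratio (assumed JVM default of 8)."""
    return young_gen_mb * survivor_ratio / (survivor_ratio + 2)


def minor_gc_interval_s(young_gen_mb, alloc_mb_per_s, survivor_ratio=8):
    """Rough seconds between minor GCs: time needed to fill eden
    at a constant allocation rate."""
    return eden_mb(young_gen_mb, survivor_ratio) / alloc_mb_per_s


# Worker scenarios from this issue: about 2M per second of young generation growth.
for label, young_mb in (("4G heap, -Xmn2g", 2048), ("2G heap, -Xmn1g", 1024)):
    minutes = minor_gc_interval_s(young_mb, 2.0) / 60
    print(f"{label}: about {minutes:.1f} minutes between minor GCs")
```

This is only a back-of-the-envelope check: real allocation rates are not constant, so measured intervals will drift around these estimates.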
In the same JDBC test, each young GC grows the old generation by about 1M, and the young generation grows by about 30M per second.

Master JVM optimization

--------------------------------------------------------------------------------------------

Start a process every 5s, each with 20 tasks, each task sleeping 10s; t_ds_task_instance shows 40 tasks running across 3~4 process instances

--------------------------------------------------------------------------------------------

The young generation grows by about 40M per second. A minor GC runs about once every 40s, and the old generation grows by about 1M after each minor GC.

Start a process every 1s, each with 20 tasks, each task sleeping 10s; t_ds_process_instance shows 80 process instances

The young generation grows by about 500M per second. A minor GC runs about once every 3s, and the old generation grows by about 1M after each minor GC.

Api Server

--------------------------------------------------------------------------------------------

    export DOLPHINSCHEDULER_OPTS="-server -Xms1g -Xmx1g -Xmn500m -XX:MetaspaceSize=128m -XX:MaxMetaspaceSize=128m -Xss512k -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:LargePageSizeInBytes=128m -XX:+UseCMSInitiatingOccupancyOnly -XX:CMSInitiatingOccupancyFraction=70 -XX:+PrintGCDetails -Xloggc:gc.log -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/opt/oom"

A minor GC occurs about once every 30 minutes.

Worker JVM optimization: starting configuration

    export DOLPHINSCHEDULER_OPTS="-server -Xms4g -Xmx4g -Xss512k -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:LargePageSizeInBytes=128m -XX:+UseCMSInitiatingOccupancyOnly -XX:CMSInitiatingOccupancyFraction=70 -XX:+PrintGCDetails -Xloggc:gc.log -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=dump.hprof"

With this configuration, the 4g heap is laid out as s0 108MB, s1 108MB, eden 865MB, old 3014MB, metaspace 41MB, and one Minor GC and two Full GCs occur within 30s of startup.
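The heap layout quoted for the untuned 4g worker (s0 108MB, s1 108MB, eden 865MB, old 3014MB, metaspace 41MB) can be cross-checked against the `jstat -gc` row from the same run, since `jstat -gc` reports capacities in KB. The following is a minimal sketch of that check. The header and row strings are copied from this issue; `parse_jstat_gc` and `kb_to_mb` are hypothetical helper names introduced only for illustration.

```python
# Header and data row as pasted in this issue (worker before tuning).
HEADER = "S0C S1C S0U S1U EC EU OC OU MC MU CCSC CCSU YGC YGCT FGC FGCT GCT"
ROW = ("110720.0 110720.0 0.0 48623.7 886080.0 607260.9 3086784.0 0.0 "
       "42276.0 41076.2 5676.0 5418.5 1 0.066 2 0.305 0.370")


def parse_jstat_gc(header, row):
    """Map each jstat -gc column name to its numeric value."""
    names = header.split()
    values = row.split()
    if len(names) != len(values):
        raise ValueError("header and row have different column counts")
    return dict(zip(names, (float(v) for v in values)))


def kb_to_mb(kb):
    """Convert a jstat capacity (KB) to MB."""
    return kb / 1024


stats = parse_jstat_gc(HEADER, ROW)

# Capacities: survivor 0, survivor 1, eden, old generation, metaspace.
for column in ("S0C", "S1C", "EC", "OC", "MC"):
    print(f"{column}: {kb_to_mb(stats[column]):.0f}MB")

# The two observations the reasoning in this issue relies on:
# the old generation is empty although Full GCs were counted,
# and metaspace usage is close to its committed capacity.
print(f"OU = {stats['OU']} KB, FGC = {int(stats['FGC'])}")
print(f"MU / MC = {stats['MU'] / stats['MC']:.3f}")
```

Parsing by column name rather than by position keeps the check readable and makes it fail loudly if a pasted row is missing a column.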