[ https://issues.apache.org/jira/browse/HBASE-22867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Zheng Hu resolved HBASE-22867.
------------------------------
    Hadoop Flags: Reviewed
    Release Note: Replace the ForkJoinPool in CleanerChore with a ThreadPoolExecutor, which limits the number of spawned threads and avoids frequent GC on the master. The replacement is internal to CleanerChore, so no config key changes; upstream users can simply upgrade the HBase master without any other change.
            Tags: master
      Resolution: Fixed

> The ForkJoinPool in CleanerChore will spawn thousands of threads in our cluster with thousands of tables
> ----------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-22867
>                 URL: https://issues.apache.org/jira/browse/HBASE-22867
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Zheng Hu
>            Assignee: Zheng Hu
>            Priority: Critical
>             Fix For: 3.0.0, 2.3.0, 2.2.1, 2.1.6
>
>         Attachments: 191318.stack, 191318.stack.1, 31162.stack.1
>
>
> The thousands of spawned threads make the safepoint cost 80+s in our Master JVM process.
> {code}
> 2019-08-15,19:35:35,861 INFO [main-SendThread(zjy-hadoop-prc-zk02.bj:11000)] org.apache.zookeeper.ClientCnxn: Client session timed out, have not heard from server in 82260ms for sessionid 0x1691332e2d3aae5, closing socket connection and attempting reconnect
> {code}
> The stdout from the JVM (it shows 9126 threads and a sync cost of 80+s):
> {code}
>          vmop                  [threads: total initially_running wait_to_block]  [time: spin block sync cleanup vmop] page_trap_count
> 32358.859: ForceAsyncSafepoint [ 9126 67 474 ]                                   [ 1 28 86596 87 101 ]                0
> {code}
> Also we got the jstack:
> {code}
> $ cat 31162.stack.1 | grep 'ForkJoinPool-1-worker' | wc -l
> 8648
> {code}
> It's a dangerous bug, so mark it as a blocker.

--
This message was sent by Atlassian Jira
(v8.3.2#803003)
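
For readers unfamiliar with the fix described in the release note, the idea of capping the worker count with a ThreadPoolExecutor can be sketched roughly as below. This is not the actual CleanerChore patch: the class name BoundedCleanerPoolSketch, the helpers newBoundedPool and runTasks, the pool size of 4, and the 60-second keep-alive are all hypothetical and chosen only for illustration. The sketch only shows that a fixed-size pool backed by an unbounded queue never runs more than the configured number of threads, however many tasks are submitted.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration only; not code from HBase or from the HBASE-22867 patch.
public class BoundedCleanerPoolSketch {

  // Build a pool whose thread count can never exceed `size`.
  // With corePoolSize == maximumPoolSize and an unbounded queue,
  // extra tasks wait in the queue instead of spawning new threads.
  static ThreadPoolExecutor newBoundedPool(int size) {
    ThreadPoolExecutor pool = new ThreadPoolExecutor(
        size, size, 60L, TimeUnit.SECONDS, new LinkedBlockingQueue<>());
    // Let idle workers exit so an idle pool holds no threads (assumed policy).
    pool.allowCoreThreadTimeOut(true);
    return pool;
  }

  // Submit `taskCount` short tasks and return the largest number of
  // threads that were ever in the pool at the same time.
  static int runTasks(int poolSize, int taskCount) {
    ThreadPoolExecutor pool = newBoundedPool(poolSize);
    CountDownLatch done = new CountDownLatch(taskCount);
    for (int i = 0; i < taskCount; i++) {
      pool.execute(() -> {
        try {
          Thread.sleep(1); // stand-in for a blocking call such as a directory listing
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
        } finally {
          done.countDown();
        }
      });
    }
    try {
      done.await();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("interrupted while waiting for tasks", e);
    } finally {
      pool.shutdown();
    }
    return pool.getLargestPoolSize();
  }

  public static void main(String[] args) {
    // The reported value never exceeds the configured pool size of 4.
    System.out.println("largest pool size: " + runTasks(4, 1000));
  }
}
```

The trade-off in a design like this is that queued tasks wait longer when the pool is saturated, in exchange for a thread count that stays predictable for the JVM at safepoints.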