[
https://issues.apache.org/jira/browse/HBASE-17374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15867746#comment-15867746
]
Bhupendra Kumar Jain commented on HBASE-17374:
----------------------------------------------
As I went thru the JIRA and patch
Jira desc says "The instance of TableAuthManager should not be close
unless regionserver is closing."
1) If Accesscontroller coprocessor is configured for regionserver, then
there will be always one TableAuthManager instance. So the issue should not
happen
2) If Accesscontroller coprocessor is not configured for regionserver, then
the issue will happen. This patch only handles the exception case which occurs
during zk event processing and executor is already shutdown.
But I think , as zk watcher holds the ZKPermissionWatcher listener instance, so
ZKPermissionWatcher will keep getting the events (even though already closed)
and it will lead to memory leak of TableAuthManager instance also.
So to solve this completely, ZKPermissionWatcher should unregister itself from
the watcher in close() method. with that it will not get any events further and
it will avoid memory leak.
Please correct if any other understanding.
> ZKPermissionWatcher crashed when grant after region close
> ---------------------------------------------------------
>
> Key: HBASE-17374
> URL: https://issues.apache.org/jira/browse/HBASE-17374
> Project: HBase
> Issue Type: Bug
> Components: regionserver
> Affects Versions: 0.98.15
> Reporter: Liu Junhong
> Assignee: Liu Junhong
> Priority: Critical
> Fix For: 2.0.0, 1.4.0
>
> Attachments: 0001-fix-for-HBASE-17374-20161228.patch,
> 0001-fix-for-HBASE-17374.patch
>
>
> It was occurred many time that I granted some permission, but few of some
> regionservers was not token effect and must be restart . When I look up logs,
> I found that :
> 2016-12-08 15:00:26,238 DEBUG
> [RS_CLOSE_REGION-dx-data-hbase-watcher05:60020-0] handler.CloseRegionHandler
> (CloseRegionHandler.java:process(128)) - Processing close of
> data-probe-test,,1481180420784.5f06cb6447343b602e05996bfd87ce14.
> {color:red} 2016-12-08 15:00:26,242 DEBUG
> [RS_CLOSE_REGION-dx-data-hbase-watcher05:60020-0] regionserver.HRegion
> (HRegion.java:doClose(1163)) - Closing
> data-probe-test,,1481180420784.5f06cb6447343b602e05996bfd87ce14.: disabling
> compactions & flushes {color}
> 2016-12-08 15:00:26,242 DEBUG
> [RS_CLOSE_REGION-dx-data-hbase-watcher05:60020-0] regionserver.HRegion
> (HRegion.java:doClose(1190)) - Updates disabled for region
> data-probe-test,,1481180420784.5f06cb6447343b602e05996bfd87ce14.
> 2016-12-08 15:00:26,242 INFO
> [RS_CLOSE_REGION-dx-data-hbase-watcher05:60020-0] regionserver.HRegion
> (HRegion.java:internalFlushcache(1753)) - Started memstore flush for
> data-probe-test,,1481180420784.5f06cb6447343b602e05996bfd87ce14., current
> region memstore size 160
> 2016-12-08 15:00:26,284 INFO
> [RS_CLOSE_REGION-dx-data-hbase-watcher05:60020-0]
> regionserver.DefaultStoreFlusher (DefaultStoreFlusher.java:flushSnapshot(95))
> - Flushed, sequenceid=6, memsize=160, hasBloomFilter=true, into tmp file
> hdfs://dx-data-hbase-watcher/hbase/data/default/data-probe-test/5f06cb6447343b602e05996bfd87ce14/.tmp/8d734ce3d93e40628d8f82111e754cb3
> 2016-12-08 15:00:26,303 DEBUG
> [RS_CLOSE_REGION-dx-data-hbase-watcher05:60020-0]
> regionserver.HRegionFileSystem (HRegionFileSystem.java:commitStoreFile(370))
> - Committing store file
> hdfs://dx-data-hbase-watcher/hbase/data/default/data-probe-test/5f06cb6447343b602e05996bfd87ce14/.tmp/8d734ce3d93e40628d8f82111e754cb3
> as
> hdfs://dx-data-hbase-watcher/hbase/data/default/data-probe-test/5f06cb6447343b602e05996bfd87ce14/cf2/8d734ce3d93e40628d8f82111e754cb3
> 2016-12-08 15:00:26,318 INFO
> [RS_CLOSE_REGION-dx-data-hbase-watcher05:60020-0] regionserver.HStore
> (HStore.java:commitFile(877)) - Added
> hdfs://dx-data-hbase-watcher/hbase/data/default/data-probe-test/5f06cb6447343b602e05996bfd87ce14/cf2/8d734ce3d93e40628d8f82111e754cb3,
> entries=1, sequenceid=6, filesize=985
> 2016-12-08 15:00:26,319 INFO
> [RS_CLOSE_REGION-dx-data-hbase-watcher05:60020-0] regionserver.HRegion
> (HRegion.java:internalFlushcache(1920)) - Finished memstore flush of
> ~160/160, currentsize=0/0 for region
> data-probe-test,,1481180420784.5f06cb6447343b602e05996bfd87ce14. in 77ms,
> sequenceid=6, compaction requested=false
> 2016-12-08 15:00:26,323 INFO
> [StoreCloserThread-data-probe-test,,1481180420784.5f06cb6447343b602e05996bfd87ce14.-1]
> regionserver.HStore (HStore.java:close(774)) - Closed cf1
> 2016-12-08 15:00:26,325 INFO
> [StoreCloserThread-data-probe-test,,1481180420784.5f06cb6447343b602e05996bfd87ce14.-1]
> regionserver.HStore (HStore.java:close(774)) - Closed cf2
> 2016-12-08 15:00:26,326 DEBUG
> [RS_CLOSE_REGION-dx-data-hbase-watcher05:60020-0] coprocessor.CoprocessorHost
> (CoprocessorHost.java:shutdown(292)) - Stop coprocessor
> org.apache.hadoop.hbase.security.token.TokenProvider
> {color:red} 2016-12-08 15:00:26,326 DEBUG
> [RS_CLOSE_REGION-dx-data-hbase-watcher05:60020-0] coprocessor.CoprocessorHost
> (CoprocessorHost.java:shutdown(292)) - Stop coprocessor
> org.apache.hadoop.hbase.security.access.AccessController {color}
> 2016-12-08 15:00:26,327 DEBUG
> [RS_CLOSE_REGION-dx-data-hbase-watcher05:60020-0] coprocessor.CoprocessorHost
> (CoprocessorHost.java:shutdown(292)) - Stop coprocessor
> org.apache.hadoop.hbase.security.access.SecureBulkLoadEndpoint
> 2016-12-08 15:00:26,327 DEBUG
> [RS_CLOSE_REGION-dx-data-hbase-watcher05:60020-0] coprocessor.CoprocessorHost
> (CoprocessorHost.java:shutdown(292)) - Stop coprocessor
> org.apache.hadoop.hbase.regionserver.ExternalMetricObserver
> 2016-12-08 15:00:26,328 INFO
> [RS_CLOSE_REGION-dx-data-hbase-watcher05:60020-0] regionserver.HRegion
> (HRegion.java:doClose(1279)) - Closed
> data-probe-test,,1481180420784.5f06cb6447343b602e05996bfd87ce14.
> {color:red} 2016-12-08 15:00:27,590 ERROR [regionserver60020-EventThread]
> zookeeper.ClientCnxn (ClientCnxn.java:processEvent(524)) - Error while
> calling watcher
> java.util.concurrent.RejectedExecutionException: Task
> java.util.concurrent.FutureTask@1851ab3a rejected from
> java.util.concurrent.ThreadPoolExecutor@19c0794f[Terminated, pool size = 0,
> active threads = 0, queued tasks = 0, completed tasks = 1]
> at
> java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2048)
> at
> java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:821)
> at
> java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1372)
> at
> java.util.concurrent.AbstractExecutorService.submit(AbstractExecutorService.java:110)
> at
> java.util.concurrent.Executors$DelegatedExecutorService.submit(Executors.java:638)
> at
> org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.nodeDataChanged(ZKPermissionWatcher.java:162)
> at
> org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:461)
> at
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:522)
> at
> org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
> 2016-12-08 15:00:27,595 ERROR [regionserver60020-EventThread]
> zookeeper.ClientCnxn (ClientCnxn.java:processEvent(524)) - Error while
> calling watcher
> java.util.concurrent.RejectedExecutionException: Task
> java.util.concurrent.FutureTask@646fdb38 rejected from
> java.util.concurrent.ThreadPoolExecutor@19c0794f[Terminated, pool size = 0,
> active threads = 0, queued tasks = 0, completed tasks = 1]
> at
> java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2048)
> at
> java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:821)
> at
> java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1372)
> at
> java.util.concurrent.AbstractExecutorService.submit(AbstractExecutorService.java:110)
> at
> java.util.concurrent.Executors$DelegatedExecutorService.submit(Executors.java:638)
> at
> org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.nodeChildrenChanged(ZKPermissionWatcher.java:201)
> at
> org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:468)
> at
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:522)
> at
> org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
> {color}
> 2016-12-08 15:01:19,389 INFO
> [PriorityRpcServer.handler=6,queue=0,port=60020] regionserver.HRegionServer
> (HRegionServer.java:openRegion(3935)) - Open
> hbase:acl,,1480596080128.854f160ee9c3f2d1b591d6dbf211956e.
> 2016-12-08 15:01:19,443 DEBUG
> [RS_OPEN_REGION-dx-data-hbase-watcher05:60020-1] zookeeper.ZKAssign
> (ZKAssign.java:transitionNode(832)) - regionserver:60020-0x358ba43097e1ada,
> quorum=dx-data-hbase-watcher02:2181,dx-data-hbase-watcher01:2181,dx-data-hbase-watcher03:2181,
> baseZNode=/hbase Transitioning 854f160ee9c3f2d1b591d6dbf211956e from
> M_ZK_REGION_OFFLINE to RS_ZK_REGION_OPENING
> {color:red}
> After look up source, I found that :
> 1 AccessController implements RegionObserver and RegionServerObserver.
> 2 The reference of "ZooKeeperWatcher zk" is a same one object when the
> instance of AccessController was be used as RegionObserver and
> RegionServerObserver.
> 3 When closing region,
> org.apache.hadoop.hbase.security.access.AccessController.stop(CoprocessorEnvironment)
> , TableAuthManager.release(authManager),
> org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.close() will be
> called.
> 4 If there is only one region on regionserver and I close it ,
> TableAuthManager.release(authManager) will close the instance of
> TableAuthManager and shutdown the ExecutorService of ZKPermissionWatcher .
> 5 If I grant a new permission when the ExecutorService is terminated, the
> exception before whitch is a runtimeexcepetion will occurs, and at last the
> zookeeper watcher was crashed.
> So, The instance of TableAuthManager should not be close unless regionserver
> is closing.
> {color}
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)