Liu Junhong created HBASE-17374:
-----------------------------------
Summary: ZKPermissionWatcher crashed when grant after close region
Key: HBASE-17374
URL: https://issues.apache.org/jira/browse/HBASE-17374
Project: HBase
Issue Type: Bug
Components: regionserver
Affects Versions: 0.98.15
Reporter: Liu Junhong
Priority: Critical
It was occurred many time that I granted some permission, but few of some
regionservers was not token effect. When I look up logs, I found that :
2016-12-08 15:00:26,238 DEBUG [RS_CLOSE_REGION-dx-data-hbase-watcher05:60020-0]
handler.CloseRegionHandler (CloseRegionHandler.java:process(128)) - Processing
close of data-probe-test,,1481180420784.5f06cb6447343b602e05996bfd87ce14.
2016-12-08 15:00:26,242 DEBUG [RS_CLOSE_REGION-dx-data-hbase-watcher05:60020-0]
regionserver.HRegion (HRegion.java:doClose(1163)) - Closing
data-probe-test,,1481180420784.5f06cb6447343b602e05996bfd87ce14.: disabling
compactions & flushes
2016-12-08 15:00:26,242 DEBUG [RS_CLOSE_REGION-dx-data-hbase-watcher05:60020-0]
regionserver.HRegion (HRegion.java:doClose(1190)) - Updates disabled for region
data-probe-test,,1481180420784.5f06cb6447343b602e05996bfd87ce14.
2016-12-08 15:00:26,242 INFO [RS_CLOSE_REGION-dx-data-hbase-watcher05:60020-0]
regionserver.HRegion (HRegion.java:internalFlushcache(1753)) - Started memstore
flush for data-probe-test,,1481180420784.5f06cb6447343b602e05996bfd87ce14.,
current region memstore size 160
2016-12-08 15:00:26,284 INFO [RS_CLOSE_REGION-dx-data-hbase-watcher05:60020-0]
regionserver.DefaultStoreFlusher (DefaultStoreFlusher.java:flushSnapshot(95)) -
Flushed, sequenceid=6, memsize=160, hasBloomFilter=true, into tmp file
hdfs://dx-data-hbase-watcher/hbase/data/default/data-probe-test/5f06cb6447343b602e05996bfd87ce14/.tmp/8d734ce3d93e40628d8f82111e754cb3
2016-12-08 15:00:26,303 DEBUG [RS_CLOSE_REGION-dx-data-hbase-watcher05:60020-0]
regionserver.HRegionFileSystem (HRegionFileSystem.java:commitStoreFile(370)) -
Committing store file
hdfs://dx-data-hbase-watcher/hbase/data/default/data-probe-test/5f06cb6447343b602e05996bfd87ce14/.tmp/8d734ce3d93e40628d8f82111e754cb3
as
hdfs://dx-data-hbase-watcher/hbase/data/default/data-probe-test/5f06cb6447343b602e05996bfd87ce14/cf2/8d734ce3d93e40628d8f82111e754cb3
2016-12-08 15:00:26,318 INFO [RS_CLOSE_REGION-dx-data-hbase-watcher05:60020-0]
regionserver.HStore (HStore.java:commitFile(877)) - Added
hdfs://dx-data-hbase-watcher/hbase/data/default/data-probe-test/5f06cb6447343b602e05996bfd87ce14/cf2/8d734ce3d93e40628d8f82111e754cb3,
entries=1, sequenceid=6, filesize=985
2016-12-08 15:00:26,319 INFO [RS_CLOSE_REGION-dx-data-hbase-watcher05:60020-0]
regionserver.HRegion (HRegion.java:internalFlushcache(1920)) - Finished
memstore flush of ~160/160, currentsize=0/0 for region
data-probe-test,,1481180420784.5f06cb6447343b602e05996bfd87ce14. in 77ms,
sequenceid=6, compaction requested=false
2016-12-08 15:00:26,323 INFO
[StoreCloserThread-data-probe-test,,1481180420784.5f06cb6447343b602e05996bfd87ce14.-1]
regionserver.HStore (HStore.java:close(774)) - Closed cf1
2016-12-08 15:00:26,325 INFO
[StoreCloserThread-data-probe-test,,1481180420784.5f06cb6447343b602e05996bfd87ce14.-1]
regionserver.HStore (HStore.java:close(774)) - Closed cf2
2016-12-08 15:00:26,326 DEBUG [RS_CLOSE_REGION-dx-data-hbase-watcher05:60020-0]
coprocessor.CoprocessorHost (CoprocessorHost.java:shutdown(292)) - Stop
coprocessor org.apache.hadoop.hbase.security.token.TokenProvider
2016-12-08 15:00:26,326 DEBUG [RS_CLOSE_REGION-dx-data-hbase-watcher05:60020-0]
coprocessor.CoprocessorHost (CoprocessorHost.java:shutdown(292)) - Stop
coprocessor org.apache.hadoop.hbase.security.access.AccessController
2016-12-08 15:00:26,327 DEBUG [RS_CLOSE_REGION-dx-data-hbase-watcher05:60020-0]
coprocessor.CoprocessorHost (CoprocessorHost.java:shutdown(292)) - Stop
coprocessor org.apache.hadoop.hbase.security.access.SecureBulkLoadEndpoint
2016-12-08 15:00:26,327 DEBUG [RS_CLOSE_REGION-dx-data-hbase-watcher05:60020-0]
coprocessor.CoprocessorHost (CoprocessorHost.java:shutdown(292)) - Stop
coprocessor org.apache.hadoop.hbase.regionserver.ExternalMetricObserver
2016-12-08 15:00:26,328 INFO [RS_CLOSE_REGION-dx-data-hbase-watcher05:60020-0]
regionserver.HRegion (HRegion.java:doClose(1279)) - Closed
data-probe-test,,1481180420784.5f06cb6447343b602e05996bfd87ce14.
2016-12-08 15:00:27,590 ERROR [regionserver60020-EventThread]
zookeeper.ClientCnxn (ClientCnxn.java:processEvent(524)) - Error while calling
watcher
java.util.concurrent.RejectedExecutionException: Task
java.util.concurrent.FutureTask@1851ab3a rejected from
java.util.concurrent.ThreadPoolExecutor@19c0794f[Terminated, pool size = 0,
active threads = 0, queued tasks = 0, completed tasks = 1]
at
java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2048)
at
java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:821)
at
java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1372)
at
java.util.concurrent.AbstractExecutorService.submit(AbstractExecutorService.java:110)
at
java.util.concurrent.Executors$DelegatedExecutorService.submit(Executors.java:638)
at
org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.nodeDataChanged(ZKPermissionWatcher.java:162)
at
org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:461)
at
org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:522)
at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
2016-12-08 15:00:27,595 ERROR [regionserver60020-EventThread]
zookeeper.ClientCnxn (ClientCnxn.java:processEvent(524)) - Error while calling
watcher
java.util.concurrent.RejectedExecutionException: Task
java.util.concurrent.FutureTask@646fdb38 rejected from
java.util.concurrent.ThreadPoolExecutor@19c0794f[Terminated, pool size = 0,
active threads = 0, queued tasks = 0, completed tasks = 1]
at
java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2048)
at
java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:821)
at
java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1372)
at
java.util.concurrent.AbstractExecutorService.submit(AbstractExecutorService.java:110)
at
java.util.concurrent.Executors$DelegatedExecutorService.submit(Executors.java:638)
at
org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.nodeChildrenChanged(ZKPermissionWatcher.java:201)
at
org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:468)
at
org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:522)
at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
2016-12-08 15:01:19,389 INFO [PriorityRpcServer.handler=6,queue=0,port=60020]
regionserver.HRegionServer (HRegionServer.java:openRegion(3935)) - Open
hbase:acl,,1480596080128.854f160ee9c3f2d1b591d6dbf211956e.
2016-12-08 15:01:19,443 DEBUG [RS_OPEN_REGION-dx-data-hbase-watcher05:60020-1]
zookeeper.ZKAssign (ZKAssign.java:transitionNode(832)) -
regionserver:60020-0x358ba43097e1ada,
quorum=dx-data-hbase-watcher02:2181,dx-data-hbase-watcher01:2181,dx-data-hbase-watcher03:2181,
baseZNode=/hbase Transitioning 854f160ee9c3f2d1b591d6dbf211956e from
M_ZK_REGION_OFFLINE to RS_ZK_REGION_OPENING
After look up source, I found that :
1 AccessController implements RegionObserver and RegionServerObserver.
2 The reference of "ZooKeeperWatcher zk" is a same one object when the instance
of AccessController was be used as RegionObserver and RegionServerObserver.
3 When closing region,
org.apache.hadoop.hbase.security.access.AccessController.stop(CoprocessorEnvironment)
, TableAuthManager.release(authManager),
org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.close() will be
called, it leads an ExecutorService of ZKPermissionWatcher be shutdown few
times util a region was open.
4 If I grant a new permission when the ExecutorService is terminated, the
exception before whitch is a runtimeexcepetion will occurs, and at last the
zookeeper watcher was crashed.
So i want fix this bug like : if the instance of AccessController is be used as
RegionObserver (on the basis of CoprocessorEnvironment), it should not call
TableAuthManager.release(authManager).
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)