[
https://issues.apache.org/jira/browse/HBASE-12847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14277614#comment-14277614
]
Hudson commented on HBASE-12847:
--------------------------------
SUCCESS: Integrated in HBase-TRUNK #6023 (See
[https://builds.apache.org/job/HBase-TRUNK/6023/])
HBASE-12847 TestZKLessSplitOnCluster frequently times out in 0.98
builds(Rajeshbabu) (rajeshbabu: rev 7e872a81a9f404123ae583517b2db10e0bb7649c)
*
hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestSplitTransactionOnCluster.java
> TestZKLessSplitOnCluster frequently times out in 0.98 builds
> ------------------------------------------------------------
>
> Key: HBASE-12847
> URL: https://issues.apache.org/jira/browse/HBASE-12847
> Project: HBase
> Issue Type: Bug
> Reporter: Andrew Purtell
> Assignee: Rajeshbabu Chintaguntla
> Fix For: 1.0.0, 2.0.0, 0.98.10, 1.1.0
>
> Attachments: HBASE-12847.patch, HBASE-12847_98.patch,
> HBASE-12847_branch-1.patch, test.log.bad.gz, test.log.good.gz
>
>
> Gets hung up in testSSHCleanupDaugtherRegionsOfAbortedSplit waiting on
> deleteTable
> {noformat}
> "Thread-334" prio=10 tid=0x00007f15382da800 nid=0x40ae in Object.wait()
> [0x00007
> f1315f5d000]
> java.lang.Thread.State: TIMED_WAITING (on object monitor)
> at java.lang.Object.wait(Native Method)
> - waiting on <0x00000007e1b525b8> (a
> org.apache.hadoop.hbase.ipc.RpcClie
> nt$Call)
> at org.apache.hadoop.hbase.ipc.RpcClient.call(RpcClient.java:1452)
> - locked <0x00000007e1b525b8> (a
> org.apache.hadoop.hbase.ipc.RpcClient$C
> all)
> at
> org.apache.hadoop.hbase.ipc.RpcClient.callBlockingMethod(RpcClient.ja
> va:1661)
> at
> org.apache.hadoop.hbase.ipc.RpcClient$BlockingRpcChannelImplementatio
> n.callBlockingMethod(RpcClient.java:1719)
> at
> org.apache.hadoop.hbase.protobuf.generated.MasterProtos$MasterService
> $BlockingStub.disableTable(MasterProtos.java:43749)
> at
> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplemen
> tation$5.disableTable(HConnectionManager.java:1995)
> at
> org.apache.hadoop.hbase.client.HBaseAdmin$5.call(HBaseAdmin.java:947)
> at
> org.apache.hadoop.hbase.client.HBaseAdmin$5.call(HBaseAdmin.java:942)
> at
> org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcR
> etryingCaller.java:117)
> - locked <0x00000007dfffe938> (a
> org.apache.hadoop.hbase.client.RpcRetry
> ingCaller)
> at
> org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcR
> etryingCaller.java:93)
> - locked <0x00000007dfffe938> (a
> org.apache.hadoop.hbase.client.RpcRetry
> ingCaller)
> at
> org.apache.hadoop.hbase.client.HBaseAdmin.executeCallable(HBaseAdmin.
> java:3398)
> at
> org.apache.hadoop.hbase.client.HBaseAdmin.disableTableAsync(HBaseAdmi
> n.java:942)
> at
> org.apache.hadoop.hbase.client.HBaseAdmin.disableTable(HBaseAdmin.jav
> a:974)
> at
> org.apache.hadoop.hbase.HBaseTestingUtility.deleteTable(HBaseTestingUtility.java:1532)
> at
> org.apache.hadoop.hbase.regionserver.TestSplitTransactionOnCluster.testSSHCleanupDaugtherRegionsOfAbortedSplit(TestSplitTransactionOnCluster.java:1172)
> {noformat}
> See attached test.log.good.gz and test.log.bad.gz.
> In test.log.bad at 2015-01-13 08:02:45,947 we acquire a lock on
> testSSHCleanupDaugtherRegionsOfAbortedSplit to start a disable but do nothing
> afterward. Nothing happens for one minute. Then looks like the client makes
> another request but it can't get the table lock. There's no progress until
> timeout.
> {noformat}
> 2015-01-13 08:02:45,947 INFO [Thread-334] client.HBaseAdmin$5(945): Started
> disable of testSSHCleanupDaugtherRegionsOfAbortedSplit
> 2015-01-13 08:02:45,948 INFO [FifoRpcScheduler.handler1-thread-4]
> master.HMaster(2213): Client=apurtell//10.40.8.95 disable
> testSSHCleanupDaugtherRegionsOfAbortedSplit
> 2015-01-13 08:02:45,950 DEBUG [FifoRpcScheduler.handler1-thread-4]
> lock.ZKInterProcessLockBase(226): Acquired a lock for
> /hbase/table-lock/testSSHCleanupDaugtherRegionsOfAbortedSplit/write-master:367740000000001
> 2015-01-13 08:03:44,585 DEBUG
> [ip-10-40-8-95.us-west-2.compute.internal,36774,1421136162827-BalancerChore]
> master.HMaster(1553): Not running balancer because 3 region(s) in transition:
> {e4f79ac2b4711f7d906a291d94302d6e={e4f79ac2b4711f7d906a291d94302d6e
> state=SPLITTING, ts=1421136165837,
> server=ip-10-40-8-95.us-west-2.compute.internal,47769,1421136163047},
> ebbcf66ec09960d42a2a49252599bb6d={ebbcf66ec09960d42a2a49252599bb6d
> state=SPLITTI...
> 2015-01-13 08:03:46,210 INFO [Thread-334] client.HBaseAdmin$5(945): Started
> disable of testSSHCleanupDaugtherRegionsOfAbortedSplit
> 2015-01-13 08:03:46,211 INFO [FifoRpcScheduler.handler1-thread-3]
> master.HMaster(2213): Client=apurtell//10.40.8.95 disable
> testSSHCleanupDaugtherRegionsOfAbortedSplit
> 2015-01-13 08:03:46,214 DEBUG [FifoRpcScheduler.handler1-thread-3]
> master.TableLockManager$ZKTableLockManager$1(242): Table is locked by
> [tableName=
> default+testSSHCleanupDaugtherRegionsOfAbortedSplit,
> lockOwner=ip-10-40-8-95.us-west-2.compute.internal,36774,1421136162827,
> threadId=486, purpose=C_M_DISABLE_TABLE, isShared=false,
> createTime=1421136165948]
> {noformat}
> If we look at test.log.good where the same sequence of events begin, the
> table is quickly disabled and deleted:
> {noformat}
> 2015-01-13 08:01:47,923 INFO [Thread-333] client.HBaseAdmin$5(945): Started
> disable of testSSHCleanupDaugtherRegionsOfAbortedSplit
> 2015-01-13 08:01:47,924 INFO [FifoRpcScheduler.handler1-thread-5]
> master.HMaster(2213): Client=apurtell//10.40.8.95 disable
> testSSHCleanupDaugtherRegionsOfAbortedSplit
> 2015-01-13 08:01:47,926 DEBUG [FifoRpcScheduler.handler1-thread-5]
> lock.ZKInterProcessLockBase(226): Acquired a lock for
> /hbase/table-lock/testSSHCleanupDaugtherRegionsOfAbortedSplit/write-master:383350000000001
> 2015-01-13 08:01:47,932 INFO [MASTER_TABLE_OPERATIONS-ip-10-40-8-95:38335-0]
> handler.DisableTableHandler(130): Attempting to disable table
> testSSHCleanupDaugtherRegionsOfAbortedSplit
> 2015-01-13 08:01:47,933 INFO [Thread-333]
> zookeeper.RecoverableZooKeeper(121): Process
> identifier=catalogtracker-on-hconnection-0x60cf9dfa connecting to ZooKeeper
> ensemble=localhost:59024
> 2015-01-13 08:01:47,933 DEBUG [Thread-333] catalog.CatalogTracker(198):
> Starting catalog tracker
> org.apache.hadoop.hbase.catalog.CatalogTracker@610b218c
> 2015-01-13 08:01:47,935 DEBUG [Thread-333-EventThread]
> zookeeper.ZooKeeperWatcher(313): catalogtracker-on-hconnection-0x60cf9dfa0x0,
> quorum=localhost:59024, baseZNode=/hbase Received ZooKeeper Event, type=None,
> state=SyncConnected, path=null
> 2015-01-13 08:01:47,935 DEBUG [Thread-333] zookeeper.ZKUtil(430):
> catalogtracker-on-hconnection-0x60cf9dfa0x0, quorum=localhost:59024,
> baseZNode=/hbase Set watcher on existing znode=/hbase/meta-region-server
> 2015-01-13 08:01:47,936 INFO [MASTER_TABLE_OPERATIONS-ip-10-40-8-95:38335-0]
> handler.DisableTableHandler(190): Disabled table,
> testSSHCleanupDaugtherRegionsOfAbortedSplit, is done=true
> 2015-01-13 08:01:47,936 DEBUG [Thread-333-EventThread]
> zookeeper.ZooKeeperWatcher(390):
> catalogtracker-on-hconnection-0x60cf9dfa-0x14ae2500c300017 connected
> 2015-01-13 08:01:47,938 DEBUG [MASTER_TABLE_OPERATIONS-ip-10-40-8-95:38335-0]
> lock.ZKInterProcessLockBase(328): Released
> /hbase/table-lock/testSSHCleanupDaugtherRegionsOfAbortedSplit/write-master:383350000000001
> 2015-01-13 08:01:47,941 DEBUG [Thread-333] catalog.CatalogTracker(222):
> Stopping catalog tracker
> org.apache.hadoop.hbase.catalog.CatalogTracker@610b218c
> 2015-01-13 08:01:47,945 INFO [Thread-333] client.HBaseAdmin(999): Disabled
> testSSHCleanupDaugtherRegionsOfAbortedSplit
> 2015-01-13 08:01:47,946 INFO [FifoRpcScheduler.handler1-thread-2]
> master.HMaster(2047): Client=apurtell//10.40.8.95 delete
> testSSHCleanupDaugtherRegionsOfAbortedSplit
> 2015-01-13 08:01:47,948 DEBUG [FifoRpcScheduler.handler1-thread-2]
> lock.ZKInterProcessLockBase(226): Acquired a lock for
> /hbase/table-lock/testSSHCleanupDaugtherRegionsOfAbortedSplit/write-master:383350000000002
> 2015-01-13 08:01:47,953 INFO [MASTER_TABLE_OPERATIONS-ip-10-40-8-95:38335-0]
> handler.TableEventHandler(123): Handling table operation C_M_DELETE_TABLE on
> table testSSHCleanupDaugtherRegionsOfAbortedSplit
> 2015-01-13 08:01:47,958 DEBUG [MASTER_TABLE_OPERATIONS-ip-10-40-8-95:38335-0]
> handler.DeleteTableHandler(93): Deleting regions from META
> 2015-01-13 08:01:47,962 INFO [MASTER_TABLE_OPERATIONS-ip-10-40-8-95:38335-0]
> catalog.MetaEditor(496): Deleted [{ENCODED =>
> bc81e04ede70636e30fd1224df8bc70b, NAME =>
> 'testSSHCleanupDaugtherRegionsOfAbortedSplit,,1421136107549.bc81e04ede70636e30fd1224df8bc70b.',
> STARTKEY => '', ENDKEY => ''}]
> 2015-01-13 08:01:47,964 DEBUG [MASTER_TABLE_OPERATIONS-ip-10-40-8-95:38335-0]
> handler.DeleteTableHandler(104): Archiving region
> testSSHCleanupDaugtherRegionsOfAbortedSplit,,1421136107549.bc81e04ede70636e30fd1224df8bc70b.
> from FS
> 2015-01-13 08:01:47,965 DEBUG [MASTER_TABLE_OPERATIONS-ip-10-40-8-95:38335-0]
> backup.HFileArchiver(93): ARCHIVING
> hdfs://localhost:43493/user/apurtell/hbase/.tmp/data/default/testSSHCleanupDaugtherRegionsOfAbortedSplit/bc81e04ede70636e30fd1224df8bc70b
> 2015-01-13 08:01:47,969 DEBUG [MASTER_TABLE_OPERATIONS-ip-10-40-8-95:38335-0]
> backup.HFileArchiver(134): Archiving [class
> org.apache.hadoop.hbase.backup.HFileArchiver$FileablePath,
> file:hdfs://localhost:43493/user/apurtell/hbase/.tmp/data/default/testSSHCleanupDaugtherRegionsOfAbortedSplit/bc81e04ede70636e30fd1224df8bc70b/f]
> 2015-01-13 08:01:47,978 DEBUG [MASTER_TABLE_OPERATIONS-ip-10-40-8-95:38335-0]
> backup.HFileArchiver(438): Finished archiving from class
> org.apache.hadoop.hbase.backup.HFileArchiver$FileablePath,
> file:hdfs://localhost:43493/user/apurtell/hbase/.tmp/data/default/testSSHCleanupDaugtherRegionsOfAbortedSplit/bc81e04ede70636e30fd1224df8bc70b/f/9b6aaa5d9dd44f3ab113a32c029d50ae,
> to
> hdfs://localhost:43493/user/apurtell/hbase/archive/data/default/testSSHCleanupDaugtherRegionsOfAbortedSplit/bc81e04ede70636e30fd1224df8bc70b/f/9b6aaa5d9dd44f3ab113a32c029d50ae
> 2015-01-13 08:01:47,979 INFO [IPC Server handler 6 on 43493]
> blockmanagement.BlockManager(1009): BLOCK* addToInvalidates:
> blk_1073741839_1015 127.0.0.1:49933 127.0.0.1:55797 127.0.0.1:57090
> 2015-01-13 08:01:47,979 DEBUG [MASTER_TABLE_OPERATIONS-ip-10-40-8-95:38335-0]
> backup.HFileArchiver(453): Deleted all region files in:
> hdfs://localhost:43493/user/apurtell/hbase/.tmp/data/default/testSSHCleanupDaugtherRegionsOfAbortedSplit/bc81e04ede70636e30fd1224df8bc70b
> 2015-01-13 08:01:47,980 INFO [IPC Server handler 2 on 43493]
> blockmanagement.BlockManager(1009): BLOCK* addToInvalidates:
> blk_1073741838_1014 127.0.0.1:57090 127.0.0.1:49933 127.0.0.1:55797
> 2015-01-13 08:01:47,980 DEBUG [MASTER_TABLE_OPERATIONS-ip-10-40-8-95:38335-0]
> handler.DeleteTableHandler(114): Table
> 'testSSHCleanupDaugtherRegionsOfAbortedSplit' archived!
> 2015-01-13 08:01:47,980 DEBUG [MASTER_TABLE_OPERATIONS-ip-10-40-8-95:38335-0]
> handler.DeleteTableHandler(117): Removing
> 'testSSHCleanupDaugtherRegionsOfAbortedSplit' descriptor.
> 2015-01-13 08:01:47,981 DEBUG [MASTER_TABLE_OPERATIONS-ip-10-40-8-95:38335-0]
> handler.DeleteTableHandler(121): Removing
> 'testSSHCleanupDaugtherRegionsOfAbortedSplit' from region states.
> 2015-01-13 08:01:47,981 DEBUG [MASTER_TABLE_OPERATIONS-ip-10-40-8-95:38335-0]
> handler.DeleteTableHandler(125): Marking
> 'testSSHCleanupDaugtherRegionsOfAbortedSplit' as deleted.
> 2015-01-13 08:01:47,984 DEBUG [MASTER_TABLE_OPERATIONS-ip-10-40-8-95:38335-0]
> lock.ZKInterProcessLockBase(328): Released
> /hbase/table-lock/testSSHCleanupDaugtherRegionsOfAbortedSplit/write-master:383350000000002
> {noformat}
> The first interesting line missing in test.log.bad is:
> {noformat}
> 2015-01-13 08:01:47,932 INFO [MASTER_TABLE_OPERATIONS-ip-10-40-8-95:38335-0]
> handler.DisableTableHandler(130): Attempting to disable table
> testSSHCleanupDaugtherRegionsOfAbortedSplit
> {noformat}
> Nowhere in test.log.bad does "Attempting to disable table
> testSSHCleanupDaugtherRegionsOfAbortedSplit" appear.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)