[
https://issues.apache.org/jira/browse/HBASE-9779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
stack updated HBASE-9779:
-------------------------
Priority: Critical (was: Major)
> IntegrationTestLoadAndVerify fails deleting IntegrationTestLoadAndVerify
> table
> -------------------------------------------------------------------------------
>
> Key: HBASE-9779
> URL: https://issues.apache.org/jira/browse/HBASE-9779
> Project: HBase
> Issue Type: Bug
> Components: test
> Affects Versions: 0.96.0
> Reporter: stack
> Assignee: stack
> Priority: Critical
> Attachments: 9779part.txt
>
>
> As part of the test, we want to delete the created table to restore cluster
> state. Interestingly we can disable the table successfully but then
> immediately after we fail the delete because we cannot get the table
> descriptor -- getting the file descriptor is used to test if table is present.
> The test for getDescriptor is kinda broke because it throws base IOE which
> causes clients to retry over and over again as though the descriptor was
> going to come back.
> This bug is kinda ugly because in at least one case it caused our
> long-running hbase-it suite run to fail so would be good to fix.
> Here is sample from a test run:
> {code}
> Disabling table IntegrationTestLoadAndVerify 2013-10-11 18:27:53,485 INFO
> [main] client.HBaseAdmin: Started disable of IntegrationTestLoadAndVerify
> 2013-10-11 18:27:53,526 INFO [main] zookeeper.ZooKeeper: Initiating client
> connection, connectString=a1805.halxg.cloudera.com:2181 sessionTimeout=90000
> watcher=catalogtracker-on-hconnection-0x5a7e666f
> 2013-10-11 18:27:53,527 INFO [main] zookeeper.RecoverableZooKeeper: Process
> identifier=catalogtracker-on-hconnection-0x5a7e666f connecting to ZooKeeper
> ensemble=a1805.halxg.cloudera.com:2181
> 2013-10-11 18:27:53,527 INFO
> [main-SendThread(a1805.halxg.cloudera.com:2181)] zookeeper.ClientCnxn:
> Opening socket connection to server
> a1805.halxg.cloudera.com/10.20.200.105:2181. Will not attempt to authenticate
> using SASL (unknown error)
> 2013-10-11 18:27:53,527 DEBUG [main] catalog.CatalogTracker: Starting catalog
> tracker org.apache.hadoop.hbase.catalog.CatalogTracker@4ace08a5
> 2013-10-11 18:27:53,529 INFO
> [main-SendThread(a1805.halxg.cloudera.com:2181)] zookeeper.ClientCnxn: Socket
> connection established to a1805.halxg.cloudera.com/10.20.200.105:2181,
> initiating session
> 2013-10-11 18:27:53,539 INFO
> [main-SendThread(a1805.halxg.cloudera.com:2181)] zookeeper.ClientCnxn:
> Session establishment complete on server
> a1805.halxg.cloudera.com/10.20.200.105:2181, sessionid = 0x1412d47f53a5c70,
> negotiated timeout = 40000
> 2013-10-11 18:27:53,602 DEBUG [main] catalog.CatalogTracker: Stopping catalog
> tracker org.apache.hadoop.hbase.catalog.CatalogTracker@4ace08a5
> 2013-10-11 18:27:53,662 INFO [main] zookeeper.ZooKeeper: Session:
> 0x1412d47f53a5c70 closed
> 2013-10-11 18:27:53,662 INFO [main-EventThread] zookeeper.ClientCnxn:
> EventThread shut down
> .2013-10-11 18:27:54,666 INFO [main] zookeeper.ZooKeeper: Initiating client
> connection, connectString=a1805.halxg.cloudera.com:2181 sessionTimeout=90000
> watcher=catalogtracker-on-hconnection-0x5a7e666f
> 2013-10-11 18:27:54,667 INFO [main] zookeeper.RecoverableZooKeeper: Process
> identifier=catalogtracker-on-hconnection-0x5a7e666f connecting to ZooKeeper
> ensemble=a1805.halxg.cloudera.com:2181
> 2013-10-11 18:27:54,667 INFO
> [main-SendThread(a1805.halxg.cloudera.com:2181)] zookeeper.ClientCnxn:
> Opening socket connection to server
> a1805.halxg.cloudera.com/10.20.200.105:2181. Will not attempt to authenticate
> using SASL (unknown error)
> 2013-10-11 18:27:54,667 DEBUG [main] catalog.CatalogTracker: Starting catalog
> tracker org.apache.hadoop.hbase.catalog.CatalogTracker@692c0c5d
> 2013-10-11 18:27:54,667 INFO
> [main-SendThread(a1805.halxg.cloudera.com:2181)] zookeeper.ClientCnxn: Socket
> connection established to a1805.halxg.cloudera.com/10.20.200.105:2181,
> initiating session
> 2013-10-11 18:27:54,696 INFO
> [main-SendThread(a1805.halxg.cloudera.com:2181)] zookeeper.ClientCnxn:
> Session establishment complete on server
> a1805.halxg.cloudera.com/10.20.200.105:2181, sessionid = 0x1412d47f53a5c71,
> negotiated timeout = 40000
> 2013-10-11 18:27:54,821 DEBUG [main] catalog.CatalogTracker: Stopping catalog
> tracker org.apache.hadoop.hbase.catalog.CatalogTracker@692c0c5d
> 2013-10-11 18:27:54,871 INFO [main] zookeeper.ZooKeeper: Session:
> 0x1412d47f53a5c71 closed
> 2013-10-11 18:27:54,871 INFO [main-EventThread] zookeeper.ClientCnxn:
> EventThread shut down
> .2013-10-11 18:27:55,890 INFO [main] zookeeper.ZooKeeper: Initiating client
> connection, connectString=a1805.halxg.cloudera.com:2181 sessionTimeout=90000
> watcher=catalogtracker-on-hconnection-0x5a7e666f
> 2013-10-11 18:27:55,891 INFO [main] zookeeper.RecoverableZooKeeper: Process
> identifier=catalogtracker-on-hconnection-0x5a7e666f connecting to ZooKeeper
> ensemble=a1805.halxg.cloudera.com:2181
> 2013-10-11 18:27:55,891 INFO
> [main-SendThread(a1805.halxg.cloudera.com:2181)] zookeeper.ClientCnxn:
> Opening socket connection to server
> a1805.halxg.cloudera.com/10.20.200.105:2181. Will not attempt to authenticate
> using SASL (unknown error)
> 2013-10-11 18:27:55,891 DEBUG [main] catalog.CatalogTracker: Starting catalog
> tracker org.apache.hadoop.hbase.catalog.CatalogTracker@74234057
> 2013-10-11 18:27:55,892 INFO
> [main-SendThread(a1805.halxg.cloudera.com:2181)] zookeeper.ClientCnxn: Socket
> connection established to a1805.halxg.cloudera.com/10.20.200.105:2181,
> initiating session
> 2013-10-11 18:27:55,896 INFO
> [main-SendThread(a1805.halxg.cloudera.com:2181)] zookeeper.ClientCnxn:
> Session establishment complete on server
> a1805.halxg.cloudera.com/10.20.200.105:2181, sessionid = 0x1412d47f53a5c72,
> negotiated timeout = 40000
> 2013-10-11 18:27:55,908 DEBUG [main] catalog.CatalogTracker: Stopping catalog
> tracker org.apache.hadoop.hbase.catalog.CatalogTracker@74234057
> 2013-10-11 18:27:55,912 INFO [main] zookeeper.ZooKeeper: Session:
> 0x1412d47f53a5c72 closed
> 2013-10-11 18:27:55,912 INFO [main-EventThread] zookeeper.ClientCnxn:
> EventThread shut down
> 2389 ms
> Deleting table IntegrationTestLoadAndVerify
> 2013-10-11 18:33:02,029 INFO [main] hbase.HBaseCluster: Restoring cluster -
> started
> 2013-10-11 18:33:02,032 INFO [main] hbase.HBaseCluster: Added new HBaseAdmin
> 2013-10-11 18:33:02,032 INFO [main] hbase.HBaseCluster: Restoring cluster -
> done
> 2013-10-11 18:33:02,033 ERROR [main] util.AbstractHBaseTool: Error running
> command-line tool
> org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after
> attempts=35, exceptions:
> Fri Oct 11 18:27:55 PDT 2013,
> org.apache.hadoop.hbase.client.RpcRetryingCaller@48c6f1e8,
> org.apache.hadoop.hbase.ipc.RemoteWithExtrasException(java.io.IOException):
> java.io.IOException: HTableDescriptor missing for IntegrationTestLoadAndVerify
> at
> org.apache.hadoop.hbase.master.handler.TableEventHandler.getTableDescriptor(TableEventHandler.java:231)
> at
> org.apache.hadoop.hbase.master.handler.DeleteTableHandler.prepareWithTableLock(DeleteTableHandler.java:58)
> at
> org.apache.hadoop.hbase.master.handler.TableEventHandler.prepare(TableEventHandler.java:93)
> at org.apache.hadoop.hbase.master.HMaster.deleteTable(HMaster.java:1816)
> at org.apache.hadoop.hbase.master.HMaster.deleteTable(HMaster.java:1826)
> at
> org.apache.hadoop.hbase.protobuf.generated.MasterProtos$MasterService$2.callBlockingMethod(MasterProtos.java:38213)
> at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2146)
> at
> org.apache.hadoop.hbase.ipc.RpcServer$Handler.run(RpcServer.java:1851)
> ....
> {code}
> Looking on master logs, it looks fine:
> {code}
> 1 2013-10-11 18:27:55,541 DEBUG [MASTER_TABLE_OPERATIONS-a1805:60000-0]
> handler.DisableTableHandler: Disable waiting until done; 298996 ms remaining;
> []
> 0 2013-10-11 18:27:55,546 INFO [MASTER_TABLE_OPERATIONS-a1805:60000-0]
> handler.DisableTableHandler: Disabled table is done=true
> 1 2013-10-11 18:27:55,555 DEBUG [MASTER_TABLE_OPERATIONS-a1805:60000-0]
> lock.ZKInterProcessLockBase: Released
> /hbase/table-lock/IntegrationTestLoadAndVerify/write-master:600000000000001
> 2 2013-10-11 18:27:55,923 INFO [RpcServer.handler=14,port=60000]
> master.HMaster: Client=hbase//10.20.200.112 delete
> IntegrationTestLoadAndVerify
> 3 2013-10-11 18:27:55,931 DEBUG [RpcServer.handler=14,port=60000]
> lock.ZKInterProcessLockBase: Acquired a lock for
> /hbase/table-lock/IntegrationTestLoadAndVerify/write-master:600000000000002
> 4 2013-10-11 18:27:55,954 DEBUG [RpcServer.handler=14,port=60000]
> lock.ZKInterProcessLockBase: Released
> /hbase/table-lock/IntegrationTestLoadAndVerify/write-master:600000000000002
> 5 2013-10-11 18:27:56,167 INFO [RpcServer.handler=15,port=60000]
> master.HMaster: Client=hbase//10.20.200.112 delete
> IntegrationTestLoadAndVerify
> ...
> {code}
> Oddly no exceptions other than corrupted snapshot because mssing hfile. All
> seems to have been operating fine. Will look more.
--
This message was sent by Atlassian JIRA
(v6.1#6144)