[ 
https://issues.apache.org/jira/browse/HBASE-2827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nicolas Spiegelberg updated HBASE-2827:
---------------------------------------

    Priority: Major  (was: Critical)

Downgrading priority after further investigation.  Still working on this issue, 
but HBASE-2828 was really the critical patch.

11:47:48 AM Nicolas Spiegelberg: Should I push 2828 to the titan hbase, or 
should we wait for the trunk refresh? That jira should fix the Exception that 
we saw on our cluster.
11:49:22 AM Kannan: Is this the master failover for the client?
11:49:48 AM Nicolas Spiegelberg: It's decoupling the HTable from the master.
11:50:18 AM Nicolas Spiegelberg: HBaseAdmin is the one that has master failover 
problems, but a client only uses it when disabling tables, creating new tables, 
etc.
11:50:57 AM Kannan: I was under the impression that HBaseAdmin was the more 
critical one... but I think you are right.
11:51:08 AM Nicolas Spiegelberg: It still needs to be fixed, but our problem was 
that they used the HBaseAdmin code, which is almost never used and sometimes 
buggy, inside the HTable code, which is used all the time.
11:54:05 AM Nicolas Spiegelberg: I'm still working on 2827, which is the 
failover. 2828 just relies on ZooKeeper instead of the master.
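
The decoupling discussed above can be sketched roughly as follows: instead of
HTable asking the master for cluster status at construction time, it counts
the region servers' own ephemeral registrations in ZooKeeper. This is only an
illustrative stand-in (the in-memory map plays the role of a ZooKeeper client;
class and znode-path names here are assumptions, not the actual HBase API):

```java
import java.util.Map;
import java.util.Set;

// Sketch of the HBASE-2828 idea: count live region servers from their
// ephemeral znodes rather than via master.getClusterStatus(). The Map
// below stands in for a real ZooKeeper client; names are illustrative.
class RegionServerCount {
    // e.g. "/hbase/rs" -> {"rs1,60020", "rs2,60020"}
    private final Map<String, Set<String>> znodes;

    RegionServerCount(Map<String, Set<String>> znodes) {
        this.znodes = znodes;
    }

    // Works even while the master is down, because region servers
    // register themselves directly under the (assumed) /hbase/rs path.
    int getCurrentNrHRS() {
        Set<String> children = znodes.get("/hbase/rs");
        return children == null ? 0 : children.size();
    }
}
```

The point of the design is visible in the stack trace below: the old path went
HTable.&lt;init&gt; -&gt; getCurrentNrHRS -&gt; HBaseAdmin.getClusterStatus, a master RPC,
so plain table clients hung whenever the master was unreachable.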

> HBase Client doesn't handle master failover
> -------------------------------------------
>
>                 Key: HBASE-2827
>                 URL: https://issues.apache.org/jira/browse/HBASE-2827
>             Project: HBase
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 0.90.0
>            Reporter: Nicolas Spiegelberg
>            Assignee: Nicolas Spiegelberg
>             Fix For: 0.90.0
>
>
> A client on our beta tier was stuck in this exception loop when we brought 
> up a new HMaster after the old one died:
> Exception while trying to connect hBase
> java.lang.reflect.UndeclaredThrowableException
> at $Proxy1.getClusterStatus(Unknown Source)
> at org.apache.hadoop.hbase.client.HBaseAdmin.getClusterStatus(HBaseAdmin.java:912)
> at org.apache.hadoop.hbase.client.HTable.getCurrentNrHRS(HTable.java:170)
> at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:143)
> ...
> at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:253)
> at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:885)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907)
> at java.lang.Thread.run(Thread.java:619)
> Caused by: java.net.SocketTimeoutException: 20000 millis timeout while waiting for channel to be ready for connect. ch : java.nio.channels.SocketChannel[connection-pending remote=/10.18.34.212:60000]
> at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:213)
> at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:406)
> at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.setupIOstreams(HBaseClient.java:309)
> at org.apache.hadoop.hbase.ipc.HBaseClient.getConnection(HBaseClient.java:856)
> at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:724)
> at org.apache.hadoop.hbase.ipc.HBaseRPC$Invoker.invoke(HBaseRPC.java:252)
> ... 20 more
> 12:52:55,863 [pool-4-thread-5182] INFO PersistentUtil:153 - Retry after 1 second...
> Looking at the client code, the HConnectionManager does not watch ZK for 
> NodeDeleted & NodeCreated events on /hbase/master, so it keeps retrying the 
> dead master's address instead of discovering the new one.
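
The fix implied by that last observation can be sketched as a tracker that
caches the master address and invalidates or refreshes it when the watcher on
/hbase/master fires. This is a minimal simulation, not HBase's actual
HConnectionManager code; the class name, event enum, and method names are all
assumptions made for illustration:

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch of the HBASE-2827 fix: cache the master address and
// update it on ZooKeeper NodeDeleted / NodeCreated events for /hbase/master.
// The enum stands in for ZooKeeper's watch event types; names are illustrative.
class MasterTracker {
    enum EventType { NODE_DELETED, NODE_CREATED }

    private final AtomicReference<String> masterAddress = new AtomicReference<>();

    MasterTracker(String initialAddress) {
        masterAddress.set(initialAddress);
    }

    // Would be called by the ZooKeeper watcher registered on /hbase/master.
    void process(EventType event, String newAddress) {
        switch (event) {
            case NODE_DELETED:
                // Old master died: drop the cached address so callers can
                // block or back off instead of retrying a dead socket.
                masterAddress.set(null);
                break;
            case NODE_CREATED:
                // New master registered its znode: pick up its address.
                masterAddress.set(newAddress);
                break;
        }
    }

    String currentMaster() {
        return masterAddress.get();
    }
}
```

With this shape, the retry loop in the trace above would stop targeting the
stale 10.18.34.212:60000 address as soon as the NodeDeleted event arrives.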

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.