[ 
https://issues.apache.org/jira/browse/HBASE-26022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhuobin zheng updated HBASE-26022:
----------------------------------
    Description: 
In our production HBase cluster, we occasionally encounter the errors below, 
which leave HBase stuck for a long time. After that, HBase requests to the 
affected machine fail forever.
{code:java}
WARN org.apache.hadoop.security.UserGroupInformation: 
PriviledgedActionException as:${user@realm} (auth:KERBEROS) 
cause:javax.security.sasl.SaslException: GSS initiate failed [Caused by 
GSSException: No valid credentials provided (Mechanism level: Server not found 
in Kerberos database (7) - LOOKING_UP_SERVER)]
WARN org.apache.hadoop.security.UserGroupInformation: 
PriviledgedActionException as:${user@realm} (auth:KERBEROS) 
cause:java.io.IOException: Couldn't setup connection for ${user@realm} to 
hbase/${ip}@realm
{code}
The main problem is that the actual server principal we generated in the KDC is 
hbase/*${hostname}*@realm, so hbase/*${ip}*@realm cannot be found in the KDC.

When an RpcClientImpl#Connection is constructed, its serverPrincipal field, 
which never changes afterwards, is generated from 
InetAddress.getCanonicalHostName(), which returns the IP address when it fails 
to resolve the hostname.
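
The fallback can be illustrated with a minimal sketch. This is not the HBase 
code: PrincipalHostCheck, isIpFallback, principalFor and describe are 
hypothetical names used only for illustration, and the sketch assumes the KDC 
holds only the hostname-based principal, as described above.

```java
import java.net.InetAddress;

final class PrincipalHostCheck {
    // True when the "canonical host name" is just the textual IP address,
    // i.e. the reverse lookup did not produce a name.
    static boolean isIpFallback(String canonicalHostName, String hostAddress) {
        return canonicalHostName.equals(hostAddress);
    }

    // Hypothetical helper: builds service/host@realm in the shape seen in the log.
    static String principalFor(String service, String host, String realm) {
        return service + "/" + host + "@" + realm;
    }

    // Resolves the host part once and flags the case where it fell back to the IP.
    static String describe(InetAddress addr, String realm) {
        // May perform a reverse DNS lookup; yields the textual IP if that fails.
        String canonical = addr.getCanonicalHostName();
        String principal = principalFor("hbase", canonical, realm);
        // Assumption: the KDC only knows hbase/<hostname>@realm.
        return isIpFallback(canonical, addr.getHostAddress())
            ? principal + " (IP fallback, would not match the KDC entry)"
            : principal;
    }
}
```

Because the result is computed only once, a value produced during a short DNS 
failure would be kept for the whole lifetime of the object that stores it.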

Therefore, if DNS jitter occurs while an RpcClientImpl#Connection is being 
constructed, that connection will never be able to set up the SASL environment. 
I also do not see any logic in the SASL failure code path that abandons the 
connection.

I can think of two solutions to this problem: 
 # Abandon the connection when SASL fails, so that the next request constructs 
a new connection and regenerates the server principal.
 # Refresh the serverPrincipal field when SASL fails, so that the next retry 
uses the new server principal.
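
The second option could look roughly like the sketch below. It is only an 
illustration under stated assumptions, not a patch against RpcClientImpl: the 
class RefreshingPrincipal and its methods are hypothetical, and the host 
resolver is injected as a Supplier so that the behaviour can be exercised 
without real DNS.

```java
import java.util.function.Supplier;

final class RefreshingPrincipal {
    private final String service;
    private final String realm;
    // Hypothetical hook; in a real client this might wrap
    // InetAddress.getCanonicalHostName() for the server address.
    private final Supplier<String> hostResolver;
    private volatile String principal;

    RefreshingPrincipal(String service, String realm, Supplier<String> hostResolver) {
        this.service = service;
        this.realm = realm;
        this.hostResolver = hostResolver;
        this.principal = build();
    }

    // Resolves the host again and rebuilds service/host@realm.
    private String build() {
        return service + "/" + hostResolver.get() + "@" + realm;
    }

    // Principal used for the next SASL attempt.
    String current() {
        return principal;
    }

    // Intended to be called from the SASL failure path, so that the next
    // retry does not reuse a principal built from a failed lookup.
    String refresh() {
        principal = build();
        return principal;
    }
}
```

With a resolver that first returns the IP and later the hostname, current() 
would move from hbase/${ip}@realm to hbase/${hostname}@realm after one call to 
refresh(). The first option reaches the same result by discarding the whole 
connection instead of a single field.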

HBase Version: 1.2.0-cdh5.14.4

  was:
In our product hbase cluster, we occasionally encounter  errors

 


> DNS jitter causes hbase client to get stuck
> -------------------------------------------
>
>                 Key: HBASE-26022
>                 URL: https://issues.apache.org/jira/browse/HBASE-26022
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 1.2.0
>            Reporter: zhuobin zheng
>            Priority: Major



--
This message was sent by Atlassian Jira
(v8.3.4#803005)