[ https://issues.apache.org/jira/browse/HDFS-11908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

James Clampffer updated HDFS-11908:
-----------------------------------
    Status: Patch Available  (was: Open)

> libhdfs++: Authentication failure when first NN of kerberized HA cluster is 
> standby
> -----------------------------------------------------------------------------------
>
>                 Key: HDFS-11908
>                 URL: https://issues.apache.org/jira/browse/HDFS-11908
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: hdfs-client
>            Reporter: James Clampffer
>            Assignee: James Clampffer
>         Attachments: HDFS-11908.HDFS-8707.000.patch
>
>
> The library fails to authenticate to a kerberized HA cluster if the first 
> namenode it tries to connect to is the standby.  RpcConnection ends up 
> attempting to use simple auth.
> Control flow to connect to NN for the first time:
> # RpcConnection constructed with a pointer to the RpcEngine as the only 
> argument
> # RpcConnection::Connect(server endpoints, auth_info, callback called)
> ** auth_info contains the SASL mechanism to use + the delegation token if we 
> already have one
> Control flow to connect to NN after failover:
> # RpcEngine::NewConnection called, allocates an RpcConnection exactly how 
> step 1 above would
> # RpcEngine::InitializeConnection called, sets event hooks and a string for 
> cluster name
> # RpcConnection::PreEnqueueRequests called to re-add the RPC messages that 
> didn't make it on the last call due to the standby exception
> # RpcConnection::ConnectAndFlush called to send RPC packets. This only takes 
> server endpoints, no auth info
> To fix:
> RpcEngine::InitializeConnection just needs to set RpcConnection::auth_info_ 
> from the existing RpcEngine::auth_info_.  Even better would be setting this in 
> the constructor, so that if an RpcConnection exists it can be expected to be 
> in a usable state.  I'll get a diff up once I sort out CI build failures.
> We also really need CI test coverage for HA and kerberos, because this 
> issue should not have been around for so long.
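
A minimal sketch of the constructor-based fix described above, written as a toy model and not taken from the attached patch. The class names RpcEngine and RpcConnection, the member auth_info_, and the methods NewConnection and ConnectAndFlush come from the description; the AuthInfo layout, the Method enum, the endpoint string, and the helper MethodAfterFailover are hypothetical and exist only for illustration. The point is that a connection created on the failover path carries the engine's auth settings from the moment it is constructed, even though ConnectAndFlush only receives endpoints.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Toy stand-in for the auth settings (hypothetical layout, not libhdfs++).
struct AuthInfo {
  enum class Method { kSimple, kKerberos };
  Method method = Method::kSimple;  // default models the simple-auth fallback
  std::string token;                // delegation token, empty if none
};

class RpcEngine;

class RpcConnection {
 public:
  // Proposed shape: auth info is supplied at construction, so any
  // RpcConnection that exists is already in a usable state.
  RpcConnection(RpcEngine* engine, AuthInfo auth_info)
      : engine_(engine), auth_info_(std::move(auth_info)) {}

  // Failover path: only an endpoint is passed in, so the handshake has to
  // rely on whatever auth_info_ already holds. Returns the method used.
  AuthInfo::Method ConnectAndFlush(const std::string& endpoint) {
    assert(!endpoint.empty());
    return auth_info_.method;
  }

  RpcEngine* engine() const { return engine_; }

 private:
  RpcEngine* engine_;
  AuthInfo auth_info_;
};

class RpcEngine {
 public:
  explicit RpcEngine(AuthInfo auth_info) : auth_info_(std::move(auth_info)) {}

  // Every new connection, including one made after a standby exception,
  // receives a copy of the engine's auth info.
  std::unique_ptr<RpcConnection> NewConnection() {
    return std::make_unique<RpcConnection>(this, auth_info_);
  }

 private:
  AuthInfo auth_info_;
};

// Hypothetical helper: simulates reconnecting to the second namenode and
// reports which auth method the new connection would use.
inline AuthInfo::Method MethodAfterFailover(AuthInfo::Method configured) {
  AuthInfo info;
  info.method = configured;
  RpcEngine engine(info);
  std::unique_ptr<RpcConnection> conn = engine.NewConnection();
  return conn->ConnectAndFlush("nn2.example.com:8020");
}
```

In this model, MethodAfterFailover(AuthInfo::Method::kKerberos) returns kKerberos. If the constructor left auth_info_ default-initialized, as in the flow described above, the same call would report kSimple, which matches the reported symptom.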



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
