Josh Elser created ACCUMULO-3497:
------------------------------------

             Summary: Poor error when bind-address of server doesn't match with kerberos principal
                 Key: ACCUMULO-3497
                 URL: https://issues.apache.org/jira/browse/ACCUMULO-3497
             Project: Accumulo
          Issue Type: Improvement
          Components: rpc
            Reporter: Josh Elser
            Assignee: Josh Elser
             Fix For: 1.7.0


I used the generated configuration (in 
{{assemble/accumulo-$VERSION-dev/accumulo-$VERSION}}) and got errors in the 
master and tserver:

{panel:title=TServer}
{code}
java.lang.RuntimeException: org.apache.thrift.transport.TTransportException: Peer indicated failure: GSS initiate failed
        at org.apache.thrift.transport.TSaslServerTransport$Factory.getTransport(TSaslServerTransport.java:219)
        at org.apache.accumulo.core.rpc.UGIAssumingTransportFactory$1.run(UGIAssumingTransportFactory.java:51)
        at org.apache.accumulo.core.rpc.UGIAssumingTransportFactory$1.run(UGIAssumingTransportFactory.java:48)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:356)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1608)
        at org.apache.accumulo.core.rpc.UGIAssumingTransportFactory.getTransport(UGIAssumingTransportFactory.java:48)
        at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:208)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.thrift.transport.TTransportException: Peer indicated failure: GSS initiate failed
        at org.apache.thrift.transport.TSaslTransport.receiveSaslMessage(TSaslTransport.java:190)
        at org.apache.thrift.transport.TSaslServerTransport.handleSaslStartMessage(TSaslServerTransport.java:125)
        at org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:253)
        at org.apache.thrift.transport.TSaslServerTransport.open(TSaslServerTransport.java:41)
        at org.apache.thrift.transport.TSaslServerTransport$Factory.getTransport(TSaslServerTransport.java:216)
        ... 10 more
{code}
{panel}

{panel:title=Master}
{code}
2015-01-19 17:07:55,505 [transport.TSaslTransport] ERROR: SASL negotiation failure
javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Server not found in Kerberos database (7) - LOOKING_UP_SERVER)]
        at com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:212)
        at org.apache.thrift.transport.TSaslClientTransport.handleSaslStartMessage(TSaslClientTransport.java:94)
        at org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:253)
        at org.apache.thrift.transport.TSaslClientTransport.open(TSaslClientTransport.java:37)
        at org.apache.accumulo.core.rpc.UGIAssumingTransport$1.run(UGIAssumingTransport.java:53)
        at org.apache.accumulo.core.rpc.UGIAssumingTransport$1.run(UGIAssumingTransport.java:49)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
        at org.apache.accumulo.core.rpc.UGIAssumingTransport.open(UGIAssumingTransport.java:49)
        at org.apache.accumulo.core.rpc.ThriftUtil.createClientTransport(ThriftUtil.java:358)
        at org.apache.accumulo.core.client.impl.ThriftTransportPool.createNewTransport(ThriftTransportPool.java:478)
        at org.apache.accumulo.core.client.impl.ThriftTransportPool.getTransport(ThriftTransportPool.java:411)
        at org.apache.accumulo.core.client.impl.ThriftTransportPool.getTransport(ThriftTransportPool.java:389)
        at org.apache.accumulo.core.rpc.ThriftUtil.getClient(ThriftUtil.java:122)
        at org.apache.accumulo.server.master.LiveTServerSet$TServerConnection.halt(LiveTServerSet.java:118)
        at org.apache.accumulo.master.Master.gatherTableInformation(Master.java:1009)
        at org.apache.accumulo.master.Master.access$600(Master.java:160)
        at org.apache.accumulo.master.Master$StatusThread.updateStatus(Master.java:911)
        at org.apache.accumulo.master.Master$StatusThread.run(Master.java:901)
Caused by: GSSException: No valid credentials provided (Mechanism level: Server not found in Kerberos database (7) - LOOKING_UP_SERVER)
        at sun.security.jgss.krb5.Krb5Context.initSecContext(Krb5Context.java:710)
        at sun.security.jgss.GSSContextImpl.initSecContext(GSSContextImpl.java:248)
        at sun.security.jgss.GSSContextImpl.initSecContext(GSSContextImpl.java:179)
        at com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:193)
        ... 19 more
Caused by: KrbException: Server not found in Kerberos database (7) - LOOKING_UP_SERVER
        at sun.security.krb5.KrbTgsRep.<init>(KrbTgsRep.java:73)
        at sun.security.krb5.KrbTgsReq.getReply(KrbTgsReq.java:192)
        at sun.security.krb5.KrbTgsReq.sendAndGetCreds(KrbTgsReq.java:203)
        at sun.security.krb5.internal.CredentialsUtil.serviceCreds(CredentialsUtil.java:309)
        at sun.security.krb5.internal.CredentialsUtil.acquireServiceCreds(CredentialsUtil.java:115)
        at sun.security.krb5.Credentials.acquireServiceCreds(Credentials.java:454)
        at sun.security.jgss.krb5.Krb5Context.initSecContext(Krb5Context.java:641)
        ... 22 more
{code}
{panel}

This error occurs because Kerberos authentication is closely tied to DNS. The 
default configuration used {{localhost}} instead of the FQDN in the hosts files 
(masters, monitors, slaves, tracers, gc). This ultimately created a mismatch 
between the instance component of the Kerberos principal (I used the FQDN) and 
the address the Thrift server was using ({{localhost}}).

We should detect when this happens and throw an intuitive error.
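
As a rough illustration of what such a check might look like, the sketch below compares the instance component of a {{primary/instance@REALM}} principal against the hostname the server is about to use. The class name {{PrincipalHostCheck}}, its methods, and the error wording are all hypothetical and are not existing Accumulo code. It does a plain case-insensitive string comparison and performs no DNS lookups, so a real check would likely need to handle canonicalization as well.

```java
import java.util.Locale;

/** Hypothetical helper; not part of Accumulo. */
public final class PrincipalHostCheck {

    private PrincipalHostCheck() {}

    /**
     * Returns the instance component of "primary/instance@REALM",
     * or null when the principal has no instance component.
     */
    public static String instanceOf(String principal) {
        int slash = principal.indexOf('/');
        if (slash < 0) {
            return null;
        }
        int at = principal.indexOf('@', slash + 1);
        String instance = at < 0
            ? principal.substring(slash + 1)
            : principal.substring(slash + 1, at);
        return instance.isEmpty() ? null : instance;
    }

    /**
     * Throws a descriptive error when the principal's instance component
     * differs from the hostname the server will use (assumed to be the
     * value read from the hosts file).
     */
    public static void check(String principal, String serverHost) {
        String instance = instanceOf(principal);
        if (instance == null) {
            return; // no host in the principal, nothing to compare
        }
        String a = instance.toLowerCase(Locale.ROOT);
        String b = serverHost.toLowerCase(Locale.ROOT);
        if (!a.equals(b)) {
            throw new IllegalStateException(String.format(
                "Kerberos principal '%s' names host '%s' but the server is "
                    + "using '%s'; clients will fail SASL negotiation. "
                    + "Use the FQDN in the hosts files.",
                principal, instance, serverHost));
        }
    }
}
```

With a principal such as {{accumulo/host.example.com@EXAMPLE.COM}} (a made-up value), calling {{check}} with {{localhost}} would fail at startup with a message naming both hosts, instead of surfacing later as {{GSS initiate failed}}.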



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
