[
https://issues.apache.org/jira/browse/TAJO-858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14026132#comment-14026132
]
Prafulla T commented on TAJO-858:
---------------------------------
Hi Min,
Thanks for your comment.
Here is my response.
For (1), the way I intend this to work is as follows.
The user will store the following information in tajo-site.xml:
A. the Kerberos principal that can log in to the Hadoop namenode, in the
dfs.namenode.kerberos.principal parameter
B. the keytab for this principal, in tajo.dfs.namenode.keytab.file.
TajoMaster will log in to Hadoop using this information and request a DFS
delegation token.
This token can then be used by any user to access HDFS. The Hadoop API that I
am using takes only the token information, but I still need to confirm whether
the current OS user information is also used/sent.
The API that I have written is as follows:
FileSystemUtil.getFileSystem(path, configuration).
It does not depend on any Hadoop-specific information. All Hadoop
authentication-related information is hidden inside this API. For a new
filesystem/storage (such as S3), we can just modify the internals of
FileSystemUtil.getFileSystem to return the correct FileSystem instance, and it
should work.
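
To illustrate the idea only, here is a minimal sketch in plain Java with no
Hadoop dependency. It is not the code in the patch. Only the name
FileSystemUtil.getFileSystem(path, configuration) comes from the description
above; the Storage interface, the Map used as a configuration and the strings
returned are hypothetical stand-ins. hadoop.security.authentication is the
standard Hadoop setting that selects "simple" or "kerberos".

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Map;

public class FileSystemUtilSketch {

    /** Hypothetical stand-in for a storage handle; this is not Hadoop's FileSystem. */
    interface Storage {
        String describe();
    }

    /**
     * Hypothetical facade: callers pass only a path and a configuration.
     * Anything specific to a storage backend stays inside this method.
     */
    static Storage getFileSystem(URI path, Map<String, String> conf) {
        String scheme = path.getScheme() == null ? "file" : path.getScheme();
        switch (scheme) {
            case "hdfs":
                // Authentication handling would be hidden here, so callers never see it.
                boolean secure = "kerberos".equals(conf.get("hadoop.security.authentication"));
                return () -> "hdfs(" + (secure ? "token" : "simple") + ")";
            case "s3":
                // A new backend only needs a new branch; call sites stay unchanged.
                return () -> "s3";
            case "file":
                return () -> "local";
            default:
                throw new IllegalArgumentException("Unsupported scheme: " + scheme);
        }
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put("hadoop.security.authentication", "kerberos");
        URI path = URI.create("hdfs://host1.grid.domain.com:9000/table/data.csv");
        System.out.println(getFileSystem(path, conf).describe());
    }
}
```

The point of the sketch is that a caller such as a scanner only ever asks for
a filesystem by path, so adding another storage type should not touch the
callers.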
For (2), I am not yet familiar with Kerberos in YARN. I will test it and see
whether the token file found in the work directory can be used. That would
make it easy.
For (3), neither the token file path nor the token itself will be stored in
tajo-site.xml, or in any file for that matter.
It will be kept in memory. As for token expiry, I think the token can be
renewed at TajoMaster and that should be enough. I will test this assumption.
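
As a rough model of this (not the patch, and not the Hadoop token API), the
sketch below keeps a token only in memory at the master and extends its expiry
shortly before it runs out. The class names, the 24-hour lifetime and the
one-hour renewal margin are all assumptions chosen for illustration.

```java
import java.time.Duration;
import java.time.Instant;

public class MasterTokenSketch {

    /** Hypothetical in-memory token: an opaque id plus an expiry time. */
    static final class Token {
        final String id;
        final Instant expiresAt;

        Token(String id, Instant expiresAt) {
            this.id = id;
            this.expiresAt = expiresAt;
        }
    }

    /** Lives only in the master's memory; nothing is written to a file. */
    static final class Holder {
        private final Duration lifetime;
        private final Duration renewMargin;
        private Token current;

        Holder(Duration lifetime, Duration renewMargin) {
            this.lifetime = lifetime;
            this.renewMargin = renewMargin;
        }

        /** Returns the token to hand to workers, renewing it when it is close to expiry. */
        synchronized Token get(Instant now) {
            if (current == null) {
                // First request: stands in for logging in and obtaining a token.
                current = new Token("token-1", now.plus(lifetime));
            } else if (!now.isBefore(current.expiresAt.minus(renewMargin))) {
                // Renewal keeps the same token id and only pushes the expiry out.
                current = new Token(current.id, now.plus(lifetime));
            }
            return current;
        }
    }

    public static void main(String[] args) {
        Holder holder = new Holder(Duration.ofHours(24), Duration.ofHours(1));
        Instant start = Instant.parse("2014-06-10T00:00:00Z");
        Token first = holder.get(start);
        Token later = holder.get(start.plus(Duration.ofHours(23)).plusSeconds(1800));
        System.out.println("same id: " + first.id.equals(later.id));
        System.out.println("expiry extended: " + later.expiresAt.isAfter(first.expiresAt));
    }
}
```

Whether a renewal at TajoMaster alone is enough for tokens already handed to
running workers is exactly the assumption that still needs testing.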
For (4), yes, that's what I intended. The user/keytab information will be used
only at TajoMaster.
At workers and other nodes, we will just use the token to access HDFS.
Let me know what you think about this.
> Support for hadoop kerberos authentication in Tajo
> --------------------------------------------------
>
> Key: TAJO-858
> URL: https://issues.apache.org/jira/browse/TAJO-858
> Project: Tajo
> Issue Type: Bug
> Affects Versions: 0.8.0, 0.9.0
> Reporter: Min Zhou
> Assignee: Prafulla T
> Attachments: TAJO-858.patch
>
>
> The hadoop cluster is configured to use kerberos as authentication mechanism.
> The exception is listed below. It seems that when opening an HDFS file, Tajo
> can't read the security-related config items from core-site.xml. It still
> uses SIMPLE authentication.
> {noformat}
> 2014-05-29 01:00:40,269 WARN security.UserGroupInformation
> (UserGroupInformation.java:doAs(1551)) - PriviledgedActionException as:mzhou
> (auth:SIMPLE) cause:org.apache.hadoop.security.AccessControlException: Client
> cannot authenticate via:[TOKEN, KERBEROS]
> 2014-05-29 01:00:40,270 WARN ipc.Client (Client.java:run(669)) -
> Exception encountered while connecting to the server :
> org.apache.hadoop.security.AccessControlException: Client cannot
> authenticate via:[TOKEN, KERBEROS]
> 2014-05-29 01:00:40,270 WARN security.UserGroupInformation
> (UserGroupInformation.java:doAs(1551)) - PriviledgedActionException as:mzhou
> (auth:SIMPLE) cause:java.io.IOException:
> org.apache.hadoop.security.AccessControlException: Client cannot authenticate
> via:[TOKEN, KERBEROS]
> 2014-05-29 01:00:40,278 ERROR worker.Task (Task.java:run(393)) -
> java.io.IOException: Failed on local exception: java.io.IOException:
> org.apache.hadoop.security.AccessControlException: Client cannot authenticate
> via:[TOKEN, KERBEROS]; Host Details : local host is:
> "host4.grid.domain.com/172.20.1.34"; destination host is:
> "host1.grid.domain.com":9000;
> at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:764)
> at org.apache.hadoop.ipc.Client.call(Client.java:1410)
> at org.apache.hadoop.ipc.Client.call(Client.java:1359)
> at
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
> at com.sun.proxy.$Proxy9.getBlockLocations(Unknown Source)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186)
> at
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
> at com.sun.proxy.$Proxy9.getBlockLocations(Unknown Source)
> at
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getBlockLocations(ClientNamenodeProtocolTranslatorPB.java:206)
> at
> org.apache.hadoop.hdfs.DFSClient.callGetBlockLocations(DFSClient.java:1131)
> at
> org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:1121)
> at
> org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:1111)
> at
> org.apache.hadoop.hdfs.DFSInputStream.fetchLocatedBlocksAndGetLastBlockLength(DFSInputStream.java:272)
> at
> org.apache.hadoop.hdfs.DFSInputStream.openInfo(DFSInputStream.java:239)
> at org.apache.hadoop.hdfs.DFSInputStream.<init>(DFSInputStream.java:232)
> at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:1279)
> at
> org.apache.hadoop.hdfs.DistributedFileSystem$3.doCall(DistributedFileSystem.java:296)
> at
> org.apache.hadoop.hdfs.DistributedFileSystem$3.doCall(DistributedFileSystem.java:292)
> at
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
> at
> org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:292)
> at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:765)
> at org.apache.tajo.storage.CSVFile$CSVScanner.init(CSVFile.java:303)
> at
> org.apache.tajo.engine.planner.physical.SeqScanExec.initScanner(SeqScanExec.java:197)
> at
> org.apache.tajo.engine.planner.physical.SeqScanExec.init(SeqScanExec.java:179)
> at
> org.apache.tajo.engine.planner.physical.UnaryPhysicalExec.init(UnaryPhysicalExec.java:52)
> at
> org.apache.tajo.engine.planner.physical.UnaryPhysicalExec.init(UnaryPhysicalExec.java:52)
> at
> org.apache.tajo.engine.planner.physical.HashShuffleFileWriteExec.init(HashShuffleFileWriteExec.java:81)
> at org.apache.tajo.worker.Task.run(Task.java:383)
> at org.apache.tajo.worker.TaskRunner$1.run(TaskRunner.java:391)
> at java.lang.Thread.run(Thread.java:744)
> Caused by: java.io.IOException:
> org.apache.hadoop.security.AccessControlException: Client cannot authenticate
> via:[TOKEN, KERBEROS]
> at org.apache.hadoop.ipc.Client$Connection$1.run(Client.java:674)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
> at
> org.apache.hadoop.ipc.Client$Connection.handleSaslConnectionFailure(Client.java:637)
> at
> org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:721)
> at org.apache.hadoop.ipc.Client$Connection.access$2700(Client.java:367)
> at org.apache.hadoop.ipc.Client.getConnection(Client.java:1458)
> at org.apache.hadoop.ipc.Client.call(Client.java:1377)
> ... 32 more
> Caused by: org.apache.hadoop.security.AccessControlException: Client cannot
> authenticate via:[TOKEN, KERBEROS]
> at
> org.apache.hadoop.security.SaslRpcClient.selectSaslClient(SaslRpcClient.java:170)
> at
> org.apache.hadoop.security.SaslRpcClient.saslConnect(SaslRpcClient.java:387)
> at
> org.apache.hadoop.ipc.Client$Connection.setupSaslConnection(Client.java:547)
> at org.apache.hadoop.ipc.Client$Connection.access$1800(Client.java:367)
> at org.apache.hadoop.ipc.Client$Connection$2.run(Client.java:713)
> at org.apache.hadoop.ipc.Client$Connection$2.run(Client.java:709)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
> at
> org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:708)
> ... 35 more
> 2014-05-29 01:00:40,278 INFO worker.TaskAttemptContext
> (TaskAttemptContext.java:setState(110)) - Query status of
> ta_1401325188560_0001_000001_000000_00 is changed to TA_FAILED
> 2014-05-29 01:00:40,281 INFO worker.Task (Task.java:run(447)) - Task Counter
> - total:1, succeeded: 0, killed: 0, failed: 1
> 2014-05-29 01:00:40,282 INFO worker.TaskRunner (TaskRunner.java:run(332)) -
> Request GetTask:
> eb_1401325188560_0001_000001,container_1401325188560_0001_01_000001
> 2014-05-29 01:00:40,305 INFO worker.TaskRunner (TaskRunner.java:run(370)) -
> Accumulated Received Task: 2
> 2014-05-29 01:00:40,305 INFO worker.TaskRunner (TaskRunner.java:run(379)) -
> Initializing: ta_1401325188560_0001_000001_000000_01
> 2014-05-29 01:00:40,316 INFO worker.TaskAttemptContext
> (TaskAttemptContext.java:setState(110)) - Query status of
> ta_1401325188560_0001_000001_000000_01 is changed to TA_PENDING
> 2014-05-29 01:00:40,316 INFO worker.Task (Task.java:<init>(188)) -
> ==================================
> 2014-05-29 01:00:40,318 INFO worker.Task (Task.java:<init>(189)) - *
> Subquery ta_1401325188560_0001_000001_000000_01 is initialized
> 2014-05-29 01:00:40,318 INFO worker.Task (Task.java:<init>(190)) - *
> InterQuery: true, Use HASH_SHUFFLE shuffle
> 2014-05-29 01:00:40,318 INFO worker.Task (Task.java:<init>(193)) - *
> Fragments (num: 1)
> 2014-05-29 01:00:40,318 INFO worker.Task (Task.java:<init>(194)) - * Fetches
> (total:0) :
> 2014-05-29 01:00:40,318 INFO worker.Task (Task.java:<init>(198)) - * Local
> task dir:
> file:/grid/d/tmp/yarn/usercache/mzhou/appcache/application_1400096295333_0092/container_1400096295333_0092_01_000004/${LOCAL_DIRS}/q_1401325188560_0001/output/1/0_1
> 2014-05-29 01:00:40,318 INFO worker.Task (Task.java:<init>(203)) -
> ==================================
> 2014-05-29 01:00:40,319 INFO worker.TaskAttemptContext
> (TaskAttemptContext.java:setState(110)) - Query status of
> ta_1401325188560_0001_000001_000000_01 is changed to TA_RUNNING
> 2014-05-29 01:00:40,319 INFO planner.PhysicalPlannerImpl
> (PhysicalPlannerImpl.java:createInMemoryHashAggregation(901)) - The planner
> chooses [Hash Aggregation]
> 2014-05-29 01:00:40,325 WARN security.UserGroupInformation
> (UserGroupInformation.java:doAs(1551)) - PriviledgedActionException as:mzhou
> (auth:SIMPLE) cause:org.apache.hadoop.security.AccessControlException: Client
> cannot authenticate via:[TOKEN, KERBEROS]
> 2014-05-29 01:00:40,326 WARN ipc.Client (Client.java:run(669)) - Exception
> encountered while connecting to the server :
> org.apache.hadoop.security.AccessControlException: Client cannot authenticate
> via:[TOKEN, KERBEROS]
> 2014-05-29 01:00:40,326 WARN security.UserGroupInformation
> (UserGroupInformation.java:doAs(1551)) - PriviledgedActionException as:mzhou
> (auth:SIMPLE) cause:java.io.IOException:
> org.apache.hadoop.security.AccessControlException: Client cannot authenticate
> via:[TOKEN, KERBEROS]
> 2014-05-29 01:00:40,328 ERROR worker.Task (Task.java:run(393)) -
> java.io.IOException: Failed on local exception: java.io.IOException:
> org.apache.hadoop.security.AccessControlException: Client cannot authenticate
> via:[TOKEN, KERBEROS]; Host Details : local host is:
> "host4.grid.domain.com/172.20.1.34"; destination host is:
> "host1.grid.domain.com":9000;
> at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:764)
> at org.apache.hadoop.ipc.Client.call(Client.java:1410)
> at org.apache.hadoop.ipc.Client.call(Client.java:1359)
> at
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
> at com.sun.proxy.$Proxy9.getBlockLocations(Unknown Source)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186)
> at
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
> at com.sun.proxy.$Proxy9.getBlockLocations(Unknown Source)
> at
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getBlockLocations(ClientNamenodeProtocolTranslatorPB.java:206)
> at
> org.apache.hadoop.hdfs.DFSClient.callGetBlockLocations(DFSClient.java:1131)
> at
> org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:1121)
> at
> org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:1111)
> at
> org.apache.hadoop.hdfs.DFSInputStream.fetchLocatedBlocksAndGetLastBlockLength(DFSInputStream.java:272)
> at
> org.apache.hadoop.hdfs.DFSInputStream.openInfo(DFSInputStream.java:239)
> at org.apache.hadoop.hdfs.DFSInputStream.<init>(DFSInputStream.java:232)
> at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:1279)
> at
> org.apache.hadoop.hdfs.DistributedFileSystem$3.doCall(DistributedFileSystem.java:296)
> at
> org.apache.hadoop.hdfs.DistributedFileSystem$3.doCall(DistributedFileSystem.java:292)
> at
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
> at
> org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:292)
> at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:765)
> at org.apache.tajo.storage.CSVFile$CSVScanner.init(CSVFile.java:303)
> at
> org.apache.tajo.engine.planner.physical.SeqScanExec.initScanner(SeqScanExec.java:197)
> at
> org.apache.tajo.engine.planner.physical.SeqScanExec.init(SeqScanExec.java:179)
> at
> org.apache.tajo.engine.planner.physical.UnaryPhysicalExec.init(UnaryPhysicalExec.java:52)
> at
> org.apache.tajo.engine.planner.physical.UnaryPhysicalExec.init(UnaryPhysicalExec.java:52)
> at
> org.apache.tajo.engine.planner.physical.HashShuffleFileWriteExec.init(HashShuffleFileWriteExec.java:81)
> at org.apache.tajo.worker.Task.run(Task.java:383)
> at org.apache.tajo.worker.TaskRunner$1.run(TaskRunner.java:391)
> at java.lang.Thread.run(Thread.java:744)
> Caused by: java.io.IOException:
> org.apache.hadoop.security.AccessControlException: Client cannot authenticate
> via:[TOKEN, KERBEROS]
> at org.apache.hadoop.ipc.Client$Connection$1.run(Client.java:674)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
> at
> org.apache.hadoop.ipc.Client$Connection.handleSaslConnectionFailure(Client.java:637)
> at
> org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:721)
> at org.apache.hadoop.ipc.Client$Connection.access$2700(Client.java:367)
> at org.apache.hadoop.ipc.Client.getConnection(Client.java:1458)
> at org.apache.hadoop.ipc.Client.call(Client.java:1377)
> ... 32 more
> Caused by: org.apache.hadoop.security.AccessControlException: Client cannot
> authenticate via:[TOKEN, KERBEROS]
> at
> org.apache.hadoop.security.SaslRpcClient.selectSaslClient(SaslRpcClient.java:170)
> at
> org.apache.hadoop.security.SaslRpcClient.saslConnect(SaslRpcClient.java:387)
> at
> org.apache.hadoop.ipc.Client$Connection.setupSaslConnection(Client.java:547)
> at org.apache.hadoop.ipc.Client$Connection.access$1800(Client.java:367)
> at org.apache.hadoop.ipc.Client$Connection$2.run(Client.java:713)
> at org.apache.hadoop.ipc.Client$Connection$2.run(Client.java:709)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
> at
> org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:708)
> ... 35 more
> {noformat}