[
https://issues.apache.org/jira/browse/HDFS-10799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Work on HDFS-10799 started by Wei-Chiu Chuang.
----------------------------------------------
> NameNode should use loginUser(hdfs) to serve iNotify requests
> -------------------------------------------------------------
>
> Key: HDFS-10799
> URL: https://issues.apache.org/jira/browse/HDFS-10799
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: namenode
> Affects Versions: 2.6.0
> Environment: Kerberized, HA cluster, iNotify client, CDH5.7.0
> Reporter: Wei-Chiu Chuang
> Assignee: Wei-Chiu Chuang
> Attachments: HDFS-10799.001.patch
>
>
> When a NameNode serves iNotify requests from a client, it verifies that the
> client has superuser permission and then uses the client's Kerberos principal
> to read edits from the JournalNodes.
> However, if the client does not renew its TGT, the connection from the
> NameNode to the JournalNodes may fail. In that case, the NameNode thinks the
> edits are corrupt and prints a scary error message:
> "During automatic edit log failover, we noticed that all of the remaining
> edit log streams are shorter than the current one! The best remaining edit
> log ends at transaction 11577603, but we thought we could read up to
> transaction 11577606. If you continue, metadata will be lost forever!"
> However, the edits are actually good. NameNode _should not freak out when an
> iNotify client's TGT expires_.
> I think an easy fix for this bug is for the NameNode, after it verifies that
> the client has superuser permission, to call {{SecurityUtil.doAsLoginUser}}
> and then read the edits. This ensures that the operation does not fail
> because of an expired client ticket.
> Excerpt of related logs:
> {noformat}
> 2016-08-18 19:05:13,979 WARN org.apache.hadoop.security.UserGroupInformation:
> PriviledgedActionException as:[email protected] (auth:KERBEROS)
> cause:java.io.IOException: We encountered an error reading
> http://jn1.example.com:8480/getJournal?jid=nameservice1&segmentTxId=11577487&storageInfo=yyy,
>
> http://jn1.example.com:8480/getJournal?jid=nameservice1&segmentTxId=11577487&storageInfo=yyy.
> During automatic edit log failover, we noticed that all of the remaining
> edit log streams are shorter than the current one! The best remaining edit
> log ends at transaction 11577603, but we thought we could read up to
> transaction 11577606. If you continue, metadata will be lost forever!
> 2016-08-18 19:05:13,979 INFO org.apache.hadoop.ipc.Server: IPC Server handler
> 112 on 8020, call
> org.apache.hadoop.hdfs.protocol.ClientProtocol.getEditsFromTxid from [client
> IP:port] Call#73 Retry#0
> java.io.IOException: We encountered an error reading
> http://jn1.example.com:8480/getJournal?jid=nameservice1&segmentTxId=11577487&storageInfo=yyy,
>
> http://jn1.example.com:8480/getJournal?jid=nameservice1&segmentTxId=11577487&storageInfo=yyy.
> During automatic edit log failover, we noticed that all of the remaining
> edit log streams are shorter than the current one! The best remaining edit
> log ends at transaction 11577603, but we thought we could read up to
> transaction 11577606. If you continue, metadata will be lost forever!
> at org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInputStream.nextOp(RedundantEditLogInputStream.java:213)
> at org.apache.hadoop.hdfs.server.namenode.EditLogInputStream.readOp(EditLogInputStream.java:85)
> at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.readOp(NameNodeRpcServer.java:1674)
> at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getEditsFromTxid(NameNodeRpcServer.java:1736)
> at org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.getEditsFromTxid(AuthorizationProviderProxyClientProtocol.java:1010)
> at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getEditsFromTxid(ClientNamenodeProtocolServerSideTranslatorPB.java:1475)
> at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
> at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1073)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2086)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2082)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1693)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2080)
> {noformat}
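
The fix proposed in the description (check superuser permission as the caller, then read edits as the NameNode's own login user) can be illustrated with a small stand-alone model. This is only a sketch under stated assumptions, not the attached patch and not Hadoop code: the Identity class, the thread-local "current user", and every method below are hypothetical stand-ins so that the example runs without Hadoop on the classpath. In Hadoop itself the wrapper would be SecurityUtil.doAsLoginUser, which takes a PrivilegedExceptionAction.

```java
import java.io.IOException;
import java.util.concurrent.Callable;

public class InotifyDoAsSketch {
    /** Hypothetical stand-in for a Kerberos identity; not a Hadoop class. */
    static final class Identity {
        final String name;
        final boolean superuser;
        final boolean ticketValid;

        Identity(String name, boolean superuser, boolean ticketValid) {
            this.name = name;
            this.superuser = superuser;
            this.ticketValid = ticketValid;
        }
    }

    // Models "whose credentials are in effect on this thread".
    static final ThreadLocal<Identity> CURRENT = new ThreadLocal<>();

    // Models the NameNode's own login user (hdfs), whose credentials stay valid.
    static final Identity LOGIN_USER = new Identity("hdfs", true, true);

    /** Runs the action as the given identity and restores the previous one afterwards. */
    static <T> T doAs(Identity id, Callable<T> action) throws Exception {
        Identity previous = CURRENT.get();
        CURRENT.set(id);
        try {
            return action.call();
        } finally {
            CURRENT.set(previous);
        }
    }

    /** Stand-in for the role SecurityUtil.doAsLoginUser plays in Hadoop. */
    static <T> T doAsLoginUser(Callable<T> action) throws Exception {
        return doAs(LOGIN_USER, action);
    }

    /** Models the JournalNode read: it fails if the active identity's ticket has expired. */
    static String readEditsFromJournal(long txid) throws IOException {
        Identity id = CURRENT.get();
        if (id == null || !id.ticketValid) {
            String who = (id == null) ? "<none>" : id.name;
            throw new IOException("cannot authenticate to JournalNode as " + who);
        }
        return "edits from txid " + txid + " read as " + id.name;
    }

    /** Proposed flow: authorize the caller first, then read as the login user. */
    static String getEditsFromTxid(long txid) throws Exception {
        Identity caller = CURRENT.get();
        if (caller == null || !caller.superuser) {
            throw new SecurityException("superuser privilege is required");
        }
        return doAsLoginUser(() -> readEditsFromJournal(txid));
    }

    public static void main(String[] args) throws Exception {
        // A superuser iNotify client whose TGT has expired.
        Identity client = new Identity("inotify-client", true, false);
        String result = doAs(client, () -> getEditsFromTxid(11577487L));
        System.out.println(result); // prints "edits from txid 11577487 read as hdfs"
    }
}
```

In this model the permission check still runs against the caller, so a non-superuser is rejected as before. Only the JournalNode read switches identity, which is why an expired client ticket no longer affects it.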