[
https://issues.apache.org/jira/browse/HDFS-16165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17398208#comment-17398208
]
Daniel Osvath edited comment on HDFS-16165 at 8/12/21, 5:57 PM:
----------------------------------------------------------------
This request is on behalf of [Confluent, Inc.|http://confluent.io].
was (Author: dosvath):
This request is on behalf [Confluent, Inc|confluent.io].
> Backport the Hadoop 3.x Kerberos synchronization fix to Hadoop 2.x
> ------------------------------------------------------------------
>
> Key: HDFS-16165
> URL: https://issues.apache.org/jira/browse/HDFS-16165
> Project: Hadoop HDFS
> Issue Type: Wish
> Environment: Can be reproduced in docker HDFS environment with
> Kerberos
> https://github.com/vdesabou/kafka-docker-playground/blob/93a93de293ad2f9bb22afb244f2d8729a178296e/connect/connect-hdfs2-sink/hdfs2-sink-ha-kerberos-repro-gss-exception.sh
> Reporter: Daniel Osvath
> Priority: Major
>
> *Problem Description*
> For more than a year, Apache Kafka Connect users have been running into a
> Kerberos renewal issue that causes our HDFS2 connectors to fail.
> We have been able to consistently reproduce the issue under high load with 40
> connectors (threads) that use the library. When we instead use a workaround
> that relies on the Kerberos keytab on the system, the connector operates
> without issues.
> We identified the root cause to be a race condition bug in the Hadoop 2.x
> library that causes the ticket renewal to fail with the error below:
> {code:java}
> Caused by: javax.security.sasl.SaslException: GSS initiate failed [Caused by
> GSSException: No valid credentials provided (Mechanism level: Failed to find
> any Kerberos tgt)]
> at com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:211)
> {code}
> We reached this conclusion about the root cause after running the same
> environment (40 connectors) with Hadoop 3.x and our HDFS3 connectors, which
> operated without renewal issues. The fact that the synchronization issue has
> been fixed in the newer Hadoop 3.x releases further confirmed our hypothesis.
> There are many changes in the Hadoop 3.x
> [UserGroupInformation.java|https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/UserGroupInformation.java]
> related to UGI synchronization, done as part of
> https://issues.apache.org/jira/browse/HADOOP-9747. Those changes suggest that
> race conditions were occurring in the older Hadoop 2.x versions, which would
> explain why we can reproduce the problem with HDFS2.
> For example (among others):
> {code:java}
> private void relogin(HadoopLoginContext login, boolean ignoreLastLoginTime)
>     throws IOException {
>   // ensure the relogin is atomic to avoid leaving credentials in an
>   // inconsistent state. prevents other ugi instances, SASL, and SPNEGO
>   // from accessing or altering credentials during the relogin.
>   synchronized(login.getSubjectLock()) {
>     // another racing thread may have beat us to the relogin.
>     if (login == getLogin()) {
>       unprotectedRelogin(login, ignoreLastLoginTime);
>     }
>   }
> }
> {code}
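> To illustrate why an unsynchronized relogin path can fail under load, here is
> a minimal, hypothetical Java sketch (this is not Hadoop code; the class and
> field names are invented): a reader thread can observe the credentials
> cleared mid-relogin unless both sides serialize on a subject lock, mirroring
> the synchronized block above.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical model of the race: "ticket" stands in for the JAAS Subject's
// private Kerberos credentials, and a relogin clears then restores it.
public class ReloginRaceSketch {
    private final AtomicReference<String> ticket = new AtomicReference<>("TGT");
    private final Object subjectLock = new Object();

    // Unsafe (2.x-style) relogin: credentials are briefly absent and readers
    // are not excluded, so a concurrent SASL handshake can find no TGT.
    void unsafeRelogin() throws InterruptedException {
        ticket.set(null);          // logout
        Thread.sleep(50);          // window during which credentials are gone
        ticket.set("TGT");         // login
    }

    // Safe (3.x-style) relogin: the whole swap happens under the subject lock.
    void safeRelogin() throws InterruptedException {
        synchronized (subjectLock) {
            ticket.set(null);
            Thread.sleep(50);
            ticket.set("TGT");
        }
    }

    // A "reader" (e.g. a SASL client) that takes the lock in the safe variant.
    String readTicket(boolean safe) {
        if (safe) {
            synchronized (subjectLock) {
                return ticket.get();
            }
        }
        return ticket.get();
    }

    // Returns true if any read observed missing credentials during a relogin.
    static boolean observesMissingTicket(boolean safe) throws Exception {
        ReloginRaceSketch s = new ReloginRaceSketch();
        ExecutorService pool = Executors.newSingleThreadExecutor();
        Future<?> relogin = pool.submit(() -> {
            try {
                if (safe) s.safeRelogin(); else s.unsafeRelogin();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        boolean sawMissing = false;
        while (!relogin.isDone()) {
            if (s.readTicket(safe) == null) {
                sawMissing = true;
            }
        }
        relogin.get();
        pool.shutdown();
        return sawMissing;
    }

    public static void main(String[] args) throws Exception {
        System.out.println("unsafe relogin exposed missing TGT: "
                + observesMissingTicket(false));
        System.out.println("safe relogin exposed missing TGT:   "
                + observesMissingTicket(true));
    }
}
```

> In the safe variant the reader blocks on the subject lock until the relogin
> completes, so it can never observe the intermediate "no credentials" state.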
> None of those changes were backported to Hadoop 2.x (our HDFS2 connector uses
> 2.10.1), on which several CDH distributions are based.
> *Request*
> We would like to ask for the synchronization fix to be backported to Hadoop
> 2.x so that our users can operate without issues.
> *Impact*
> The older 2.x Hadoop version is used by our HDFS connector, which is used in
> production by our community. Currently, the issue causes our HDFS connector
> to fail, as it is unable to recover and renew the ticket at a later point.
> Having the backported fix would allow our users to operate without issues
> that currently require manual intervention every week (or every few days in
> some cases). The only workaround available to the community is to run a
> command or restart their workers.
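> For completeness, the manual workaround mentioned above typically amounts to
> re-obtaining the worker's TGT from a keytab by hand; the principal and keytab
> path below are illustrative placeholders, not taken from this report.

```
# Re-obtain the Kerberos TGT for the Connect worker's principal (placeholders):
kinit -kt /etc/security/keytabs/connect.keytab connect/[email protected]
# Verify the renewed ticket:
klist
```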
--
This message was sent by Atlassian Jira
(v8.3.4#803005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]