Emeric created HADOOP-19247:
-------------------------------
Summary: Authentication failed in Azure Kubernetes with HTTP/1.1
and chunked transfer encoding
Key: HADOOP-19247
URL: https://issues.apache.org/jira/browse/HADOOP-19247
Project: Hadoop Common
Issue Type: Bug
Components: auth
Affects Versions: 3.3.6, 3.3.4, 3.4.0, 3.5.0
Environment: Azure Kubernetes Services
Azure Entra ID
Azure Metadata Service
Spark 3.3
Reporter: Emeric
Attachments: CodeResponse.png, TokenKO.png, TokenOK.png
The problem is related to Azure authentication on Kubernetes.
When I run my Spark program, I get this error when the pod tries to
authenticate:
{code:java}
java.lang.NullPointerException
    at org.apache.hadoop.fs.azurebfs.oauth2.AzureADAuthenticator.consumeInputStream(AzureADAuthenticator.java:340)
    at org.apache.hadoop.fs.azurebfs.oauth2.AzureADAuthenticator.getTokenSingleCall(AzureADAuthenticator.java:270)
    at org.apache.hadoop.fs.azurebfs.oauth2.AzureADAuthenticator.getTokenCall(AzureADAuthenticator.java:211)
    at org.apache.hadoop.fs.azurebfs.oauth2.AzureADAuthenticator.getTokenFromMsi(AzureADAuthenticator.java:137)
    at org.apache.hadoop.fs.azurebfs.oauth2.MsiTokenProvider.refreshToken(MsiTokenProvider.java:45)
    at org.apache.hadoop.fs.azurebfs.oauth2.AccessTokenProvider.getToken(AccessTokenProvider.java:50)
    at org.apache.hadoop.fs.azurebfs.services.AbfsClient.getAccessToken(AbfsClient.java:554)
    at org.apache.hadoop.fs.azurebfs.services.AbfsRestOperation.executeHttpOperation(AbfsRestOperation.java:151)
    at org.apache.hadoop.fs.azurebfs.services.AbfsRestOperation.execute(AbfsRestOperation.java:125)
    at org.apache.hadoop.fs.azurebfs.services.AbfsClient.listPath(AbfsClient.java:181)
    at org.apache.hadoop.fs.azurebfs.AzureBlobFileSystemStore.listStatus(AzureBlobFileSystemStore.java:569)
    at org.apache.hadoop.fs.azurebfs.AzureBlobFileSystemStore.listStatus(AzureBlobFileSystemStore.java:536)
    at org.apache.hadoop.fs.azurebfs.AzureBlobFileSystem.listStatus(AzureBlobFileSystem.java:359)
{code}
My configuration is a Spark driver deployed on Azure Kubernetes Service with a
managed identity.
I used [this method|https://medium.com/datamindedbe/running-spark-3-on-aks-with-azure-ad-integration-c1fc0032c550]
with aad-pod-identity.
There are two different scenarios we can observe when authenticating from
Kubernetes against the Azure Instance Metadata Service:
* The returned token is short (fewer than 2048 characters). The response has
all the usual headers, including an explicit "Content-Length" header.
!TokenOK.png!
* The returned token is long (more than 2048 characters). The response uses
HTTP/1.1 chunked transfer encoding and, because of that mechanism, has no
"Content-Length" header.
!TokenKO.png!
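The two cases above can be told apart from the response headers alone. The following is a minimal sketch, assuming the headers are available as a map of the kind returned by HttpURLConnection.getHeaderFields(). The names ResponseFraming, Kind and classify are hypothetical and are not part of Hadoop.

```java
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper (not Hadoop code): tells which framing a response uses.
final class ResponseFraming {
    enum Kind { CONTENT_LENGTH, CHUNKED, UNKNOWN }

    private ResponseFraming() {}

    static Kind classify(Map<String, List<String>> headers) {
        boolean hasLength = false;
        for (Map.Entry<String, List<String>> e : headers.entrySet()) {
            String name = e.getKey();
            // The status line can appear under a null key; skip it.
            if (name == null || e.getValue() == null) {
                continue;
            }
            // HTTP header names are case-insensitive.
            String lower = name.toLowerCase(Locale.ROOT);
            if (lower.equals("transfer-encoding")) {
                for (String v : e.getValue()) {
                    if (v != null && v.toLowerCase(Locale.ROOT).contains("chunked")) {
                        return Kind.CHUNKED;
                    }
                }
            } else if (lower.equals("content-length")) {
                hasLength = true;
            }
        }
        return hasLength ? Kind.CONTENT_LENGTH : Kind.UNKNOWN;
    }
}
```

With the short token this would report CONTENT_LENGTH, and with the long token CHUNKED.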
NB: I ran a curl command in the pod to generate these screenshots, following the
[Azure documentation|https://learn.microsoft.com/en-us/azure/virtual-machines/instance-metadata-service?tabs=linux].
In a GitHub repository I found the "AzureADAuthenticator.java" source and this
piece of code:
!CodeResponse.png!
The code requires the "Content-Length" header when the returned HTTP code is
200, which is not compatible with HTTP/1.1 chunked transfer encoding.
Would it be possible to update this authentication code to support this
mechanism, which Microsoft uses on Kubernetes (and maybe on virtual machines)?
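As an illustration of what such support could look like, here is a minimal sketch that reads the response body until end of stream instead of relying on "Content-Length". It assumes the client uses HttpURLConnection, whose getInputStream() decodes chunked bodies transparently. TokenResponseReader, readFully and readBody are hypothetical names, and this is not the actual Hadoop code or a proposed patch.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.nio.charset.StandardCharsets;

// Hypothetical helper (not Hadoop code): reads a body without Content-Length.
final class TokenResponseReader {
    private TokenResponseReader() {}

    // Reads until EOF, so it works for both fixed-length and chunked bodies.
    static String readFully(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    // Assumes the caller has already checked the HTTP status code.
    static String readBody(HttpURLConnection conn) throws IOException {
        try (InputStream in = conn.getInputStream()) {
            return readFully(in);
        }
    }
}
```

The design choice here is to treat an empty body as the error condition after reading, rather than checking the declared length before reading.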