Dheeraj Panangat created HUDI-5103:
--------------------------------------
Summary: Does not work with Azure Data lake Gen2
Key: HUDI-5103
URL: https://issues.apache.org/jira/browse/HUDI-5103
Project: Apache Hudi
Issue Type: Bug
Reporter: Dheeraj Panangat
Unable to use Hudi with Flink against Azure Data Lake Gen2. FileSystem initialization fails while looking up the storage account key:
{code:java}
Caused by: Configuration property <datalakeaccount>.dfs.core.windows.net not found.
    at org.apache.hadoop.fs.azurebfs.AbfsConfiguration.getStorageAccountKey(AbfsConfiguration.java:372)
    at org.apache.hadoop.fs.azurebfs.AzureBlobFileSystemStore.initializeClient(AzureBlobFileSystemStore.java:1133)
    at org.apache.hadoop.fs.azurebfs.AzureBlobFileSystemStore.<init>(AzureBlobFileSystemStore.java:174)
{code}
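For context on the failure above, ABFS resolves configuration keys by trying an account-qualified name first and then falling back to the bare key. The sketch below illustrates that lookup pattern only; the class and method names are hypothetical, not Hadoop's actual implementation:

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch of account-qualified key resolution in ABFS.
// Hypothetical names; not the real AbfsConfiguration code.
public class AbfsKeyResolution {

    // Tries "<base>.<accountHost>" first, then falls back to the bare
    // "<base>". Returns null when neither is present, which corresponds to
    // the "Configuration property ... not found" failure in the stack trace.
    static String resolveAccountProperty(Map<String, String> conf,
                                         String base, String accountHost) {
        String qualified = base + "." + accountHost;
        if (conf.containsKey(qualified)) {
            return conf.get(qualified);
        }
        return conf.get(base);
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put("fs.azure.account.auth.type", "OAuth");

        // The un-qualified key is found via the fallback.
        System.out.println(resolveAccountProperty(conf,
                "fs.azure.account.auth.type",
                "mydatalake.dfs.core.windows.net"));

        // But if the configured properties never reach the Hadoop
        // Configuration used at FileSystem init (as reported), every
        // lookup misses and returns null.
        System.out.println(resolveAccountProperty(new HashMap<>(),
                "fs.azure.account.key",
                "mydatalake.dfs.core.windows.net"));
    }
}
```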
The following properties are specified when running the Flink job:
{code:java}
"fs.azure.account.auth.type": "OAuth",
"fs.azure.account.oauth.provider.type": "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
"fs.azure.account.oauth2.client.id": "<appId>",
"fs.azure.account.oauth2.client.secret": "<clientSecret>",
"fs.azure.account.oauth2.client.endpoint": "https://login.microsoftonline.com/<tenant>/oauth2/token",
"fs.azure.createRemoteFileSystemDuringInitialization": "true"
{code}
as per the documentation: [Microsoft Azure Spark to Data Lake Gen2|https://learn.microsoft.com/en-us/azure/storage/blobs/data-lake-storage-use-databricks-spark]
In AWS this works because the credentials are picked up from the environment, but in Azure they must come from the Hadoop configuration, and these properties do not reach the point where the FileSystem is initialized.
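One possible workaround (an assumption, not verified against this setup) is to place the same properties in a core-site.xml on the Flink classpath, so they are present in the Hadoop Configuration when the FileSystem is constructed rather than being passed only through job options:

```xml
<!-- Hypothetical core-site.xml fragment; <appId>, <clientSecret>, and
     <tenant> are placeholders carried over from the report above. -->
<configuration>
  <property>
    <name>fs.azure.account.auth.type</name>
    <value>OAuth</value>
  </property>
  <property>
    <name>fs.azure.account.oauth.provider.type</name>
    <value>org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider</value>
  </property>
  <property>
    <name>fs.azure.account.oauth2.client.id</name>
    <value><appId></value>
  </property>
  <property>
    <name>fs.azure.account.oauth2.client.secret</name>
    <value><clientSecret></value>
  </property>
  <property>
    <name>fs.azure.account.oauth2.client.endpoint</name>
    <value>https://login.microsoftonline.com/<tenant>/oauth2/token</value>
  </property>
</configuration>
```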
--
This message was sent by Atlassian Jira
(v8.20.10#820010)