izzyacademy commented on a change in pull request #16559:
URL: https://github.com/apache/flink/pull/16559#discussion_r685541764
##########
File path: flink-filesystems/flink-azure-fs-hadoop/src/main/resources/META-INF/services/org.apache.flink.core.fs.FileSystemFactory
##########
@@ -15,3 +15,5 @@
org.apache.flink.fs.azurefs.AzureFSFactory
org.apache.flink.fs.azurefs.SecureAzureFSFactory
+org.apache.flink.fs.azurefs.ABFSAzureFSFactory
Review comment:
Later on, I would love for us to rename these factories to something
that clarifies which filesystems they actually support.
For now, it is OK; it would just be a minor improvement for clarity. @AHeise,
should we make that a FLIP?
- org.apache.flink.fs.azurefs.AzureFSFactory supports the wasb scheme
(Non-Secure Azure Blob Storage)
- org.apache.flink.fs.azurefs.SecureAzureFSFactory supports the wasbs scheme
(Secure Azure Blob Storage)
- org.apache.flink.fs.azurefs.ABFSAzureFSFactory supports the abfs scheme
(Non-Secure Azure Data Lake Storage Gen2)
- org.apache.flink.fs.azurefs.SecureABFSAzureFSFactory supports the abfss scheme
(Secure Azure Data Lake Storage Gen2)
We could rename them as follows:
- org.apache.flink.fs.azurefs.AzureFSFactory ->
org.apache.flink.fs.azurefs.AzureBlobStorageFSFactory
- org.apache.flink.fs.azurefs.SecureAzureFSFactory ->
org.apache.flink.fs.azurefs.SecureAzureBlobStorageFSFactory
- org.apache.flink.fs.azurefs.ABFSAzureFSFactory ->
org.apache.flink.fs.azurefs.AzureDataLakeStoreGen2FSFactory
- org.apache.flink.fs.azurefs.SecureABFSAzureFSFactory ->
org.apache.flink.fs.azurefs.SecureAzureDataLakeStoreGen2FSFactory
What do you think, @srinipunuru and @AHeise?
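To make the mapping above concrete, here is a minimal sketch of the scheme-to-factory table, including the proposed names. It is purely illustrative: `AzureSchemeFactories`, `factoryFor` and `proposedFor` are hypothetical names, not Flink API. As far as I understand, Flink itself discovers factories through the META-INF/services file in this diff and keys them by `FileSystemFactory#getScheme()`, so no such table exists in the code base.

```java
import java.net.URI;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

/**
 * Hypothetical lookup table (not Flink API) mirroring the scheme-to-factory
 * list above, plus the renames proposed in this comment.
 */
public final class AzureSchemeFactories {

    private static final String PKG = "org.apache.flink.fs.azurefs.";

    // Current factory class name per URI scheme.
    private static final Map<String, String> CURRENT = new LinkedHashMap<>();
    // Proposed (not adopted) factory class name per URI scheme.
    private static final Map<String, String> PROPOSED = new LinkedHashMap<>();

    static {
        register("wasb", "AzureFSFactory", "AzureBlobStorageFSFactory");
        register("wasbs", "SecureAzureFSFactory", "SecureAzureBlobStorageFSFactory");
        register("abfs", "ABFSAzureFSFactory", "AzureDataLakeStoreGen2FSFactory");
        register("abfss", "SecureABFSAzureFSFactory", "SecureAzureDataLakeStoreGen2FSFactory");
    }

    private AzureSchemeFactories() {}

    private static void register(String scheme, String current, String proposed) {
        CURRENT.put(scheme, PKG + current);
        PROPOSED.put(scheme, PKG + proposed);
    }

    // Case-insensitive lookup; rejects missing or non-Azure schemes.
    private static String lookup(Map<String, String> table, String scheme) {
        if (scheme == null) {
            throw new IllegalArgumentException("URI has no scheme");
        }
        String factory = table.get(scheme.toLowerCase(Locale.ROOT));
        if (factory == null) {
            throw new IllegalArgumentException("Not an Azure scheme: " + scheme);
        }
        return factory;
    }

    /** Current factory class name for the URI's scheme. */
    public static String factoryFor(URI uri) {
        return lookup(CURRENT, uri.getScheme());
    }

    /** Proposed factory class name for the given scheme. */
    public static String proposedFor(String scheme) {
        return lookup(PROPOSED, scheme);
    }

    public static void main(String[] args) {
        // One line per scheme: scheme -> current name (proposed: new name)
        for (String scheme : CURRENT.keySet()) {
            System.out.println(scheme + " -> " + CURRENT.get(scheme)
                    + " (proposed: " + PROPOSED.get(scheme) + ")");
        }
    }
}
```

The point of the rename is visible in the table: with the proposed names, the service (Blob Storage vs. Data Lake Storage Gen2) and the secure variant can be read directly off the class name instead of being inferred from the "ABFS" prefix.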
##########
File path: docs/content/docs/deployment/filesystems/azure.md
##########
@@ -83,4 +103,19 @@ environment variable `AZURE_STORAGE_KEY` by setting the
following configuration
fs.azure.account.keyprovider.<account_name>.blob.core.windows.net: org.apache.flink.fs.azurefs.EnvironmentVariableKeyProvider
```
+### ABFS
+
+Hadoop's ABFS Azure Filesystem supports several ways of configuring
authentication. Please visit the [Hadoop ABFS
documentation](https://hadoop.apache.org/docs/stable/hadoop-azure/abfs.html#Authentication)
for details on how to configure it.
+
+{{< hint info >}}
+Azure recommends using Azure managed identity to access the ADLS Gen2 storage
accounts using abfs. Details on how to do this are beyond the scope of this
documentation; please refer to the Azure documentation for more details.
+{{< /hint >}}
Review comment:
I would like to add that, if managed identities are used, Flink can
authenticate against ADLS Gen2 when it runs in any of the following
environments:
- Azure Kubernetes Service
- Azure Arc enabled Kubernetes
- Azure Arc enabled servers
- Azure Virtual Machines
- Azure Virtual Machine Scale Sets
Azure services that support managed identities for Azure resources
https://docs.microsoft.com/en-us/azure/active-directory/managed-identities-azure-resources/services-support-managed-identities#azure-services-that-support-managed-identities-for-azure-resources
Azure services that support Azure AD authentication
https://docs.microsoft.com/en-us/azure/active-directory/managed-identities-azure-resources/services-support-managed-identities#azure-services-that-support-azure-ad-authentication
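For reference, a rough sketch of the Hadoop ABFS keys involved in managed-identity (MSI) authentication, scoped to a single storage account. This assumes, as with the wasb key provider example in the same doc, that `fs.azure.*` entries in the Flink configuration are forwarded to the Hadoop configuration. The key names and `MsiTokenProvider` come from the Hadoop ABFS documentation; `AbfsManagedIdentityConf` and `build` are hypothetical helpers, and whether the tenant and client IDs are required depends on the identity type, so please double-check against the Hadoop docs.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper (not Flink or Hadoop API) listing ABFS managed-identity keys. */
public final class AbfsManagedIdentityConf {

    private AbfsManagedIdentityConf() {}

    /**
     * Builds account-scoped ABFS settings for OAuth via a managed identity.
     * tenantId and clientId are optional here (assumption): null values are omitted.
     */
    public static Map<String, String> build(String account, String tenantId, String clientId) {
        // ABFS allows per-account settings by appending "<account>.dfs.core.windows.net".
        String suffix = "." + account + ".dfs.core.windows.net";
        Map<String, String> conf = new LinkedHashMap<>();
        conf.put("fs.azure.account.auth.type" + suffix, "OAuth");
        conf.put("fs.azure.account.oauth.provider.type" + suffix,
                "org.apache.hadoop.fs.azurebfs.oauth2.MsiTokenProvider");
        if (tenantId != null) {
            conf.put("fs.azure.account.oauth2.msi.tenant" + suffix, tenantId);
        }
        if (clientId != null) {
            conf.put("fs.azure.account.oauth2.client.id" + suffix, clientId);
        }
        return conf;
    }

    public static void main(String[] args) {
        // Prints the entries in "key: value" form, as they would appear in flink-conf.yaml.
        build("myaccount", null, "<client-id>")
                .forEach((key, value) -> System.out.println(key + ": " + value));
    }
}
```

No secret appears in these entries: the token is obtained from the managed identity of the AKS node, VM, or scale set that Flink runs on, which is what makes this approach attractive for the environments listed above.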
##########
File path: docs/content/docs/deployment/filesystems/overview.md
##########
@@ -56,7 +56,7 @@ The Apache Flink project supports the following file systems:
- **[Aliyun Object Storage Service]({{< ref
"docs/deployment/filesystems/oss" >}})** is supported by `flink-oss-fs-hadoop`
and registered under the *oss://* URI scheme.
The implementation is based on the [Hadoop
Project](https://hadoop.apache.org/) but is self-contained with no dependency
footprint.
- - **[Azure Blob Storage]({{< ref "docs/deployment/filesystems/azure" >}})**
is supported by `flink-azure-fs-hadoop` and registered under the *wasb(s)://*
URI schemes.
+ - **[Azure Blob Storage]({{< ref "docs/deployment/filesystems/azure" >}})**
is supported by `flink-azure-fs-hadoop` and registered under the *abfs(s)://*
and *wasb(s)://* URI schemes.
Review comment:
I think we need to distinguish between the two schemes (abfs and wasb)
here. Some users may be looking for support for Azure Data Lake Storage Gen2,
and they might miss the documentation because it does not call it out
specifically and it is listed under Blob Storage.
Azure Data Lake Storage Gen2 [abfs(s)]
https://docs.microsoft.com/en-us/azure/storage/blobs/data-lake-storage-introduction
Azure Blob Storage [wasb(s)]
https://docs.microsoft.com/en-us/azure/storage/blobs/storage-blobs-introduction
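The distinction also shows up in the URIs users write: abfs(s) paths point at the Data Lake Storage Gen2 endpoint (`dfs.core.windows.net`), while wasb(s) paths point at the Blob Storage endpoint (`blob.core.windows.net`). A small sketch of that URI shape follows; `AzureUris` and `build` are hypothetical names used only for illustration.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Locale;

/** Hypothetical helper (not Flink API) showing the URI shape per Azure scheme. */
public final class AzureUris {

    private AzureUris() {}

    /** ADLS Gen2 (abfs/abfss) uses the dfs endpoint; Blob Storage (wasb/wasbs) uses blob. */
    static String endpointSuffix(String scheme) {
        switch (scheme.toLowerCase(Locale.ROOT)) {
            case "abfs":
            case "abfss":
                return "dfs.core.windows.net";
            case "wasb":
            case "wasbs":
                return "blob.core.windows.net";
            default:
                throw new IllegalArgumentException("Unsupported scheme: " + scheme);
        }
    }

    /** Builds <scheme>://<container>@<account>.<endpoint>/<path>. */
    public static URI build(String scheme, String container, String account, String path) {
        String host = account + "." + endpointSuffix(scheme);
        String absPath = path.startsWith("/") ? path : "/" + path;
        try {
            // The container goes into the user-info part of the authority.
            return new URI(scheme, container, host, -1, absPath, null, null);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Invalid URI parts", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(build("abfss", "data", "myaccount", "flink/checkpoints"));
        System.out.println(build("wasbs", "data", "myaccount", "flink/checkpoints"));
    }
}
```

Listing the two services as separate bullets in the overview, each with its own URI form, would make it much harder to miss ADLS Gen2 support.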
--
This is an automated message from the Apache Git Service.