izzyacademy commented on a change in pull request #16559:
URL: https://github.com/apache/flink/pull/16559#discussion_r685541764



##########
File path: 
flink-filesystems/flink-azure-fs-hadoop/src/main/resources/META-INF/services/org.apache.flink.core.fs.FileSystemFactory
##########
@@ -15,3 +15,5 @@
 
 org.apache.flink.fs.azurefs.AzureFSFactory
 org.apache.flink.fs.azurefs.SecureAzureFSFactory
+org.apache.flink.fs.azurefs.ABFSAzureFSFactory

Review comment:
       Later on, I would love for us to rename these factories to something 
that clarifies which filesystems they actually support.
   
   For now, it is ok; a rename would just be a minor improvement for clarity. 
@AHeise, should we make that a FLIP?
   
   - org.apache.flink.fs.azurefs.AzureFSFactory supports wasb scheme 
(Non-Secure Azure Blob Storage)
   - org.apache.flink.fs.azurefs.SecureAzureFSFactory supports wasbs scheme 
(Secure Azure Blob Storage)
   - org.apache.flink.fs.azurefs.ABFSAzureFSFactory supports abfs scheme 
(Non-Secure Azure Data Lake Store Gen 2)
   - org.apache.flink.fs.azurefs.SecureABFSAzureFSFactory supports abfss scheme 
(Secure Azure Data Lake Store Gen 2)
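
As a rough illustration of the mapping listed above, here is a minimal, self-contained sketch. It is not code from this PR or from Flink: the class `AzureSchemeLookup` and the method `factoryFor` are hypothetical names, and the factory class names are simply copied from the list in this comment.

```java
import java.net.URI;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper, not part of Flink: mirrors the scheme-to-factory list above.
final class AzureSchemeLookup {

    // Factory class names as given in this review comment, keyed by URI scheme.
    private static final Map<String, String> FACTORY_BY_SCHEME = Map.of(
            "wasb", "org.apache.flink.fs.azurefs.AzureFSFactory",
            "wasbs", "org.apache.flink.fs.azurefs.SecureAzureFSFactory",
            "abfs", "org.apache.flink.fs.azurefs.ABFSAzureFSFactory",
            "abfss", "org.apache.flink.fs.azurefs.SecureABFSAzureFSFactory");

    // Returns the factory class name for the URI's scheme, or null if the scheme is not listed.
    static String factoryFor(String uri) {
        String scheme = URI.create(uri).getScheme();
        return scheme == null ? null : FACTORY_BY_SCHEME.get(scheme.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        // Placeholder container and account names, for illustration only.
        System.out.println(factoryFor("abfss://container@account.dfs.core.windows.net/data"));
    }
}
```

Seen this way, the current class names do not say which storage service each scheme targets, which is the motivation for the rename.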
   
   We could rename them as follows:
   
   - org.apache.flink.fs.azurefs.AzureFSFactory -> 
org.apache.flink.fs.azurefs.AzureBlobStorageFSFactory
   - org.apache.flink.fs.azurefs.SecureAzureFSFactory -> 
org.apache.flink.fs.azurefs.SecureAzureBlobStorageFSFactory 
   - org.apache.flink.fs.azurefs.ABFSAzureFSFactory -> 
org.apache.flink.fs.azurefs.AzureDataLakeStoreGen2FSFactory
   - org.apache.flink.fs.azurefs.SecureABFSAzureFSFactory -> 
org.apache.flink.fs.azurefs.SecureAzureDataLakeStoreGen2FSFactory
   
   What do you think, @srinipunuru @AHeise?

##########
File path: docs/content/docs/deployment/filesystems/azure.md
##########
@@ -83,4 +103,19 @@ environment variable `AZURE_STORAGE_KEY` by setting the 
following configuration
 fs.azure.account.keyprovider.<account_name>.blob.core.windows.net: 
org.apache.flink.fs.azurefs.EnvironmentVariableKeyProvider
 ```
 
+### ABFS
+
+Hadoop's ABFS Azure Filesystem supports several ways of configuring 
authentication. Please visit the [Hadoop ABFS 
documentation](https://hadoop.apache.org/docs/stable/hadoop-azure/abfs.html#Authentication)
 documentation on how to configure.
+
+{{< hint info >}}
+Azure recommends using Azure managed identity to access the ADLS Gen2 storage 
accounts using abfs. Details on how to do this are beyond the scope of this 
documentation, please refer to the Azure documentation for more details.
+{{< /hint >}}

Review comment:
       I would like to add that, if managed identities are used, 
authentication towards ADLS Gen2 can be done from the following environments 
when the user is running Flink in one of them:
   
   - Azure Kubernetes Service (AKS)
   - Azure Arc-enabled Kubernetes
   - Azure Arc-enabled servers
   - Azure Virtual Machines
   - Azure Virtual Machine Scale Sets
   
   Azure services that support managed identities for Azure resources
   
https://docs.microsoft.com/en-us/azure/active-directory/managed-identities-azure-resources/services-support-managed-identities#azure-services-that-support-managed-identities-for-azure-resources
   
   
   Azure services that support Azure AD authentication
   
https://docs.microsoft.com/en-us/azure/active-directory/managed-identities-azure-resources/services-support-managed-identities#azure-services-that-support-azure-ad-authentication
   

##########
File path: docs/content/docs/deployment/filesystems/overview.md
##########
@@ -56,7 +56,7 @@ The Apache Flink project supports the following file systems:
   - **[Aliyun Object Storage Service]({{< ref 
"docs/deployment/filesystems/oss" >}})** is supported by `flink-oss-fs-hadoop` 
and registered under the *oss://* URI scheme.
   The implementation is based on the [Hadoop 
Project](https://hadoop.apache.org/) but is self-contained with no dependency 
footprint.
 
-  - **[Azure Blob Storage]({{< ref "docs/deployment/filesystems/azure" >}})** 
is supported by `flink-azure-fs-hadoop` and registered under the *wasb(s)://* 
URI schemes.
+  - **[Azure Blob Storage]({{< ref "docs/deployment/filesystems/azure" >}})** 
is supported by `flink-azure-fs-hadoop` and registered under the *abfs(s)://* 
and *wasb(s)://* URI schemes.

Review comment:
       I think we need to distinguish between the two schemes (abfs and wasb) 
here. Some users may be looking for Azure Data Lake Storage Gen2 support, and 
they might miss the documentation because it is not called out specifically 
and is listed under Blob Storage.
   
   Azure Data Lake Store Gen2 [abfs(s)]
   
https://docs.microsoft.com/en-us/azure/storage/blobs/data-lake-storage-introduction
   
   
   Azure Blob Storage [wasb(s)]
   
https://docs.microsoft.com/en-us/azure/storage/blobs/storage-blobs-introduction
   



