[GitHub] [flink] AHeise commented on a change in pull request #16559: [FLINK-18562] Support for Hadoop ABFS for Azure Datalake Gen2 accounts

GitBox Thu, 05 Aug 2021 04:23:08 -0700


AHeise commented on a change in pull request #16559:
URL: https://github.com/apache/flink/pull/16559#discussion_r682825092




##########
File path: flink-filesystems/flink-s3-fs-presto/src/main/resources/META-INF/NOTICE
##########
@@ -34,14 +34,25 @@ This project bundles the following dependencies under the Apache Software Licens
 - joda-time:joda-time:2.5
 - org.apache.commons:commons-configuration2:2.1.1
 - org.apache.commons:commons-lang3:3.3.2
-- org.apache.hadoop:hadoop-annotations:3.1.0
-- org.apache.hadoop:hadoop-aws:3.1.0
-- org.apache.hadoop:hadoop-auth:3.1.0
-- org.apache.hadoop:hadoop-common:3.1.0
+- org.apache.commons:commons-text:1.4
+- org.apache.hadoop:hadoop-annotations:3.3.1
+- org.apache.hadoop:hadoop-aws:3.3.1
+- org.apache.hadoop:hadoop-auth:3.3.1
+- org.apache.hadoop:hadoop-common:3.3.1
+- org.apache.hadoop.thirdparty:hadoop-shaded-guava:1.1.1
+- org.apache.hadoop.thirdparty:hadoop-shaded-protobuf_3_7:1.1.1
 - org.apache.htrace:htrace-core4:4.1.0-incubating
-- org.apache.httpcomponents:httpcore:4.4.14
 - org.apache.httpcomponents:httpclient:4.5.13
+- org.apache.httpcomponents:httpcore:4.4.14
+- org.apache.kerby:kerby-asn1:1.0.1
+- org.apache.kerby:kerb-core:1.0.1
+- org.apache.kerby:kerby-pkix:1.0.1
+- org.apache.kerby:kerby-util:1.0.1
+- org.codehaus.woodstox:stax2-api:4.2.1
+- org.xerial.snappy:snappy-java:1.1.8.3
 - org.weakref:jmxutils:1.19
+- org.wildfly.openssl:wildfly-openssl:1.0.7.Final
+- dnsjava:dnsjava:2.1.7

Review comment:
       So `dnsjava` is definitely declared incorrectly then. Thanks for checking!

##########
File path: flink-core/src/main/java/org/apache/flink/core/fs/FileSystem.java
##########
@@ -248,6 +248,8 @@
             ImmutableMultimap.<String, String>builder()
                     .put("wasb", "flink-fs-azure-hadoop")
                     .put("wasbs", "flink-fs-azure-hadoop")
+                    .put("abfs", "flink-fs-azure-hadoop")
+                    .put("abfss", "flink-fs-azure-hadoop")

Review comment:
       Just to clarify: adding the entries will not change anything if 
everything is set up correctly. However, if the user forgets to add the file 
system (as a plugin or lib), this list lets Flink give a meaningful error 
message. So thanks for updating it!
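
A minimal sketch of how a scheme-to-artifact table like this one can turn a missing file system into an actionable error. The class `SchemeHints`, the method `describeMissing`, and the message wording are hypothetical and exist only for illustration; the real `FileSystem.java` uses a Guava `ImmutableMultimap` and its own exception text.

```java
import java.util.Map;

public class SchemeHints {
    // Hypothetical stand-in for the scheme -> artifact table shown in the diff above.
    static final Map<String, String> KNOWN_SCHEMES = Map.of(
            "wasb", "flink-fs-azure-hadoop",
            "wasbs", "flink-fs-azure-hadoop",
            "abfs", "flink-fs-azure-hadoop",
            "abfss", "flink-fs-azure-hadoop");

    // Builds the error text for a scheme that has no loaded file system.
    // If the scheme is in the table, the message names the artifact to add.
    static String describeMissing(String scheme) {
        String base = "No file system found for scheme '" + scheme + "'.";
        String artifact = KNOWN_SCHEMES.get(scheme);
        if (artifact == null) {
            return base;
        }
        return base + " Support is provided by " + artifact
                + "; add it as a plugin or lib.";
    }

    public static void main(String[] args) {
        System.out.println(describeMissing("abfs"));
        System.out.println(describeMissing("unknown"));
    }
}
```

Without the `abfs`/`abfss` entries, a user who forgot the plugin would only get the generic first sentence.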

##########
File path: flink-filesystems/flink-s3-fs-hadoop/src/main/java/org/apache/flink/fs/s3hadoop/HadoopS3AccessHelper.java
##########
@@ -52,8 +55,22 @@
     private final InternalWriteOperationHelper s3accessHelper;
 
     public HadoopS3AccessHelper(S3AFileSystem s3a, Configuration conf) {

Review comment:
       Yes, I think it would be best to have 2 commits ultimately.
   In the first one, you bump the Hadoop version and adjust this class + 
license/notice files. In the second commit, your actual changes come in.

##########
File path: flink-filesystems/flink-azure-fs-hadoop/src/main/java/org/apache/flink/fs/azurefs/AbstractAzureFSFactory.java
##########
@@ -75,18 +76,27 @@ public void configure(Configuration config) {
     @Override
     public FileSystem create(URI fsUri) throws IOException {
         checkNotNull(fsUri, "passed file system URI object should not be null");
-        LOG.info("Trying to load and instantiate Azure File System");
+        LOG.info("Trying to load and instantiate Azure File System for {}", fsUri);
         return new HadoopFileSystem(createInitializedAzureFS(fsUri, flinkConfig));
     }
 
-    // uri is of the form: wasb(s)://[email protected]/testDir
+    // uri is of the form: wasb(s)://[email protected]/testDir (or)
+    // abfs(s):////[email protected]/testDir
     private org.apache.hadoop.fs.FileSystem createInitializedAzureFS(
             URI fsUri, Configuration flinkConfig) throws IOException {
         org.apache.hadoop.conf.Configuration hadoopConfig = configLoader.getOrLoadHadoopConfig();
-
-        org.apache.hadoop.fs.FileSystem azureFS = new NativeAzureFileSystem();
-        azureFS.initialize(fsUri, hadoopConfig);
-
-        return azureFS;
+        String scheme = fsUri.getScheme();
+
+        if (scheme.startsWith("wasb")) {

Review comment:
       From a code-structure point of view, it now looks much cleaner. We could 
go one step further and pull the `initialize` call back into this method. Then 
all `createAzureFS` methods are really just one-liners.
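
A rough sketch of the suggested shape, assuming the creation step only picks an implementation and the caller performs `initialize` once. `Fs`, `WasbFs`, `AbfsFs`, and the example URI are hypothetical stand-ins so the snippet compiles without Hadoop on the classpath; they are not the classes used in the pull request.

```java
import java.net.URI;

public class AzureFsDispatch {
    // Hypothetical stand-in for org.apache.hadoop.fs.FileSystem.
    interface Fs {
        void initialize(URI uri);
        boolean isInitialized();
    }

    // Shared bookkeeping so the two stand-in file systems stay tiny.
    static class BaseFs implements Fs {
        private URI uri;
        public void initialize(URI uri) { this.uri = uri; }
        public boolean isInitialized() { return uri != null; }
    }

    static class WasbFs extends BaseFs {}
    static class AbfsFs extends BaseFs {}

    // Creation only chooses the implementation; each branch is a one-liner.
    static Fs createAzureFS(String scheme) {
        if (scheme != null && scheme.startsWith("wasb")) return new WasbFs();
        if (scheme != null && scheme.startsWith("abfs")) return new AbfsFs();
        throw new IllegalArgumentException("Unsupported scheme: " + scheme);
    }

    // initialize() lives in exactly one place, as the comment above proposes.
    static Fs createInitializedAzureFS(URI fsUri) {
        Fs fs = createAzureFS(fsUri.getScheme());
        fs.initialize(fsUri);
        return fs;
    }

    public static void main(String[] args) {
        // Illustrative URI only; host and path are made up.
        Fs fs = createInitializedAzureFS(URI.create("abfs://container@account.example/testDir"));
        System.out.println(fs.getClass().getSimpleName() + " initialized: " + fs.isInitialized());
    }
}
```

Keeping `initialize` in the caller means a new scheme only needs one extra branch (or override) that returns an instance.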




