steveloughran commented on code in PR #6270:
URL: https://github.com/apache/hadoop/pull/6270#discussion_r1392446821


##########
hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/constants/ConfigurationKeys.java:
##########
@@ -104,7 +104,20 @@ public final class ConfigurationKeys {
   public static final String AZURE_ENABLE_SMALL_WRITE_OPTIMIZATION = "fs.azure.write.enableappendwithflush";
   public static final String AZURE_READ_BUFFER_SIZE = "fs.azure.read.request.size";
   public static final String AZURE_READ_SMALL_FILES_COMPLETELY = "fs.azure.read.smallfilescompletely";
+  /**
+   * When parquet files are read, first few read are metadata reads before reading the actual data.

Review Comment:
   this is roughly the same for ORC, isn't it?



##########
hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/constants/ConfigurationKeys.java:
##########
@@ -104,7 +104,20 @@ public final class ConfigurationKeys {
   public static final String AZURE_ENABLE_SMALL_WRITE_OPTIMIZATION = "fs.azure.write.enableappendwithflush";
   public static final String AZURE_READ_BUFFER_SIZE = "fs.azure.read.request.size";
   public static final String AZURE_READ_SMALL_FILES_COMPLETELY = "fs.azure.read.smallfilescompletely";
+  /**
+   * When parquet files are read, first few read are metadata reads before reading the actual data.
+   * First the read is done of last 8 bytes of parquet file to get the postion of metadta and next read
+   * is done for reading that metadata. With this optimizations these two reads can be combined into 1.
+   * Value: {@value}
+   */
   public static final String AZURE_READ_OPTIMIZE_FOOTER_READ = "fs.azure.read.optimizefooterread";
+  /**
+   * In case of footer reads it was not required to read full buffer size.
+   * Most of the metadata information required was within 256KB and it will be more performant to read lesser.

Review Comment:
   "read less"



##########
hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/constants/ConfigurationKeys.java:
##########
@@ -104,7 +104,20 @@ public final class ConfigurationKeys {
   public static final String AZURE_ENABLE_SMALL_WRITE_OPTIMIZATION = "fs.azure.write.enableappendwithflush";
   public static final String AZURE_READ_BUFFER_SIZE = "fs.azure.read.request.size";
   public static final String AZURE_READ_SMALL_FILES_COMPLETELY = "fs.azure.read.smallfilescompletely";
+  /**
+   * When parquet files are read, first few read are metadata reads before reading the actual data.
+   * First the read is done of last 8 bytes of parquet file to get the postion of metadta and next read
+   * is done for reading that metadata. With this optimizations these two reads can be combined into 1.

Review Comment:
   nit "optimization"



##########
hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/constants/FileSystemConfigurations.java:
##########
@@ -59,7 +59,8 @@ public final class FileSystemConfigurations {
   public static final boolean DEFAULT_AZURE_ENABLE_SMALL_WRITE_OPTIMIZATION = false;
   public static final int DEFAULT_READ_BUFFER_SIZE = 4 * ONE_MB;  // 4 MB
   public static final boolean DEFAULT_READ_SMALL_FILES_COMPLETELY = false;
-  public static final boolean DEFAULT_OPTIMIZE_FOOTER_READ = false;
+  public static final boolean DEFAULT_OPTIMIZE_FOOTER_READ = true;
+  public static final int DEFAULT_FOOTER_READ_BUFFER_SIZE = 512 * ONE_KB;

Review Comment:
   this is 512K; docs in the file above say 256K. 
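
   As a usage note, the two settings are meant to be tuned together, roughly as below. The boolean key is taken from the ConfigurationKeys hunk above; the name of the buffer-size key is an assumption for illustration, since this diff only shows the DEFAULT_FOOTER_READ_BUFFER_SIZE constant, not its key string:

       import org.apache.hadoop.conf.Configuration;

       public final class FooterReadConfigSketch {
         static Configuration withFooterReadTuning() {
           Configuration conf = new Configuration();
           // Key shown in the hunk above; this PR flips its default to true.
           conf.setBoolean("fs.azure.read.optimizefooterread", true);
           // Hypothetical key name -- only the 512 KB default constant is
           // visible in this diff, not the configuration key itself.
           conf.setInt("fs.azure.footer.read.request.size", 512 * 1024);
           return conf;
         }
       }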



##########
hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/constants/ConfigurationKeys.java:
##########
@@ -104,7 +104,20 @@ public final class ConfigurationKeys {
   public static final String AZURE_ENABLE_SMALL_WRITE_OPTIMIZATION = "fs.azure.write.enableappendwithflush";
   public static final String AZURE_READ_BUFFER_SIZE = "fs.azure.read.request.size";
   public static final String AZURE_READ_SMALL_FILES_COMPLETELY = "fs.azure.read.smallfilescompletely";
+  /**
+   * When parquet files are read, first few read are metadata reads before reading the actual data.
+   * First the read is done of last 8 bytes of parquet file to get the postion of metadta and next read
+   * is done for reading that metadata. With this optimizations these two reads can be combined into 1.
+   * Value: {@value}
+   */
   public static final String AZURE_READ_OPTIMIZE_FOOTER_READ = "fs.azure.read.optimizefooterread";

Review Comment:
   I'd prefer "." between words, but it's too late now



##########
hadoop-tools/hadoop-azure/src/test/java/org/apache/hadoop/fs/azurebfs/services/ITestAbfsInputStreamReadFooter.java:
##########
@@ -190,7 +193,8 @@ private void seekReadAndTest(final FileSystem fs, final Path testFilePath,
     try (FSDataInputStream iStream = fs.open(testFilePath)) {
       AbfsInputStream abfsInputStream = (AbfsInputStream) iStream
           .getWrappedStream();
-      long bufferSize = abfsInputStream.getBufferSize();
+      long footerReadBufferSize = abfsInputStream.getFooterReadBufferSize();

Review Comment:
   I'd propose something else, will comment below
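
   As a minimal standalone sketch of what the renamed variable checks (assuming getFooterReadBufferSize() from the hunk above and the 512 KB default from FileSystemConfigurations; the real test exercises many more combinations):

       import static org.junit.Assert.assertEquals;

       import org.apache.hadoop.fs.FSDataInputStream;
       import org.apache.hadoop.fs.FileSystem;
       import org.apache.hadoop.fs.Path;
       import org.apache.hadoop.fs.azurebfs.services.AbfsInputStream;

       public class FooterBufferSizeCheck {
         // Assert the stream picked up the footer read buffer size rather
         // than the general read buffer size.
         static void verifyFooterReadBufferSize(FileSystem fs, Path testFilePath)
             throws Exception {
           try (FSDataInputStream iStream = fs.open(testFilePath)) {
             AbfsInputStream abfsIn = (AbfsInputStream) iStream.getWrappedStream();
             assertEquals(512 * 1024, abfsIn.getFooterReadBufferSize());
           }
         }
       }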



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
