[ https://issues.apache.org/jira/browse/HADOOP-17347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17261211#comment-17261211 ]
Steve Loughran commented on HADOOP-17347:
-----------------------------------------

I'm assuming it's driven a bit by the read sequence of a Parquet file, which is:
* tail -8 file to check for the magic number and find the offset of the real footer (the last 8 bytes hold the footer length and the "PAR1" magic)
* seek to the real footer and read it

> ABFS: Optimise read for small files/tails of files
> --------------------------------------------------
>
>                 Key: HADOOP-17347
>                 URL: https://issues.apache.org/jira/browse/HADOOP-17347
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/azure
>    Affects Versions: 3.4.0
>            Reporter: Bilahari T H
>            Assignee: Bilahari T H
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 12h 10m
>  Remaining Estimate: 0h
>
> Optimize read performance for the following scenarios
> # Read small files completely
> Files that are of size smaller than the read buffer size can be considered
> as small files. In case of such files it would be better to read the full
> file into the AbfsInputStream buffer.
> # Read last block if the read is for footer
> If the read is for the last 8 bytes, read the full file.
> This will optimize reads for parquet files. [Parquet file
> format|https://www.ellicium.com/parquet-file-format-structure/]
> Both these optimizations will be present under configs as follows
> # fs.azure.read.smallfilescompletely
> # fs.azure.read.optimizefooterread

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
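To make the read sequence and the two proposed optimizations concrete, here is a minimal, self-contained sketch. It assumes an in-memory stand-in for the remote blob, where each fetch counts as one ranged GET.

Several things here are assumptions, not part of the issue:
* All class and method names (`RemoteFile`, `BufferedRemoteStream`, `readParquetFooter`, `countGets`, `toyParquet`) are hypothetical. This is not `AbfsInputStream`'s code.
* The boolean flags only mimic what `fs.azure.read.smallfilescompletely` and `fs.azure.read.optimizefooterread` are meant to toggle.
* The footer branch fetches the last buffer-sized block, following the issue title "Read last block if the read is for footer". That is an assumed reading of the proposal.

The Parquet part is standard: the file ends with a 4-byte little-endian footer length followed by the "PAR1" magic.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class TailReadSketch {
    static final int FOOTER_SIZE = 8; // 4-byte footer length + 4-byte "PAR1" magic
    static final byte[] MAGIC = "PAR1".getBytes(StandardCharsets.US_ASCII);

    /** Hypothetical stand-in for a remote blob; each fetch() models one ranged GET. */
    static final class RemoteFile {
        final byte[] data;
        int gets = 0;
        RemoteFile(byte[] data) { this.data = data; }
        byte[] fetch(long offset, int length) {
            gets++;
            int end = (int) Math.min(data.length, offset + length);
            return Arrays.copyOfRange(data, (int) offset, end);
        }
    }

    /** Hypothetical buffered reader; the flags only mimic the two proposed configs. */
    static final class BufferedRemoteStream {
        final RemoteFile file;
        final int bufferSize;
        final boolean readSmallFilesCompletely; // cf. fs.azure.read.smallfilescompletely
        final boolean optimizeFooterRead;       // cf. fs.azure.read.optimizefooterread
        byte[] buffer = new byte[0];
        long bufferStart = 0;

        BufferedRemoteStream(RemoteFile file, int bufferSize, boolean small, boolean footer) {
            this.file = file;
            this.bufferSize = bufferSize;
            this.readSmallFilesCompletely = small;
            this.optimizeFooterRead = footer;
        }

        byte[] read(long position, int length) {
            long fileLength = file.data.length;
            if (position < 0 || length < 0 || position + length > fileLength) {
                throw new IllegalArgumentException("read outside the file");
            }
            boolean hit = position >= bufferStart
                    && position + length <= bufferStart + buffer.length;
            if (!hit) {
                long fetchStart;
                if (readSmallFilesCompletely && fileLength <= bufferSize) {
                    fetchStart = 0; // small file: fetch all of it in one GET
                } else if (optimizeFooterRead && position >= fileLength - FOOTER_SIZE) {
                    // footer read: fetch the last buffer-sized block (never past position)
                    fetchStart = Math.max(0, Math.min(position, fileLength - bufferSize));
                } else {
                    fetchStart = position; // default: fetch from the requested position
                }
                buffer = file.fetch(fetchStart, Math.max(bufferSize, length));
                bufferStart = fetchStart;
            }
            int from = (int) (position - bufferStart);
            return Arrays.copyOfRange(buffer, from, from + length);
        }
    }

    /** Parquet's open sequence: read the 8-byte tail, then seek back to the footer. */
    static byte[] readParquetFooter(BufferedRemoteStream in, long fileLength) {
        byte[] tail = in.read(fileLength - FOOTER_SIZE, FOOTER_SIZE);
        if (!Arrays.equals(Arrays.copyOfRange(tail, 4, 8), MAGIC)) {
            throw new IllegalStateException("not a Parquet file");
        }
        int footerLength = ByteBuffer.wrap(tail, 0, 4).order(ByteOrder.LITTLE_ENDIAN).getInt();
        return in.read(fileLength - FOOTER_SIZE - footerLength, footerLength);
    }

    /** Builds a toy file laid out like Parquet's tail: data, footer, footer length, magic. */
    static byte[] toyParquet(int dataBytes, int footerBytes) {
        ByteBuffer b = ByteBuffer.allocate(dataBytes + footerBytes + FOOTER_SIZE)
                .order(ByteOrder.LITTLE_ENDIAN);
        b.put(new byte[dataBytes]);
        b.put(new byte[footerBytes]);
        b.putInt(footerBytes);
        b.put(MAGIC);
        return b.array();
    }

    /** Opens a toy file the Parquet way and returns how many GETs it cost. */
    static int countGets(byte[] data, int bufferSize, boolean small, boolean footer) {
        RemoteFile file = new RemoteFile(data);
        BufferedRemoteStream in = new BufferedRemoteStream(file, bufferSize, small, footer);
        readParquetFooter(in, data.length);
        return file.gets;
    }

    public static void main(String[] args) {
        byte[] large = toyParquet(1 << 20, 1000); // ~1 MiB of data, 1000-byte footer
        byte[] small = toyParquet(1000, 200);     // 1208 bytes, smaller than the buffer
        System.out.println("large, no flags:        GETs=" + countGets(large, 4096, false, false));
        System.out.println("large, footer flag:     GETs=" + countGets(large, 4096, false, true));
        System.out.println("small, no flags:        GETs=" + countGets(small, 4096, false, false));
        System.out.println("small, small-file flag: GETs=" + countGets(small, 4096, true, false));
    }
}
```

With a 4096-byte buffer, each file costs 2 GETs without a flag: one for the 8-byte tail, one for the footer. With the relevant flag set it costs 1 GET, because the footer read then hits the buffer the tail read already filled. In this model, the footer optimization only saves the round trip when the footer plus its 8-byte trailer fits in one buffer.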