[ https://issues.apache.org/jira/browse/HADOOP-17347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17257885#comment-17257885 ]
Rajesh Balamohan commented on HADOOP-17347:
-------------------------------------------

>> If the read is for the last 8 bytes, read the full file.

Can you plz share details on this? Does this mean that it is going to load 4 MB (or buffer size) worth of data during footer reads? If so, it would be expensive for short jobs that rely on footer reads.

> ABFS: Read optimizations
> ------------------------
>
>                 Key: HADOOP-17347
>                 URL: https://issues.apache.org/jira/browse/HADOOP-17347
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/azure
>    Affects Versions: 3.4.0
>            Reporter: Bilahari T H
>            Assignee: Bilahari T H
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 12h 10m
>  Remaining Estimate: 0h
>
> Optimize read performance for the following scenarios
> # Read small files completely
> Files that are of size smaller than the read buffer size can be considered
> as small files. In case of such files it would be better to read the full
> file into the AbfsInputStream buffer.
> # Read last block if the read is for footer
> If the read is for the last 8 bytes, read the full file.
> This will optimize reads for parquet files. [Parquet file
> format|https://www.ellicium.com/parquet-file-format-structure/]
> Both these optimizations will be present under configs as follows
> # fs.azure.read.smallfilescompletely
> # fs.azure.read.optimizefooterread
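
To make the question above concrete, here is a minimal Java sketch of how the two scenarios in the description could decide which byte range to fetch. It is not the AbfsInputStream code: the class `ReadPlanSketch`, the method `planRead`, and its parameters are hypothetical names chosen for illustration, and the two boolean flags only stand in for `fs.azure.read.smallfilescompletely` and `fs.azure.read.optimizefooterread`. The sketch also assumes the footer case fetches at most one buffer's worth from the tail of the file, which is one reading of "read last block"; the description also says "read the full file", so the real behavior may differ.

```java
// Hypothetical sketch only; names and logic are assumptions, not Hadoop code.
final class ReadPlanSketch {
    // "Last 8 bytes" comes from the issue description.
    static final int FOOTER_SIZE = 8;

    /** Returns {offset, length} of the byte range to fetch for one read call. */
    static long[] planRead(long fileLength, long position, int requestedLength,
                           int bufferSize, boolean readSmallFilesCompletely,
                           boolean optimizeFooterRead) {
        // Scenario 1: the file is smaller than the read buffer, so fetch all of it.
        if (readSmallFilesCompletely && fileLength < bufferSize) {
            return new long[] {0L, fileLength};
        }
        // Scenario 2: the read targets exactly the last 8 bytes.
        // Assumption: fetch the tail of the file, capped at one buffer.
        boolean isFooterRead = fileLength >= FOOTER_SIZE
                && position == fileLength - FOOTER_SIZE
                && requestedLength == FOOTER_SIZE;
        if (optimizeFooterRead && isFooterRead) {
            long length = Math.min(fileLength, (long) bufferSize);
            return new long[] {fileLength - length, length};
        }
        // Default: fetch only what was asked for, clipped at end of file.
        long remaining = Math.max(0L, fileLength - position);
        return new long[] {position, Math.min((long) requestedLength, remaining)};
    }
}
```

Under these assumptions, a footer read on a 10,000,000-byte file with a 4 MB buffer fetches 4,194,304 bytes starting at offset 5,805,696, whereas with the footer flag off the same call fetches only the 8 requested bytes. That difference is the cost the comment is asking about.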