[ 
https://issues.apache.org/jira/browse/HADOOP-17347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17257885#comment-17257885
 ] 

Rajesh Balamohan commented on HADOOP-17347:
-------------------------------------------


>> If the read is for the last 8 bytes, read the full file.

Can you please share details on this? Does this mean it will load 4 MB 
(or one buffer's worth) of data on every footer read? If so, it would be 
expensive for short jobs that rely on footer reads.
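
To make the concern concrete, here is a rough sketch of the two ways the quoted sentence could be read. It is not the AbfsInputStream code: the class and method names are hypothetical, the 8-byte footer comes from the issue description, and the 4 MB buffer size is an assumption. Reads that are not footer reads are counted at their requested length only, which ignores normal buffering.

```java
public class FooterReadCost {
    // Footer length taken from the issue description ("last 8 bytes").
    static final long FOOTER_SIZE = 8;

    /** True when the request covers exactly the last FOOTER_SIZE bytes of the file. */
    static boolean isFooterRead(long fileLength, long position, long length) {
        return fileLength >= FOOTER_SIZE
                && length == FOOTER_SIZE
                && position == fileLength - FOOTER_SIZE;
    }

    /** Reading A: a footer read pulls the whole file. */
    static long bytesFetchedFullFile(long fileLength, long position, long length) {
        return isFooterRead(fileLength, position, length) ? fileLength : length;
    }

    /** Reading B: a footer read pulls at most one buffer's worth from the tail. */
    static long bytesFetchedLastBlock(long fileLength, long position, long length,
                                      long bufferSize) {
        return isFooterRead(fileLength, position, length)
                ? Math.min(fileLength, bufferSize)
                : length;
    }

    public static void main(String[] args) {
        long bufferSize = 4L * 1024 * 1024;      // assumed 4 MB read buffer
        long fileLength = 1024L * 1024 * 1024;   // hypothetical 1 GiB file
        long position = fileLength - FOOTER_SIZE;

        System.out.println("full file:  "
                + bytesFetchedFullFile(fileLength, position, FOOTER_SIZE));
        System.out.println("last block: "
                + bytesFetchedLastBlock(fileLength, position, FOOTER_SIZE, bufferSize));
    }
}
```

Under reading A the cost grows with the file size. Under reading B it is capped at the buffer size, which is still far more than 8 bytes for a job that only needs the footer.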

> ABFS: Read optimizations
> ------------------------
>
>                 Key: HADOOP-17347
>                 URL: https://issues.apache.org/jira/browse/HADOOP-17347
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/azure
>    Affects Versions: 3.4.0
>            Reporter: Bilahari T H
>            Assignee: Bilahari T H
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 12h 10m
>  Remaining Estimate: 0h
>
> Optimize read performance for the following scenarios
>  # Read small files completely
>  Files that are of size smaller than the read buffer size can be considered 
> as small files. In case of such files it would be better to read the full 
> file into the AbfsInputStream buffer.
>  # Read last block if the read is for footer
>  If the read is for the last 8 bytes, read the full file.
>  This will optimize reads for parquet files. [Parquet file 
> format|https://www.ellicium.com/parquet-file-format-structure/]
> Both these optimizations will be present under configs as follows
>  # fs.azure.read.smallfilescompletely
>  # fs.azure.read.optimizefooterread
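
For reference, the two optimizations in the description above could be gated roughly as follows. This is a minimal sketch under stated assumptions, not the patch itself: `ReadPlanSketch` and `planRead` are hypothetical names, the two boolean parameters stand in for fs.azure.read.smallfilescompletely and fs.azure.read.optimizefooterread, and the footer case follows the "read last block" heading rather than "read the full file".

```java
public class ReadPlanSketch {
    // Footer length taken from the issue description ("last 8 bytes").
    static final int FOOTER_SIZE = 8;

    /**
     * Returns {offset, length} of the byte range to fetch for one read request.
     * The two flags are stand-ins for the two config keys in the description.
     */
    static long[] planRead(long fileLength, long position, int requested, int bufferSize,
                           boolean readSmallFilesCompletely, boolean optimizeFooterRead) {
        // Small file: smaller than the read buffer, so fetch all of it at once.
        if (readSmallFilesCompletely && fileLength < bufferSize) {
            return new long[] {0, fileLength};
        }
        // Footer read: the request is exactly the last FOOTER_SIZE bytes.
        // Assumption: fetch the last buffer-sized block, not the entire file.
        if (optimizeFooterRead
                && requested == FOOTER_SIZE
                && position == fileLength - FOOTER_SIZE) {
            long start = Math.max(0, fileLength - bufferSize);
            return new long[] {start, fileLength - start};
        }
        // Default: fetch up to one buffer starting at the requested position.
        long length = Math.min(bufferSize, fileLength - position);
        return new long[] {position, length};
    }

    public static void main(String[] args) {
        long[] plan = planRead(10_000, 9_992, 8, 4_096, true, true);
        System.out.println("offset=" + plan[0] + " length=" + plan[1]);
    }
}
```

With both flags off, every request falls through to the default branch, so either optimization can be disabled independently.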



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org
