[jira] [Commented] (HADOOP-19767) ABFS: [Read] Introduce Abfs Input Policy for detecting read patterns

ASF GitHub Bot (Jira) Tue, 30 Dec 2025 22:09:10 -0800


    [ 
https://issues.apache.org/jira/browse/HADOOP-19767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18048500#comment-18048500
 ]


ASF GitHub Bot commented on HADOOP-19767:
-----------------------------------------

anujmodi2021 opened a new pull request, #8153:
URL: https://github.com/apache/hadoop/pull/8153

   ### Description of PR
   Since the inception of the ABFS driver, there has been a single 
implementation of AbfsInputStream. Different kinds of workloads require 
different heuristics to perform at their best. For example: 
   
   1. Sequential read workloads like DFSIO and DistCP gain performance from 
prefetching.
   2. Random read workloads, on the other hand, do not need prefetches; 
enabling prefetches for them adds overhead and is TPS-heavy.
   3. Query workloads involving Parquet/ORC files benefit from improvements 
like Footer Read and Small File Reads.
   
   To accommodate this, we need to determine the read pattern and create an 
input stream implemented for that particular pattern.
   
   <img width="635" height="290" alt="image" 
src="https://github.com/user-attachments/assets/5b7a3db9-ab04-43cf-b44e-5e7a6582205f"
 />
   
   Going forward, more relevant policies and specialized implementations of 
AbfsInputStream can be added.
   
   This PR only refactors the way input streams are created; it introduces no 
logical change. As before, the default remains AbfsAdaptiveInputStream, 
which can cater to all kinds of workloads.
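   
   As a rough illustration of the idea only (not the code in this PR), the 
selection could be sketched as below. Every name here (ReadPolicySketch, 
ReadPattern, StreamSettings, parsePattern, settingsFor) is hypothetical, and 
the flag values chosen for the columnar and adaptive cases are assumptions 
rather than driver behavior.

```java
import java.util.Locale;

// Hypothetical sketch: none of these names are taken from the ABFS driver.
final class ReadPolicySketch {

  /** Read patterns named in the description above, plus an adaptive default. */
  enum ReadPattern { ADAPTIVE, SEQUENTIAL, RANDOM, COLUMNAR }

  /** Tuning knobs a specialized stream might derive from the pattern. */
  static final class StreamSettings {
    final boolean prefetchEnabled;
    final boolean footerReadOptimized;

    StreamSettings(boolean prefetchEnabled, boolean footerReadOptimized) {
      this.prefetchEnabled = prefetchEnabled;
      this.footerReadOptimized = footerReadOptimized;
    }
  }

  /** Maps a policy name to a pattern; unknown or missing names fall back to ADAPTIVE. */
  static ReadPattern parsePattern(String name) {
    if (name == null) {
      return ReadPattern.ADAPTIVE;
    }
    switch (name.trim().toLowerCase(Locale.ROOT)) {
      case "sequential":
        return ReadPattern.SEQUENTIAL;
      case "random":
        return ReadPattern.RANDOM;
      case "parquet":
      case "orc":
      case "columnar":
        return ReadPattern.COLUMNAR;
      default:
        return ReadPattern.ADAPTIVE;
    }
  }

  /** Picks settings per pattern; the COLUMNAR and ADAPTIVE values are assumptions. */
  static StreamSettings settingsFor(ReadPattern pattern) {
    switch (pattern) {
      case SEQUENTIAL:
        return new StreamSettings(true, false);   // prefetching helps sequential reads
      case RANDOM:
        return new StreamSettings(false, false);  // prefetches are pure overhead here
      case COLUMNAR:
        return new StreamSettings(false, true);   // assumed: footer read on, prefetch off
      default:
        return new StreamSettings(true, true);    // assumed: adaptive keeps every heuristic
    }
  }

  public static void main(String[] args) {
    for (String name : new String[] {"sequential", "random", "orc", "unknown"}) {
      ReadPattern pattern = parsePattern(name);
      StreamSettings settings = settingsFor(pattern);
      System.out.println(name + " -> " + pattern
          + " prefetch=" + settings.prefetchEnabled
          + " footerRead=" + settings.footerReadOptimized);
    }
  }
}
```

   Falling back to the adaptive case for unrecognized names mirrors the 
statement above that the adaptive stream stays the default.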
   
   ### How was this patch tested?
   New tests were added.
   




> ABFS: [Read] Introduce Abfs Input Policy for detecting read patterns
> --------------------------------------------------------------------
>
>                 Key: HADOOP-19767
>                 URL: https://issues.apache.org/jira/browse/HADOOP-19767
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/azure
>    Affects Versions: 3.4.2
>            Reporter: Anuj Modi
>            Assignee: Anuj Modi
>            Priority: Major
>
> Since the inception of the ABFS driver, there has been a single 
> implementation of AbfsInputStream. Different kinds of workloads require 
> different heuristics to perform at their best. For example: 
>  # Sequential read workloads like DFSIO and DistCP gain performance from 
> prefetching.
>  # Random read workloads, on the other hand, do not need prefetches; 
> enabling prefetches for them adds overhead and is TPS-heavy.
>  # Query workloads involving Parquet/ORC files benefit from improvements 
> like Footer Read and Small File Reads.
> To accommodate this, we need to determine the read pattern and create an 
> input stream implemented for that particular pattern.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
