anujmodi2021 opened a new pull request, #8153: URL: https://github.com/apache/hadoop/pull/8153
### Description of PR

Since the inception of the ABFS driver, there has been a single implementation of AbfsInputStream. Different kinds of workloads need different heuristics to perform at their best. For example:

- Sequential read workloads such as DFSIO and DistCp gain performance from prefetching.
- Random read workloads, on the other hand, do not need prefetching, and enabling it for them is an overhead.
- TPS-heavy query workloads involving Parquet/ORC files benefit from improvements such as footer reads and small-file reads.

To accommodate this, we need to determine the read pattern and create an input stream implemented for that particular pattern.

<img width="635" height="290" alt="image" src="https://github.com/user-attachments/assets/5b7a3db9-ab04-43cf-b44e-5e7a6582205f" />

Going forward, more relevant policies and specialized implementations of AbfsInputStream can be added.

This PR only refactors the way input streams are created. No logical change is introduced. As before, AbfsAdaptiveInputStream remains the default, and it can cater to all kinds of workloads.

### How was this patch tested?

New tests were added.
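For readers who want a concrete picture of the stream selection described above, here is a minimal, self-contained sketch. It is not the code in this PR. Only the name `AbfsAdaptiveInputStream` comes from the description. The `ReadPattern` enum, the policy strings, the `StreamSketch` class, and the other stream names are hypothetical placeholders chosen for illustration.

```java
import java.util.Locale;

// Illustrative sketch only: every name here except "AbfsAdaptiveInputStream"
// is an assumption, not an identifier taken from the PR.
final class InputStreamSelectionSketch {

    /** Read patterns distinguished in the description (enum is hypothetical). */
    enum ReadPattern { SEQUENTIAL, RANDOM, ADAPTIVE }

    /** Stand-in for a specialized stream; a real one would extend AbfsInputStream. */
    static final class StreamSketch {
        final String name;
        final String note;

        StreamSketch(String name, String note) {
            this.name = name;
            this.note = note;
        }
    }

    /** Maps a policy string to a pattern; unknown or missing values fall back to ADAPTIVE. */
    static ReadPattern parsePattern(String policy) {
        if (policy == null) {
            return ReadPattern.ADAPTIVE;
        }
        switch (policy.trim().toLowerCase(Locale.ROOT)) {
            case "sequential":
                return ReadPattern.SEQUENTIAL;
            case "random":
                return ReadPattern.RANDOM;
            default:
                return ReadPattern.ADAPTIVE;
        }
    }

    /** Single creation point: picks the stream implementation for a pattern. */
    static StreamSketch createStream(ReadPattern pattern) {
        switch (pattern) {
            case SEQUENTIAL:
                // Hypothetical stream that always prefetches.
                return new StreamSketch("PrefetchStreamSketch", "prefetch on");
            case RANDOM:
                // Hypothetical stream that never prefetches.
                return new StreamSketch("RandomStreamSketch", "prefetch off");
            default:
                // Default named in the description; caters to all workloads.
                return new StreamSketch("AbfsAdaptiveInputStream", "general purpose");
        }
    }

    public static void main(String[] args) {
        for (String policy : new String[] {"sequential", "random", null}) {
            StreamSketch s = createStream(parsePattern(policy));
            System.out.println(policy + " -> " + s.name + " (" + s.note + ")");
        }
    }
}
```

Keeping the choice in one creation method means a new policy needs only a new enum value and a new branch, while callers that pass no policy keep the existing default behaviour.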
