[GitHub] [hadoop] sapant-msft commented on issue #1708: HADOOP-16696: Always read ahead config, to use read ahead even for non sequential reads.

GitBox Fri, 22 Nov 2019 19:45:24 -0800

sapant-msft commented on issue #1708: HADOOP-16696: Always read ahead config, 
to use read ahead even for non sequential reads.
URL: https://github.com/apache/hadoop/pull/1708#issuecomment-557763272
 
 
   Hi @steveloughran,
   Thank you for the suggestions. A new test has been added
   (ITestAbfsReadWriteAndSeekReadAheadEnabled).
   We have an internal Spark workload that reads a 10 MB Parquet file with
   the following read pattern: Seek_1, Read_1, Seek_2, Read_2, Read_3.
   Currently, read-ahead is disabled whenever there are seeks. This change
   introduces a config, AlwaysReadAhead (disabled by default), which lets
   users override this behavior if desired. With this option turned on, we
   reduced the total number of service-side requests by one third, because
   read-ahead coupled Read_2 and Read_3, which greatly improved efficiency.
   As Parquet files are widely used, especially in Spark workloads, we are
   confident this could improve performance (and reduce network I/O) for a
   large number of users.
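
   To make the request counts concrete, here is a minimal toy sketch of the
   pattern above. It is not the ABFS input stream code: the class, the
   offsets, the read sizes, and the 4 MB buffer size are all hypothetical,
   and the only behavior it assumes is that a buffer miss right after a seek
   fetches just the requested bytes unless the always-read-ahead flag is set.

```python
from dataclasses import dataclass


@dataclass
class ToyStream:
    """Toy model of a buffered remote stream (hypothetical, not ABFS code)."""
    always_read_ahead: bool
    buffer_size: int = 4 * 1024 * 1024  # assumed buffer size
    pos: int = 0
    buf_start: int = 0
    buf_end: int = 0  # buffer covers [buf_start, buf_end)
    after_seek: bool = False
    remote_requests: int = 0

    def seek(self, offset: int) -> None:
        self.pos = offset
        self.after_seek = True

    def read(self, length: int) -> None:
        end = self.pos + length
        hit = self.buf_start <= self.pos and end <= self.buf_end
        if not hit:
            if self.after_seek and not self.always_read_ahead:
                # Assumed default: a read after a seek fetches only what was asked for.
                fetch = length
            else:
                # Sequential read, or flag enabled: fetch a whole buffer.
                fetch = max(length, self.buffer_size)
            self.buf_start = self.pos
            self.buf_end = self.pos + fetch
            self.remote_requests += 1
        self.pos = end
        self.after_seek = False


def count_requests(always_read_ahead: bool) -> int:
    """Replay Seek_1, Read_1, Seek_2, Read_2, Read_3 with made-up offsets."""
    s = ToyStream(always_read_ahead=always_read_ahead)
    s.seek(9_000_000)   # Seek_1
    s.read(1_000)       # Read_1
    s.seek(1_000_000)   # Seek_2
    s.read(64 * 1024)   # Read_2
    s.read(64 * 1024)   # Read_3, contiguous with Read_2
    return s.remote_requests


if __name__ == "__main__":
    print("default:", count_requests(False))
    print("always read ahead:", count_requests(True))
```

   In this toy model the default path issues one remote request per read,
   while the always-read-ahead path serves Read_3 from the buffer filled by
   Read_2, which matches the one-third reduction described above.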



