snvijaya opened a new pull request #2368:
URL: https://github.com/apache/hadoop/pull/2368


   Customers migrating from Gen1 to Gen2 often observe different read 
patterns for the same workload. The usual cause is a Gen2 optimization that, 
once a random read pattern is detected, reads only the requested data size.
   
   This PR introduces a config option that forces the Gen2 driver to always 
read a full buffer size, even for random reads. With it enabled, the read 
pattern for the job is similar to Gen1: full-buffer-size reads are sent to the 
backend.
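
   The effect of the option can be sketched with a small model. This is only 
an illustration of the behaviour described above, not the driver code: the 
function name, the parameter names, and the handling of requests larger than 
the buffer are all assumptions made for the example.

```python
def backend_read_length(requested_len, buffer_size, is_random_read,
                        always_read_buffer_size=False):
    """Simplified model of how many bytes one read asks the backend for.

    All names here are hypothetical; they are not the driver's config keys.
    """
    if is_random_read and not always_read_buffer_size:
        # Gen2 optimization: a detected random read fetches only what was asked.
        return requested_len
    # Sequential read, or the override is on: fetch a full buffer.
    # Model assumption: a request larger than the buffer is passed through as-is.
    return max(requested_len, buffer_size)


KB = 1024
MB = 1024 * KB

# A 64 KB random read against a 4 MB buffer.
default_len = backend_read_length(64 * KB, 4 * MB, is_random_read=True)
forced_len = backend_read_length(64 * KB, 4 * MB, is_random_read=True,
                                 always_read_buffer_size=True)
print(default_len, forced_len)
```

   In this model the default path requests 64 KB while the forced path 
requests the full 4 MB buffer, which is the Gen1-like pattern the option is 
meant to reproduce.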
   
   It also accommodates the request to make the readahead size configurable, 
which helps cases such as small row groups in Parquet files, where each read 
can then capture more data.
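
   A rough way to see why a larger readahead size helps with small row groups 
is to count backend requests for a contiguous scan. The sketch below is an 
idealized estimate under stated assumptions (row groups stored back to back, 
every request filling one whole readahead block); the function and its 
parameters are hypothetical and are not part of the driver.

```python
def backend_requests(total_bytes, readahead_block_size):
    """Idealized count of backend requests needed to cover total_bytes."""
    if readahead_block_size <= 0:
        raise ValueError("readahead_block_size must be positive")
    # Integer ceiling division: a partial trailing block still costs a request.
    return (total_bytes + readahead_block_size - 1) // readahead_block_size


MB = 1024 * 1024

# Sixteen 1 MB row groups stored contiguously (illustrative sizes only).
scan_bytes = 16 * MB
requests_4mb = backend_requests(scan_bytes, 4 * MB)
requests_8mb = backend_requests(scan_bytes, 8 * MB)
print(requests_4mb, requests_8mb)
```

   Doubling the block size halves the request count in this model. Real jobs 
will differ, since reads are rarely perfectly contiguous.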
   
   These configs have not been shown to perform well with the officially 
recommended Parquet row group sizes of 512-1024 MB, so they are not enabled by 
default.
   
   Tests are added to verify various combinations of config values. The tests 
in ITestAzureBlobFileSystemRandomRead are also modified: they all used the same 
file, which made test debugging harder.




