[GitHub] [hadoop] steveloughran commented on pull request #2584: HADOOP-16202. Enhance openFile() for better read performance against object stores

GitBox Wed, 06 Apr 2022 07:00:43 -0700


steveloughran commented on PR #2584:
URL: https://github.com/apache/hadoop/pull/2584#issuecomment-1090308409


   I really need reviews of this: @mukund-thakur @mehakmeet @bibinchundatt @dannycjones @surendralilhore
   
   This patch needs to go in before any other input stream optimisations so that
   1. we can cut the HEAD request overhead on small files
   2. distcp and fsshell can tell the streams that they are reading the whole file, so the streams should do big reads and expect no backwards seeks.
   3. parquet and orc libs can switch to this to get 
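
A minimal sketch of what points 1 and 2 could look like from the caller's side. `OpenFileHints` and `wholeFileOptions` are hypothetical names, not part of Hadoop, and the option keys and the `whole-file` policy value are assumptions to check against the Hadoop release in use. Passing a known length is what would let a store skip the HEAD request.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper for illustration only; not part of Hadoop. */
class OpenFileHints {
    // Assumed option keys; verify them against the Hadoop release in use.
    static final String READ_POLICY = "fs.option.openfile.read.policy";
    static final String LENGTH = "fs.option.openfile.length";

    /**
     * Options a whole-file reader such as distcp might pass to openFile().
     * A negative length means "unknown", so no length option is set.
     */
    static Map<String, String> wholeFileOptions(long knownLength) {
        Map<String, String> opts = new LinkedHashMap<>();
        opts.put(READ_POLICY, "whole-file");
        if (knownLength >= 0) {
            opts.put(LENGTH, Long.toString(knownLength));
        }
        return opts;
    }

    public static void main(String[] args) {
        // With Hadoop on the classpath, the options would be applied roughly as:
        //   FutureDataInputStreamBuilder builder = fs.openFile(path);
        //   for (Map.Entry<String, String> e : opts.entrySet()) {
        //       builder.opt(e.getKey(), e.getValue());
        //   }
        //   FSDataInputStream in = builder.build().get();
        System.out.println(wholeFileOptions(4096L));
    }
}
```

The options are set with `opt()` rather than `must()` in this sketch so that a filesystem which does not understand them can ignore them.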
   
   Although #2975 sets it up, this PR doesn't include abfs handling of the file length option as an alternative to the file status.
   
   I've looked at it but need a plan for etag tracking: we will have to replicate the bit in the s3a code where the first GET's etag is picked up and used from then on. That is a future piece of work. This PR does contain the tests that are needed there, though...
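
The etag tracking described above could be sketched roughly as follows. `EtagTracker` is a hypothetical class written for illustration; it is not the s3a or abfs code, and the real handling of a mismatch may differ.

```java
import java.io.IOException;

/** Hypothetical sketch: pin the etag of the first GET and check later ones. */
class EtagTracker {
    // null until the first GET response carrying an etag has been seen
    private String etag;

    /** Record the first response's etag; require later responses to match it. */
    synchronized void onResponse(String responseEtag) throws IOException {
        if (responseEtag == null) {
            return; // the store returned no etag, so there is nothing to check
        }
        if (etag == null) {
            etag = responseEtag; // first GET: pick up the etag
            return;
        }
        if (!etag.equals(responseEtag)) {
            throw new IOException("File changed during read: expected etag "
                + etag + " but got " + responseEtag);
        }
    }

    /** The etag picked up from the first GET, or null if none yet. */
    synchronized String pinnedEtag() {
        return etag;
    }
}
```

No file status is needed up front in this sketch: when a file is opened with only a length, the first GET supplies the etag that every later GET is checked against.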
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


