steveloughran commented on issue #1601: HADOOP-16635. S3A innerGetFileStatus 
scans for directories-only still does a HEAD.
URL: https://github.com/apache/hadoop/pull/1601#issuecomment-540571728
 
 
   Sid, thanks for the comments; I'll review and update the patch.
   
   Interesting point about the double list. This code path is how it's always 
been, presumably descended from the s3n code. LIST is slower, costs more, and is 
much more prone to eventual consistency, which are all good arguments for HEAD 
first.
   
   I actually plan to tune some of the calls which always seem to get used on 
directory walks (listStatus, listFiles, listLocatedStatus) to do the subtree 
list first, and only go for the HEAD calls if they don't find any children. 
This is to reduce the cost of tree walks where the bias is towards populated 
directories.
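
   A rough sketch of the list-first probe order described above, not the actual S3A code. FakeStore, head(), list() and listFirst() are hypothetical names: an in-memory stand-in that only counts requests, so the cost of each path is visible. It assumes a directory is anything with at least one key under "path/", including a bare marker key.

```java
import java.io.FileNotFoundException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.TreeSet;

class ProbeOrderSketch {

    /** Hypothetical in-memory object store that counts HEAD and LIST requests. */
    static class FakeStore {
        final TreeSet<String> keys = new TreeSet<>();
        int heads = 0;
        int lists = 0;

        /** Simulated HEAD: does an object exist at exactly this key? */
        boolean head(String key) {
            heads++;
            return keys.contains(key);
        }

        /** Simulated LIST: all keys that start with the given prefix. */
        List<String> list(String prefix) {
            lists++;
            List<String> out = new ArrayList<>();
            for (String k : keys.tailSet(prefix)) {
                if (!k.startsWith(prefix)) {
                    break; // sorted order: no later key can match the prefix
                }
                out.add(k);
            }
            return out;
        }
    }

    /**
     * List-first probe: LIST the subtree, and only fall back to HEAD
     * when the listing finds nothing under the path.
     */
    static List<String> listFirst(FakeStore store, String path) {
        String prefix = path + "/";
        List<String> listing = store.list(prefix);
        if (!listing.isEmpty()) {
            // Directory: drop the marker key itself, keep the children.
            List<String> children = new ArrayList<>(listing);
            children.remove(prefix);
            return children;
        }
        // Nothing under the path: it is either a file or missing.
        if (store.head(path)) {
            return Collections.singletonList(path);
        }
        throw new UncheckedIOException(new FileNotFoundException(path));
    }
}
```

   With this ordering a populated directory costs one LIST and no HEAD; a plain file costs one LIST plus one HEAD, which is the trade-off being made in favour of tree walks.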

