[ 
https://issues.apache.org/jira/browse/HADOOP-16101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16764855#comment-16764855
 ] 

Steve Loughran commented on HADOOP-16101:
-----------------------------------------

I've thought about not doing the HEAD first; see HADOOP-13712.

We've been constrained by the expectation that "if the file doesn't exist, 
open() must fail". With the new openFile() and its future<> response, we have a 
bit more leeway. 

h3. Now may be the time to change the spec there and say "if you open a file 
with openFile(), failures may not surface until the stream is read()". 

FWIW, even though getFileStatus does three checks, the successful path ("the 
file is present") uses only the initial HEAD. The failure case does make all 
three calls, with the last two essentially choosing between FNFE and some 
"path is a directory" exception (which may be an FNFE anyway, as it is on some 
filesystems). Because that's the failure path, optimising it is probably less 
beneficial than saving 200ms on every file open, which we could do by purging 
that initial HEAD and going straight for the GET on read. 
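
To make the trade-off concrete, here is a rough sketch in plain Java with no 
Hadoop dependency. Everything in it (FakeStore, LazyStream, openEager, 
openLazy) is a hypothetical stand-in invented for illustration, not the S3A 
code or the openFile() API. It only counts HEAD and GET requests and shows 
where a missing object would surface as an FNFE under each strategy.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory stand-in for an object store client; not S3A.
class LazyOpenSketch {

  /** Counts requests so the cost of each open strategy is visible. */
  static class FakeStore {
    final Map<String, byte[]> objects = new HashMap<>();
    int headCount = 0;
    int getCount = 0;

    /** HEAD: existence/length probe. */
    long head(String key) throws FileNotFoundException {
      headCount++;
      byte[] data = objects.get(key);
      if (data == null) {
        throw new FileNotFoundException(key);
      }
      return data.length;
    }

    /** GET: fetch the object's bytes. */
    byte[] get(String key) throws FileNotFoundException {
      getCount++;
      byte[] data = objects.get(key);
      if (data == null) {
        throw new FileNotFoundException(key);
      }
      return data;
    }
  }

  /** Stream that issues no request until the first read(). */
  static class LazyStream {
    private final FakeStore store;
    private final String key;
    private byte[] data; // fetched on first read
    private int pos = 0;

    LazyStream(FakeStore store, String key) {
      this.store = store;
      this.key = key;
    }

    int read() throws IOException {
      if (data == null) {
        data = store.get(key); // a missing object surfaces here as FNFE
      }
      return pos < data.length ? (data[pos++] & 0xff) : -1;
    }
  }

  /** Eager open: HEAD first, so a missing file fails at open time. */
  static LazyStream openEager(FakeStore store, String key) throws IOException {
    store.head(key);
    return new LazyStream(store, key);
  }

  /** Lazy open: no probe at all; any failure is deferred to read(). */
  static LazyStream openLazy(FakeStore store, String key) {
    return new LazyStream(store, key);
  }

  public static void main(String[] args) throws IOException {
    FakeStore store = new FakeStore();
    store.objects.put("a", new byte[] {42});

    openEager(store, "a").read();
    System.out.println("eager: HEAD=" + store.headCount + " GET=" + store.getCount);

    store.headCount = 0;
    store.getCount = 0;
    openLazy(store, "a").read();
    System.out.println("lazy:  HEAD=" + store.headCount + " GET=" + store.getCount);

    LazyStream missing = openLazy(store, "missing"); // open succeeds
    try {
      missing.read();
    } catch (FileNotFoundException e) {
      System.out.println("lazy open of a missing key failed on read()");
    }
  }
}
```

In this sketch the lazy path saves one request per successful open, at the 
price of moving the FNFE from open time to the first read(), which is the 
contract change proposed above. 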

> Use lighter-weight alternatives to innerGetFileStatus where possible
> --------------------------------------------------------------------
>
>                 Key: HADOOP-16101
>                 URL: https://issues.apache.org/jira/browse/HADOOP-16101
>             Project: Hadoop Common
>          Issue Type: Sub-task
>            Reporter: Sean Mackrory
>            Priority: Major
>
> Discussion in HADOOP-15999 highlighted the heaviness of a full 
> innerGetFileStatus call, where many usages of it may need a lighter weight 
> fileExists, etc. Let's investigate usage of innerGetFileStatus and slim it 
> down where possible.



