[ 
https://issues.apache.org/jira/browse/HADOOP-9972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13773471#comment-13773471
 ] 

Colin Patrick McCabe commented on HADOOP-9972:
----------------------------------------------

bq. Just to be clear, what happens if the error handler does not rethrow the 
exception?

If the error handler doesn't rethrow the exception, the listStatus / globStatus 
operation continues normally and returns the remaining results.  (We can't 
return the result that had the error.)  Unresolved symlinks are one type of 
error.  Whether to handle {{UnresolvedLinkException}} differently than other 
exceptions is up to the {{PathErrorHandler}} you provide.
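
A minimal sketch of that contract, under stated assumptions: the {{handleError}} method name, the string paths, and the toy {{list}} and {{resolve}} helpers are hypothetical and are not the final API. The nested {{UnresolvedLinkException}} only stands in for the real Hadoop class so that the example compiles without Hadoop on the classpath.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

class ListingSketch {
    // Stand-in for Hadoop's UnresolvedLinkException (assumption for this sketch).
    static class UnresolvedLinkException extends IOException {
        UnresolvedLinkException(String msg) { super(msg); }
    }

    // Hypothetical shape of the proposed handler; the method name is assumed.
    interface PathErrorHandler {
        // Return normally to skip the failed entry; throw to abort the listing.
        void handleError(String path, IOException e) throws IOException;
    }

    // Toy stand-in for listStatus: every entry is resolved, and failures go to the handler.
    static List<String> list(List<String> entries, PathErrorHandler handler)
            throws IOException {
        List<String> results = new ArrayList<>();
        for (String entry : entries) {
            try {
                results.add(resolve(entry));
            } catch (IOException e) {
                // If the handler returns, the failed entry is dropped and we continue.
                handler.handleError(entry, e);
            }
        }
        return results;
    }

    // Toy resolver: names starting with "dangling" or "denied" simulate errors.
    static String resolve(String entry) throws IOException {
        if (entry.startsWith("dangling")) {
            throw new UnresolvedLinkException(entry);
        }
        if (entry.startsWith("denied")) {
            throw new IOException("permission denied: " + entry);
        }
        return entry;
    }

    // Handler that skips unresolved symlinks but rethrows every other error.
    static final PathErrorHandler SKIP_DANGLING = (path, e) -> {
        if (!(e instanceof UnresolvedLinkException)) {
            throw e;
        }
    };
}
```

With {{SKIP_DANGLING}}, a listing that hits a dangling link returns the other entries, while a listing that hits any other {{IOException}} fails as it does today.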

bq. I'm not sure if the difference between "log exception and continue" vs. 
"ignore it completely" is a different return code from the error handler method 
or just whether the handler logs or not.

I was proposing that the logging happen inside the {{PathErrorHandler}}.  From 
the point of view of FileSystem / FileContext, all we care about is whether the 
{{PathErrorHandler}} rethrows the exception or not.  (We can provide a class 
implementing PathErrorHandler that logs to FileSystem#LOG if that is a common 
use case.)
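
To illustrate, here is a rough sketch of the two handlers being contrasted. Both return without rethrowing, so the caller cannot tell them apart; the only difference is the side effect inside the handler. The interface shape and the class names are assumptions, and {{java.util.logging}} is used here only to keep the example self-contained (it is not what FileSystem#LOG uses).

```java
import java.io.IOException;
import java.util.logging.Level;
import java.util.logging.Logger;

class LoggingHandlerSketch {
    // Hypothetical shape of the proposed handler; the method name is assumed.
    interface PathErrorHandler {
        void handleError(String path, IOException e) throws IOException;
    }

    // "Log exception and continue": logs the error, then returns without rethrowing.
    static class LoggingPathErrorHandler implements PathErrorHandler {
        private final Logger log;
        int logged = 0; // counter kept only so the sketch has an observable effect

        LoggingPathErrorHandler(Logger log) {
            this.log = log;
        }

        @Override
        public void handleError(String path, IOException e) {
            log.log(Level.WARNING, "Skipping " + path, e);
            logged++;
        }
    }

    // "Ignore it completely": returns without logging or rethrowing.
    static final PathErrorHandler IGNORE = (path, e) -> { };
}
```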

bq.  I suppose one could derive a new interface from PathFilter that becomes 
PathOptions and listStatus(Path, PathFilter) could check internally if it's 
actually got a PathOption instead of a PathFilter and behave differently. 
However I think an explicit, separate API would be preferable though, simply 
for clarity of what the API expects from callers.

Yeah, I was proposing adding a new type, {{PathOptions}}, which could contain 
an instance of {{PathFilter}}.  We could add new methods to {{PathFilter}}, but 
since it's a public/stable interface rather than an abstract class, that would 
be an incompatible change.
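
Roughly what I have in mind, as a sketch only: {{PathOptions}} holds a {{PathFilter}} rather than extending it, so existing {{PathFilter}} implementations keep compiling. The setter and getter names, the defaults, and the string paths are all assumptions; the nested {{PathFilter}} merely mirrors the single-method shape of the real interface.

```java
import java.io.IOException;

class PathOptionsSketch {
    // Stand-in mirroring the single-method shape of PathFilter (paths are plain strings here).
    interface PathFilter {
        boolean accept(String path);
    }

    // Hypothetical shape of the proposed handler; the method name is assumed.
    interface PathErrorHandler {
        void handleError(String path, IOException e) throws IOException;
    }

    // Hypothetical PathOptions: composition, so PathFilter itself is left unchanged.
    static final class PathOptions {
        // Assumed defaults: accept every path, rethrow every error.
        private PathFilter filter = p -> true;
        private PathErrorHandler errorHandler = (p, e) -> { throw e; };

        PathOptions setFilter(PathFilter filter) {
            this.filter = filter;
            return this;
        }

        PathOptions setErrorHandler(PathErrorHandler errorHandler) {
            this.errorHandler = errorHandler;
            return this;
        }

        PathFilter getFilter() {
            return filter;
        }

        PathErrorHandler getErrorHandler() {
            return errorHandler;
        }
    }
}
```

New settings can later be added to {{PathOptions}} as extra fields without touching any caller that only supplies a filter.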
                
> new APIs for listStatus and globStatus to deal with symlinks
> ------------------------------------------------------------
>
>                 Key: HADOOP-9972
>                 URL: https://issues.apache.org/jira/browse/HADOOP-9972
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: fs
>    Affects Versions: 2.1.1-beta
>            Reporter: Colin Patrick McCabe
>            Assignee: Colin Patrick McCabe
>
> Based on the discussion in HADOOP-9912, we need new APIs for FileSystem to 
> deal with symlinks.  The issue is that code has been written which is 
> incompatible with the existence of things which are not files or directories. 
> For example, there is a lot of code out there that looks at FileStatus#isFile,
> and if it returns false, assumes that what it is looking at is a directory.
> In the case of a symlink, this assumption is incorrect.
> It seems reasonable to make the default behavior of {{FileSystem#listStatus}} 
> and {{FileSystem#globStatus}} be fully resolving symlinks, and ignoring 
> dangling ones.  This will prevent incompatibility with existing MR jobs and 
> other HDFS users.  We should also add new versions of listStatus and 
> globStatus that allow new, symlink-aware code to deal with symlinks as 
> symlinks.
