[ https://issues.apache.org/jira/browse/HADOOP-9984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13783692#comment-13783692 ]

Chris Nauroth commented on HADOOP-9984:
---------------------------------------

After further thought, I'm increasingly concerned about the risk of breaking
{{FileSystem}} subclasses in external projects.  It's quite late to make an
incompatible change, and there would be very little time for projects to react.
(BTW, this is independent of the discussion of whether we call the next
release 2.1.2 or 2.2.0.  However we label the version, it doesn't leave much
time for downstream projects that want to adopt it.)

A backwards-incompatible change at some point is unavoidable, because we all
agree on adding new abstract methods in the base classes.  However, maybe we
don't need to break compatibility for 2.1.2.  I think the plan would look
like this:

# Change this patch so that {{listStatus}} remains abstract in the base class, 
and all subclasses must implement it to auto-resolve symlinks.  This would 
cause unavoidable code duplication across the subclasses.
# Targeting 2.3.0 (probably part of HADOOP-9972), add the new abstract methods 
that we need, and refactor back to something like the current patch.  This 
would clean up the unfortunate code duplication introduced in step 1.
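To make step 1 concrete, here is a minimal sketch, assuming a toy in-memory namespace; {{Node}}, {{StopgapFs}}, {{ResolvingFs}} and {{RawFs}} are hypothetical names, not Hadoop classes.  It also assumes "resolving" means reporting the target's attributes under the link's own path, one hop only, which may not match the patch exactly.  With {{listStatus}} abstract, each subclass carries its own copy of the resolution loop, and a subclass that forgets it still compiles.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified listing entry; not Hadoop's FileStatus.
final class Node {
    final String path;
    final boolean isDirectory;
    final String linkTarget; // non-null only for symlinks

    Node(String path, boolean isDirectory, String linkTarget) {
        this.path = path;
        this.isDirectory = isDirectory;
        this.linkTarget = linkTarget;
    }

    boolean isSymlink() {
        return linkTarget != null;
    }
}

// Step 1: listStatus stays abstract, so the base class can only document the contract.
// The in-memory maps live here purely to keep the sketch short.
abstract class StopgapFs {
    protected final Map<String, Node> inodes = new HashMap<>();
    protected final Map<String, List<String>> children = new HashMap<>();

    void add(String parent, Node n) {
        inodes.put(n.path, n);
        children.computeIfAbsent(parent, k -> new ArrayList<>()).add(n.path);
    }

    /** Subclasses are expected to auto-resolve symlinks (documented, not enforced). */
    abstract List<Node> listStatus(String dir);
}

// Honors the contract. Under step 1, every real subclass would need its own copy of this loop.
final class ResolvingFs extends StopgapFs {
    @Override
    List<Node> listStatus(String dir) {
        List<Node> out = new ArrayList<>();
        for (String p : children.getOrDefault(dir, Collections.emptyList())) {
            Node n = inodes.get(p);
            Node target = n.isSymlink() ? inodes.get(n.linkTarget) : null;
            // One hop only; a dangling link (target == null) is returned unresolved.
            out.add(target != null ? new Node(n.path, target.isDirectory, target.linkTarget) : n);
        }
        return out;
    }
}

// Compiles just as happily but skips resolution: nothing in the type system objects.
final class RawFs extends StopgapFs {
    @Override
    List<Node> listStatus(String dir) {
        List<Node> out = new ArrayList<>();
        for (String p : children.getOrDefault(dir, Collections.emptyList())) {
            out.add(inodes.get(p));
        }
        return out;
    }
}
```

Given a directory {{/data}} and a link {{/link -> /data}}, {{ResolvingFs}} lists {{/link}} as a directory, while {{RawFs}} still hands back the raw symlink.  Step 2 would pull that loop up into the base class.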

This definitely would have some negative consequences compared to the current
version of the patch.  There would be a lot more code duplication in
subclasses.  It also wouldn't enforce as strongly that {{listStatus}}
auto-resolves in 2.1.2; it would be more like an honor system, with
documentation stating that "subclasses are expected to auto-resolve".  The
current design provides a stronger guarantee: the subclasses just implement
methods for a non-resolving list and for resolving a single symlink path, and
the abstract class implements the auto-resolving list by composing those two
primitives.
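The composition described above is essentially the template method pattern.  A minimal sketch, assuming a toy in-memory namespace: {{Status}}, {{ComposedFs}}, {{MapFs}}, {{listStatusNoResolve}} and {{resolveLink}} are hypothetical names, not the signatures in the actual patch, and resolution here is one hop that reports the target's attributes under the link's path.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified status type; not Hadoop's FileStatus.
final class Status {
    final String path;
    final boolean isDirectory;
    final String linkTarget; // non-null only for symlinks

    Status(String path, boolean isDirectory, String linkTarget) {
        this.path = path;
        this.isDirectory = isDirectory;
        this.linkTarget = linkTarget;
    }

    boolean isSymlink() {
        return linkTarget != null;
    }

    // Same attributes, reported under a different path.
    Status withPath(String newPath) {
        return new Status(newPath, isDirectory, linkTarget);
    }
}

abstract class ComposedFs {
    // Primitive 1 (hypothetical name): list a directory without following symlinks.
    protected abstract List<Status> listStatusNoResolve(String dir);

    // Primitive 2 (hypothetical name): resolve one symlink to its target's status.
    protected abstract Status resolveLink(Status link);

    // Written once in the base class; final so no subclass can skip resolution.
    public final List<Status> listStatus(String dir) {
        List<Status> result = new ArrayList<>();
        for (Status s : listStatusNoResolve(dir)) {
            result.add(s.isSymlink() ? resolveLink(s).withPath(s.path) : s);
        }
        return result;
    }
}

// A subclass supplies only the two primitives.
final class MapFs extends ComposedFs {
    private final Map<String, Status> inodes = new HashMap<>();
    private final Map<String, List<String>> children = new HashMap<>();

    void add(String parent, Status s) {
        inodes.put(s.path, s);
        children.computeIfAbsent(parent, k -> new ArrayList<>()).add(s.path);
    }

    @Override
    protected List<Status> listStatusNoResolve(String dir) {
        List<Status> out = new ArrayList<>();
        for (String p : children.getOrDefault(dir, Collections.emptyList())) {
            out.add(inodes.get(p));
        }
        return out;
    }

    @Override
    protected Status resolveLink(Status link) {
        Status target = inodes.get(link.linkTarget);
        // One hop only; a dangling link is returned unresolved in this sketch.
        return target != null ? target : link;
    }
}
```

Because {{listStatus}} is {{final}} in the base class, the guarantee comes from the compiler rather than from documentation; the cost is that adding the two abstract primitives breaks any external subclass that doesn't implement them, which is exactly the compatibility concern above.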

Even though this is irritating and ultimately causes more work for us, I think
it's worth considering as a way to protect our downstream clients.  In
2.3.0, we'd have the opportunity to clean it all up and give downstream
projects plenty of time to react.  Thoughts?


> FileSystem#globStatus and FileSystem#listStatus should resolve symlinks by 
> default
> ----------------------------------------------------------------------------------
>
>                 Key: HADOOP-9984
>                 URL: https://issues.apache.org/jira/browse/HADOOP-9984
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: fs
>    Affects Versions: 2.1.0-beta
>            Reporter: Colin Patrick McCabe
>            Assignee: Colin Patrick McCabe
>            Priority: Blocker
>         Attachments: HADOOP-9984.001.patch, HADOOP-9984.003.patch, 
> HADOOP-9984.005.patch, HADOOP-9984.007.patch, HADOOP-9984.009.patch, 
> HADOOP-9984.010.patch, HADOOP-9984.011.patch, HADOOP-9984.012.patch
>
>
> During the process of adding symlink support to FileSystem, we realized that 
> many existing HDFS clients would be broken by listStatus and globStatus 
> returning symlinks.  One example is applications that assume that 
> !FileStatus#isFile implies that the inode is a directory.  As we discussed in 
> HADOOP-9972 and HADOOP-9912, we should default these APIs to returning 
> resolved paths.
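The client breakage mentioned in the description is easy to reproduce in miniature.  A minimal sketch, assuming a hypothetical three-way {{Inode}} type (Hadoop's real {{FileStatus}} differs) and a hypothetical {{LegacyClient}} helper that encodes the "!isFile implies directory" assumption.

```java
// Hypothetical entry type; Hadoop's real FileStatus differs.
final class Inode {
    enum Kind { FILE, DIRECTORY, SYMLINK }

    final String path;
    final Kind kind;

    Inode(String path, Kind kind) {
        this.path = path;
        this.kind = kind;
    }

    boolean isFile() {
        return kind == Kind.FILE;
    }

    boolean isDirectory() {
        return kind == Kind.DIRECTORY;
    }
}

final class LegacyClient {
    // The fragile pattern: "not a file" is taken to mean "directory".
    static boolean looksLikeDirectory(Inode s) {
        return !s.isFile();
    }

    // A check that does not assume there are only two kinds of inode.
    static boolean isReallyDirectory(Inode s) {
        return s.isDirectory();
    }
}
```

For an unresolved link that points at a file, {{looksLikeDirectory}} returns true even though the entry is not a directory.  If the listing resolves the link so the entry reports {{FILE}}, the same legacy check returns false, which is why defaulting to resolved results keeps such clients working.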



--
This message was sent by Atlassian JIRA
(v6.1#6144)
