[
https://issues.apache.org/jira/browse/HADOOP-9984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13783692#comment-13783692
]
Chris Nauroth commented on HADOOP-9984:
---------------------------------------
After more thought, I'm more concerned about the risk of breaking
{{FileSystem}} subclasses in external projects. It's quite late to make an
incompatible change, and there would be very little time for projects to react.
(BTW, this is independent of the discussion of whether we call the next
release 2.1.2 or 2.2.0. Whatever we label the version, it doesn't leave much
time for downstream projects that want to adopt it.)
A backwards-incompatible change at some point is unavoidable, because we are
all agreed on adding new abstract methods in the base classes. However, maybe
we don't need to break compatibility for 2.1.2. I think the plan would look
like this:
# Change this patch so that {{listStatus}} remains abstract in the base class,
and all subclasses must implement it to auto-resolve symlinks. This would
cause unavoidable code duplication across the subclasses.
# Targeting 2.3.0 (probably part of HADOOP-9972), add the new abstract methods
that we need, and refactor back to something like the current patch. This
would clean up the unfortunate code duplication introduced in step 1.
This definitely would have some negative consequences compared to the current
version of the patch. There would be a lot more code duplication in
subclasses. It also wouldn't enforce as strongly that {{listStatus}} auto-resolves
in 2.1.2; it would be more like an honor system, with documentation stating
that "subclasses are expected to auto-resolve". The current design provides a
stronger guarantee in that the subclasses just implement methods for
non-resolving list + symlink resolution of one path, and the abstract class
implements auto-resolving list by composing those two primitives.
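To make the composition concrete, here is a rough, self-contained sketch of that shape. It does not use the real Hadoop classes: {{Entry}}, {{SketchFileSystem}}, {{listStatusNoResolve}}, {{resolveLink}} and {{InMemoryFileSystem}} are all hypothetical names chosen for illustration and are not taken from the patch.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for FileStatus: a path plus a symlink flag. */
final class Entry {
    final String path;
    final boolean symlink;

    Entry(String path, boolean symlink) {
        this.path = path;
        this.symlink = symlink;
    }
}

/** Hypothetical base class; method names are illustrative, not the patch's. */
abstract class SketchFileSystem {
    /** Primitive 1: list a directory without resolving symlinks. */
    protected abstract List<Entry> listStatusNoResolve(String dir);

    /** Primitive 2: resolve a single symlink to its target entry. */
    protected abstract Entry resolveLink(Entry link);

    /** Auto-resolving list, composed once here so subclasses cannot skip it. */
    public final List<Entry> listStatus(String dir) {
        List<Entry> resolved = new ArrayList<>();
        for (Entry e : listStatusNoResolve(dir)) {
            resolved.add(e.symlink ? resolveLink(e) : e);
        }
        return resolved;
    }
}

/** Toy in-memory subclass that supplies only the two primitives. */
final class InMemoryFileSystem extends SketchFileSystem {
    private final Map<String, List<Entry>> dirs = new HashMap<>();
    private final Map<String, String> linkTargets = new HashMap<>();

    void addFile(String dir, String path) {
        dirs.computeIfAbsent(dir, k -> new ArrayList<>()).add(new Entry(path, false));
    }

    void addLink(String dir, String path, String target) {
        dirs.computeIfAbsent(dir, k -> new ArrayList<>()).add(new Entry(path, true));
        linkTargets.put(path, target);
    }

    @Override
    protected List<Entry> listStatusNoResolve(String dir) {
        return dirs.getOrDefault(dir, new ArrayList<>());
    }

    @Override
    protected Entry resolveLink(Entry link) {
        // Single-hop resolution only; chained links are out of scope for this sketch.
        return new Entry(linkTargets.get(link.path), false);
    }
}
```

Because {{listStatus}} is final in this sketch, a subclass cannot forget to resolve. Under the two-step plan above, {{listStatus}} would instead stay abstract for 2.1.2 and each subclass would have to repeat the loop itself.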
Even though this is irritating and ultimately causes more work for us, I think
it's worthwhile to consider it for protecting our downstream clients. In
2.3.0, we'd have the opportunity to clean it all up and give downstream
projects plenty of time to react. Thoughts?
> FileSystem#globStatus and FileSystem#listStatus should resolve symlinks by
> default
> ----------------------------------------------------------------------------------
>
> Key: HADOOP-9984
> URL: https://issues.apache.org/jira/browse/HADOOP-9984
> Project: Hadoop Common
> Issue Type: Bug
> Components: fs
> Affects Versions: 2.1.0-beta
> Reporter: Colin Patrick McCabe
> Assignee: Colin Patrick McCabe
> Priority: Blocker
> Attachments: HADOOP-9984.001.patch, HADOOP-9984.003.patch,
> HADOOP-9984.005.patch, HADOOP-9984.007.patch, HADOOP-9984.009.patch,
> HADOOP-9984.010.patch, HADOOP-9984.011.patch, HADOOP-9984.012.patch
>
>
> During the process of adding symlink support to FileSystem, we realized that
> many existing HDFS clients would be broken by listStatus and globStatus
> returning symlinks. One example is applications that assume that
> !FileStatus#isFile implies that the inode is a directory. As we discussed in
> HADOOP-9972 and HADOOP-9912, we should default these APIs to returning
> resolved paths.