[ https://issues.apache.org/jira/browse/HADOOP-9984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13784585#comment-13784585 ]

Colin Patrick McCabe commented on HADOOP-9984:
----------------------------------------------

bq. My understanding is that a breaking change will be done in 2.3.0 for 
HADOOP-9972, regardless of what happens in this patch. Is that not the case? Do 
we expect to implement those new APIs fully in the base class without requiring 
anything new of subclasses?

HADOOP-9972 is not going to be an incompatible change, or require anything new 
from subclasses.

bq. What do you think of this as a compromise? It helps control some of the bad 
consequences discussed earlier.

I think I'm missing something in this whole discussion.  You seem to want to 
break the API after Hadoop 2 goes GA, but breaking the API is exactly what is 
not supposed to happen after GA, according to Arun.

I also don't understand the comments about "giving them more time."  
Proprietary or out-of-tree filesystems are *not* part of the Hadoop release, by 
definition.  What are the "downstream projects" you're referring to?  I suppose 
Ceph, QFS, and GlusterFS are three examples of out-of-tree FileSystems.   Are 
we delaying our release or reducing its quality for them?  If so, why?

Making {{listLinkStatus}} an abstract function is actually better for these 
out-of-tree implementors anyway.  It will bring to their attention the fact 
that the semantics of {{listStatus}} have changed, rather than sweeping it 
under the rug.  Allowing the code to silently compile and do the wrong thing 
doesn't seem like it's doing anyone any favors.  I can say firsthand that no 
matter which option we choose, the Ceph Hadoop plugin will need to be updated 
(I worked on that at one point).
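
To make the difference concrete, here is a rough sketch of the two options. 
Every class in it ({{StrictFs}}, {{LenientFs}}, {{Entry}}, the two plugins) is 
a made-up stand-in, not the real org.apache.hadoop.fs API and not the code in 
the attached patches; the symlink "resolution" is deliberately a toy.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-ins only; none of these are real Hadoop classes.
public class AbstractHookSketch {
    /** Directory entry; linkTarget is null unless the entry is a symlink. */
    static final class Entry {
        final String name;
        final String linkTarget;
        Entry(String name, String linkTarget) {
            this.name = name;
            this.linkTarget = linkTarget;
        }
        boolean isSymlink() { return linkTarget != null; }
    }

    /** Option A: the raw listing is abstract, so an un-updated subclass fails to compile. */
    abstract static class StrictFs {
        abstract List<Entry> listLinkStatus(String dir);

        // The resolving listing lives in the base class and is built on the hook.
        final List<Entry> listStatus(String dir) {
            List<Entry> resolved = new ArrayList<>();
            for (Entry e : listLinkStatus(dir)) {
                // Toy resolution: swap a link for a plain entry named after its target.
                resolved.add(e.isSymlink() ? new Entry(e.linkTarget, null) : e);
            }
            return resolved;
        }
    }

    /** Option B: the new method has a default body, so old subclasses still compile. */
    static class LenientFs {
        List<Entry> listStatus(String dir) { return new ArrayList<>(); }
        List<Entry> listLinkStatus(String dir) { return listStatus(dir); }
    }

    /** Raw directory contents shared by both plugins: one directory, one symlink. */
    static final List<Entry> RAW =
        Arrays.asList(new Entry("/data", null), new Entry("/latest", "/data"));

    /** Plugin updated for option A: it was forced to implement the hook. */
    static final class UpdatedPlugin extends StrictFs {
        @Override List<Entry> listLinkStatus(String dir) { return RAW; }
    }

    /** Un-updated plugin under option B: compiles, but listStatus still leaks raw symlinks. */
    static final class LegacyPlugin extends LenientFs {
        @Override List<Entry> listStatus(String dir) { return RAW; }
    }

    static long countSymlinks(List<Entry> entries) {
        long n = 0;
        for (Entry e : entries) {
            if (e.isSymlink()) n++;
        }
        return n;
    }

    public static void main(String[] args) {
        System.out.println("strict:  " + countSymlinks(new UpdatedPlugin().listStatus("/")));
        System.out.println("lenient: " + countSymlinks(new LegacyPlugin().listStatus("/")));
    }
}
```

Under option A a plugin that only overrides {{listStatus}} would not compile 
at all, which is the "bring it to their attention" effect.  Under option B the 
legacy plugin builds cleanly and keeps handing symlinks to callers.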

Finally, you can't implement symlink resolution in the subclasses of 
AbstractFileSystem.  For FileContext, symlink resolution has to happen in 
FileContext itself.  So either AbstractFileSystem#listStatus is going to be 
the equivalent of FileSystem#listLinkStatus, or you have to completely 
redesign FileContext.  Neither of those options seems like a good idea.  I 
think this, more than anything else, convinced me to take the path I did.
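
A minimal sketch of that layering, under stated assumptions: the low-level 
filesystem only reports that it hit a link, and the layer above it owns the 
retry loop.  The names ({{LowLevelFs}}, {{Context}}, {{UnresolvedLink}}) are 
hypothetical stand-ins, not the real AbstractFileSystem or FileContext 
classes, and the cross-filesystem link is an illustrative assumption about 
why the upper layer is the natural place for resolution.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-ins only; none of these are real Hadoop classes.
public class LinkResolutionSketch {
    /** Signals that a path is a symlink; carries the target as "scheme:/path". */
    static final class UnresolvedLink extends Exception {
        final String target;
        UnresolvedLink(String target) {
            super("symlink to " + target);
            this.target = target;
        }
    }

    /** Low-level filesystem that only knows its own namespace. */
    static final class LowLevelFs {
        final Map<String, String> files = new HashMap<>(); // path -> contents
        final Map<String, String> links = new HashMap<>(); // path -> "scheme:/path"

        String read(String path) throws UnresolvedLink {
            String target = links.get(path);
            if (target != null) {
                throw new UnresolvedLink(target); // cannot follow it from here
            }
            String contents = files.get(path);
            if (contents == null) {
                throw new IllegalArgumentException("no such file: " + path);
            }
            return contents;
        }
    }

    /** Context layer: owns the scheme table and the resolution loop. */
    static final class Context {
        static final int MAX_HOPS = 32; // arbitrary guard against link cycles
        final Map<String, LowLevelFs> byScheme = new HashMap<>();

        String read(String uri) {
            String current = uri;
            for (int hop = 0; hop < MAX_HOPS; hop++) {
                int colon = current.indexOf(':');
                if (colon < 0) {
                    throw new IllegalArgumentException("no scheme: " + current);
                }
                LowLevelFs fs = byScheme.get(current.substring(0, colon));
                if (fs == null) {
                    throw new IllegalArgumentException("unknown filesystem: " + current);
                }
                try {
                    return fs.read(current.substring(colon + 1));
                } catch (UnresolvedLink e) {
                    current = e.target; // retry, possibly on a different filesystem
                }
            }
            throw new IllegalStateException("too many levels of symlinks: " + uri);
        }
    }

    /** Builds two filesystems where a:/current is a link to b:/real. */
    static String demo() {
        LowLevelFs a = new LowLevelFs();
        LowLevelFs b = new LowLevelFs();
        a.links.put("/current", "b:/real");
        b.files.put("/real", "hello");
        Context ctx = new Context();
        ctx.byScheme.put("a", a);
        ctx.byScheme.put("b", b);
        return ctx.read("a:/current");
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

In this toy, filesystem {{a}} has no way to read {{b:/real}} on its own; only 
the context layer can finish the lookup.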

> FileSystem#globStatus and FileSystem#listStatus should resolve symlinks by 
> default
> ----------------------------------------------------------------------------------
>
>                 Key: HADOOP-9984
>                 URL: https://issues.apache.org/jira/browse/HADOOP-9984
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: fs
>    Affects Versions: 2.1.0-beta
>            Reporter: Colin Patrick McCabe
>            Assignee: Colin Patrick McCabe
>            Priority: Blocker
>         Attachments: HADOOP-9984.001.patch, HADOOP-9984.003.patch, 
> HADOOP-9984.005.patch, HADOOP-9984.007.patch, HADOOP-9984.009.patch, 
> HADOOP-9984.010.patch, HADOOP-9984.011.patch, HADOOP-9984.012.patch, 
> HADOOP-9984.013.patch, HADOOP-9984.014.patch
>
>
> During the process of adding symlink support to FileSystem, we realized that 
> many existing HDFS clients would be broken by listStatus and globStatus 
> returning symlinks.  One example is applications that assume that 
> !FileStatus#isFile implies that the inode is a directory.  As we discussed in 
> HADOOP-9972 and HADOOP-9912, we should default these APIs to returning 
> resolved paths.
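
The client assumption mentioned in the issue description above can be 
sketched as follows.  {{Status}} is a hypothetical stand-in that mirrors only 
the three predicates involved; it is not the real FileStatus class.

```java
// Hypothetical stand-in; not the real org.apache.hadoop.fs.FileStatus.
public class IsFileAssumptionSketch {
    enum Kind { FILE, DIRECTORY, SYMLINK }

    static final class Status {
        final Kind kind;
        Status(Kind kind) { this.kind = kind; }
        boolean isFile() { return kind == Kind.FILE; }
        boolean isDirectory() { return kind == Kind.DIRECTORY; }
        boolean isSymlink() { return kind == Kind.SYMLINK; }
    }

    /** Pre-symlink client logic: anything that is not a file must be a directory. */
    static boolean legacyTreatsAsDirectory(Status s) {
        return !s.isFile();
    }

    /** Symlink-aware check: ask the question that is actually meant. */
    static boolean isReallyDirectory(Status s) {
        return s.isDirectory();
    }

    public static void main(String[] args) {
        Status link = new Status(Kind.SYMLINK);
        System.out.println("legacy says directory: " + legacyTreatsAsDirectory(link));
        System.out.println("really a directory:    " + isReallyDirectory(link));
    }
}
```

A listing that returns unresolved symlinks sends the legacy check down its 
directory branch for something that is not a directory, which is the kind of 
breakage that returning resolved paths by default avoids.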



--
This message was sent by Atlassian JIRA
(v6.1#6144)
