I encourage interested parties to read through HADOOP-9912 to get a feel for the issues. There really is no way to add symlink support without changing the behavior of existing APIs. Ultimately, anything that returns a FileStatus is going to behave differently. Even if we default to resolving symlinks, resolving can lead to FileNotFound or permission errors. Thus, we have to choose whether to prune the bad links, show the bad links as dangling, or throw an exception. None of these options is compatible with existing behavior.
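To make the three choices concrete, here is a minimal sketch in plain Java. It deliberately does not use the Hadoop API: `Entry`, `Kind`, `Policy`, `sampleTree`, and `list` are hypothetical stand-ins for FileStatus and a listStatus-style call, and a dangling link is simply a symlink whose target is missing from an in-memory map. It only resolves one level of indirection, so treat it as an illustration of the trade-off rather than how HADOOP-8040 works.

```java
import java.io.FileNotFoundException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class SymlinkListingSketch {
    // Hypothetical stand-ins for illustration; these are not Hadoop classes.
    public enum Kind { FILE, DIRECTORY, SYMLINK }
    public enum Policy { PRUNE, SHOW_DANGLING, THROW }

    public static final class Entry {
        public final String path;
        public final Kind kind;
        public final String target; // non-null only for symlinks

        public Entry(String path, Kind kind, String target) {
            this.path = path;
            this.kind = kind;
            this.target = target;
        }
    }

    // A tiny in-memory "directory": one file, one directory,
    // one good link and one dangling link.
    public static Map<String, Entry> sampleTree() {
        Map<String, Entry> fs = new LinkedHashMap<>();
        fs.put("/data/a.txt", new Entry("/data/a.txt", Kind.FILE, null));
        fs.put("/data/logs", new Entry("/data/logs", Kind.DIRECTORY, null));
        fs.put("/data/current", new Entry("/data/current", Kind.SYMLINK, "/data/logs"));
        fs.put("/data/stale", new Entry("/data/stale", Kind.SYMLINK, "/data/missing"));
        return fs;
    }

    // Lists entries, resolving symlinks one level deep. The policy decides
    // what happens when a link's target does not exist.
    public static List<Entry> list(Map<String, Entry> fs, Policy policy) {
        List<Entry> out = new ArrayList<>();
        for (Entry e : fs.values()) {
            if (e.kind != Kind.SYMLINK) {
                out.add(e);
                continue;
            }
            Entry resolved = fs.get(e.target);
            if (resolved != null) {
                // Report the link's own path with the target's kind.
                out.add(new Entry(e.path, resolved.kind, null));
                continue;
            }
            switch (policy) {
                case PRUNE:
                    break; // silently drop the dangling link
                case SHOW_DANGLING:
                    out.add(e); // caller sees something that is neither file nor directory
                    break;
                case THROW:
                    throw new UncheckedIOException(
                        new FileNotFoundException(e.path + " -> " + e.target));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, Entry> fs = sampleTree();
        for (Policy p : new Policy[] { Policy.PRUNE, Policy.SHOW_DANGLING }) {
            System.out.println(p + ": " + list(fs, p).size() + " entries");
        }
        try {
            list(fs, Policy.THROW);
        } catch (UncheckedIOException ex) {
            System.out.println("THROW: " + ex.getCause().getMessage());
        }
    }
}
```

Whichever policy is chosen, a caller written before symlinks existed sees something new: a listing that is shorter than the directory, an entry that is neither a file nor a directory, or an exception from a call that used to succeed.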
I'm really concerned about putting this in a minor release like 2.3 since it has the potential to break a lot of user code. HADOOP-9912 is an example from within our own ecosystem, but think of all the custom user code out there written against FileSystem. 2.2 GA is basically our last chance to make this kind of change before Hadoop 3.

Thanks,
Andrew

On Tue, Sep 17, 2013 at 9:10 AM, Colin McCabe <cmcc...@alumni.cmu.edu> wrote:

> The issue is not modifying existing APIs. The issue is that code has
> been written that makes assumptions that are incompatible with the
> existence of things that are not files or directories. For example,
> there is a lot of code out there that looks at FileStatus#isFile, and
> if it returns false, assumes that what it is looking at is a
> directory. In the case of a symlink, this assumption is incorrect.
>
> Faced with this, we have considered making the default behavior of
> listStatus and globStatus to be fully resolving symlinks, and simply
> not listing dangling symlinks. Code which is prepared to deal with
> symlinks can use newer versions of the listStatus and globStatus
> functions which do return symlinks as symlinks.
>
> We might consider defaulting FileSystem#listStatus and
> FileSystem#globStatus to "fully resolving symlinks by default" and
> defaulting FileContext#listStatus and FileContext#Util#globStatus to
> the opposite. This seems like the maximally compatible solution that
> we're going to get. I think this makes sense.
>
> The alternative is kicking the can down the road to Hadoop 3, and
> letting vendors of alternative (including some proprietary
> alternative) systems continue to claim that "Hadoop doesn't support
> symlinks yet" (with some justice).
>
> P.S. I would be fine with putting this in 2.2 or 2.3 if that seems
> more appropriate.
>
> sincerely,
> Colin
>
> On Tue, Sep 17, 2013 at 8:23 AM, Suresh Srinivas <sur...@hortonworks.com>
> wrote:
> > I agree that this is an important change.
> > However, 2.2.0 GA is getting ready to roll out in weeks. I am concerned
> > that these changes will add not only incompatible changes late in the
> > game, but also possibly instability.
> > Java API incompatibility is something we have avoided for the most part
> > and I am concerned that this is adding such incompatibility in FileSystem
> > APIs. We should find workarounds by adding possibly newer APIs and
> > leaving existing APIs as is. If this can be done, my vote is to enable
> > this feature in 2.3. Even if it cannot be done, I am concerned that this
> > is coming quite late and we should see if we could allow some
> > incompatible changes into 2.3 for this feature.
> >
> >
> > On Mon, Sep 16, 2013 at 6:49 PM, Andrew Wang <andrew.w...@cloudera.com>
> > wrote:
> >
> >> Hi all,
> >>
> >> I wanted to broadcast plans for putting the FileSystem symlinks work
> >> (HADOOP-8040) into branch-2.1 for the pending Hadoop 2 GA release. I
> >> think it's pretty important we get it in since it's not a compatible
> >> change; if it misses the GA train, we're not going to have symlinks
> >> until the next major release.
> >>
> >> However, we're still dealing with ongoing issues revealed via testing.
> >> There's user-code out there that only handles files and directories and
> >> will barf when given a symlink (perhaps a dangling one!). See
> >> HADOOP-9912 for a nice example where globStatus returning symlinks broke
> >> Pig; some of us had a conference call to talk it through, and one
> >> definite conclusion was that this wasn't solvable in a generally
> >> compatible manner.
> >>
> >> There are also still some gaps in symlink support right now. For
> >> example, the more esoteric FileSystems like WebHDFS, HttpFS, and HFTP
> >> need symlink resolution, and tooling like the FsShell and Distcp still
> >> need to be updated as well.
> >>
> >> So, there's definitely work to be done, but there are a lot of users
> >> interested in the feature, and symlinks really should be in GA. Would
> >> appreciate any thoughts/input on the matter.
> >>
> >> Thanks,
> >> Andrew
> >>
> >
> > --
> > http://hortonworks.com/download/