What we need here is a consensus on whether FileSystem#listStatus and FileSystem#globStatus should return symlinks __as_symlinks__. If 2.1-beta goes out with these semantics, I don't think we will be able to change them later, and that is exactly what will happen in the "do nothing" scenario.
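To make the compatibility concern concrete, here is a minimal sketch of how caller code that only knows about files and directories misreads a listing entry that is reported as a symlink. It is an analogy, not Hadoop code: it uses the JDK's java.nio.file API on the local filesystem with NOFOLLOW_LINKS as a stand-in for a listing that returns symlinks as symlinks. The class name SymlinkListingSketch, the Kind enum, and the helper names are all hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.LinkOption;
import java.nio.file.Path;
import java.nio.file.attribute.BasicFileAttributes;

// Hypothetical illustration; local-filesystem analogy, not the Hadoop API.
public class SymlinkListingSketch {
    enum Kind { FILE, DIRECTORY, SYMLINK, OTHER }

    // The assumption found in a lot of existing caller code:
    // anything that is not a directory must be a file.
    static Kind naive(BasicFileAttributes attrs) {
        return attrs.isDirectory() ? Kind.DIRECTORY : Kind.FILE;
    }

    // A caller that knows a listing entry may be a symlink.
    static Kind careful(BasicFileAttributes attrs) {
        if (attrs.isSymbolicLink()) return Kind.SYMLINK;
        if (attrs.isDirectory()) return Kind.DIRECTORY;
        if (attrs.isRegularFile()) return Kind.FILE;
        return Kind.OTHER;
    }

    // Read attributes of the entry itself, without following a symlink
    // (the "return symlinks as symlinks" behavior).
    static BasicFileAttributes statNoFollow(Path p) throws IOException {
        return Files.readAttributes(p, BasicFileAttributes.class,
                LinkOption.NOFOLLOW_LINKS);
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("symlink-sketch");
        Path target = Files.createDirectory(dir.resolve("data"));
        // Requires a filesystem and permissions that allow symlink creation.
        Path link = Files.createSymbolicLink(dir.resolve("data-link"), target);

        BasicFileAttributes attrs = statNoFollow(link);
        // The naive caller treats a link to a directory as a plain file.
        System.out.println("naive:   " + naive(attrs));
        System.out.println("careful: " + careful(attrs));
    }
}
```

The point of the sketch is only that the two-way check silently picks the wrong branch once a third kind of entry can appear in a listing; whether callers or the listing API should absorb that change is the question under discussion.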
Also see Jason Lowe's comment here:
https://issues.apache.org/jira/browse/HADOOP-9912?focusedCommentId=13772002&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13772002

Colin

On Wed, Sep 18, 2013 at 5:11 PM, J. Rottinghuis <jrottingh...@gmail.com> wrote:
> However painful protobuf version changes are at build time for Hadoop
> developers, at runtime with multiple clusters and many Hadoop users this is
> a total nightmare.
> Even upgrading clusters from one protobuf version to the next is going to
> be very difficult. The same users will run jobs on, and/or read&write to
> multiple clusters. That means that they will have to fork their code, run
> multiple instances? Or in the very least they have to do an update to their
> applications. All in sync with Hadoop cluster changes. And these are not
> doable in a rolling fashion.
> All Hadoop and HBase clusters will all upgrade at the same time, or we'll
> have to have our users fork / roll multiple versions ?
> My point is that these things are much harder than just fix the (Jenkins)
> build and we're done. These changes are massively disruptive.
>
> There is a similar situation with symlinks. Having an API that lets users
> create symlinks is very problematic. Some users create symlinks and as Eli
> pointed out, somebody else (or automated process) tries to copy to / from
> another (Hadoop 1.x?) cluster over hftp. What will happen ?
> Having an API that people should not use is also a nightmare. We
> experienced this with append. For a while it was there, but users were "not
> allowed to use it" (or else there were large #'s of corrupt blocks). If
> there is an API to create a symlink, then some of our users are going to
> use it and others are going to trip over those symlinks. We already know
> that Pig does not work with symlinks yet, and as Steve pointed out, there
> is tons of other code out there that assumes that !isDir() means isFile().
>
> I like symlink functionality, but in our migration to Hadoop 2.x this is a
> total distraction. If the APIs stay in 2.2 GA we'll have to choose to:
> a) Not uprev until symlink support is figured out up and down the stack,
> and we've been able to migrate all our 1.x (equivalent) clusters to 2.x
> (equivalent). Or
> b) rip out the API altogether. Or
> c) change the implementation to throw an UnsupportedOperationException
> I'm not sure yet which of these I like least.
>
> Thanks,
>
> Joep
>
> On Wed, Sep 18, 2013 at 9:48 AM, Arun C Murthy <a...@hortonworks.com> wrote:
>
>> On Sep 16, 2013, at 6:49 PM, Andrew Wang <andrew.w...@cloudera.com> wrote:
>>
>> > Hi all,
>> >
>> > I wanted to broadcast plans for putting the FileSystem symlinks work
>> > (HADOOP-8040) into branch-2.1 for the pending Hadoop 2 GA release. I think
>> > it's pretty important we get it in since it's not a compatible change; if
>> > it misses the GA train, we're not going to have symlinks until the next
>> > major release.
>>
>> Just catching up, is this an incompatible change, or not? The above reads
>> 'not an incompatible change'.
>>
>> Arun
>>
>> >
>> > However, we're still dealing with ongoing issues revealed via testing.
>> > There's user-code out there that only handles files and directories and
>> > will barf when given a symlink (perhaps a dangling one!). See HADOOP-9912
>> > for a nice example where globStatus returning symlinks broke Pig; some of
>> > us had a conference call to talk it through, and one definite conclusion
>> > was that this wasn't solvable in a generally compatible manner.
>> >
>> > There are also still some gaps in symlink support right now. For example,
>> > the more esoteric FileSystems like WebHDFS, HttpFS, and HFTP need symlink
>> > resolution, and tooling like the FsShell and Distcp still need to be
>> > updated as well.
>> >
>> > So, there's definitely work to be done, but there are a lot of users
>> > interested in the feature, and symlinks really should be in GA. Would
>> > appreciate any thoughts/input on the matter.
>> >
>> > Thanks,
>> > Andrew
>>
>> --
>> Arun C. Murthy
>> Hortonworks Inc.
>> http://hortonworks.com/