It's an incompatible change. Existing APIs like listStatus and globStatus need to be symlink aware now, which can break assumptions of user code. We've had FileStatus#isSymlink() since the early days, but lots of user code hasn't been updated to use it.
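To make the concern concrete, here is a minimal sketch of the kind of three-way check user code ends up needing. It is an analogy only: it uses the JDK's java.nio.file API against a local filesystem, not Hadoop's FileSystem or FileStatus, and the class name SymlinkAwareListing and the method classify are made up for illustration. Code that assumes every listed entry is either a file or a directory would put the link in the wrong bucket.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.LinkOption;
import java.nio.file.Path;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of Hadoop: classifies directory entries
// without following symlinks, using only the JDK.
final class SymlinkAwareListing {

    /** Maps each direct child of dir to "symlink", "directory", or "file". */
    public static Map<String, String> classify(Path dir) throws IOException {
        Map<String, String> kinds = new TreeMap<>();
        try (DirectoryStream<Path> children = Files.newDirectoryStream(dir)) {
            for (Path child : children) {
                String kind;
                if (Files.isSymbolicLink(child)) {
                    // Check for links first; the other tests would follow them by default.
                    kind = "symlink";
                } else if (Files.isDirectory(child, LinkOption.NOFOLLOW_LINKS)) {
                    kind = "directory";
                } else {
                    kind = "file";
                }
                kinds.put(child.getFileName().toString(), kind);
            }
        }
        return kinds;
    }

    public static void main(String[] args) throws IOException {
        // Build a small tree: one directory, one file, one link to the directory.
        Path root = Files.createTempDirectory("symlink-demo");
        Files.createDirectory(root.resolve("data"));
        Files.createFile(root.resolve("part-0"));
        Files.createSymbolicLink(root.resolve("latest"), root.resolve("data"));
        System.out.println(classify(root));
    }
}
```

In Hadoop terms the analogous branch would be on FileStatus#isSymlink() before deciding whether an entry is a file or a directory; the sketch above only shows the shape of that check.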
I think Eli's earlier email did a good job of laying out the current state and our options. I didn't realize this before, but most of HADOOP-8040 is already in branch-2.1-beta, while many of the subsequent changes are not (e.g. HADOOP-9417, HADOOP-9817, HADOOP-9652). This means the current state of symlink support in branch-2.1-beta is half-baked, which is why "do nothing" is not a good option. With that in mind, perhaps Eli's proposals (abbreviated here) make more sense:

1) Delay 2.2 GA and put in some more effort to fix API issues like HADOOP-9912 / HADOOP-9972. Undoubtedly, more issues will still fall out of this post-GA, but we can do our best to fix these issues compatibly in 2.3.
2) Revert symlinks from branch-2.1-beta and leave it all for 2.3, but that makes 2.3 a pretty big jump from GA. Since symlinks have already appeared in the 2.1.0 release, it'd also technically make 2.2 a regression from 2.1.0.
3) Wait for 3.0, which I don't think anyone wants.

On Wed, Sep 18, 2013 at 10:05 AM, Steve Loughran <ste...@hortonworks.com> wrote:

> the main change is whatever APIs are going to be provided (and implicitly:
> supported for a long time) to handle symlinks separately from directories
>
> On 18 September 2013 17:24, Eli Collins <e...@cloudera.com> wrote:
>
> > On Wed, Sep 18, 2013 at 5:45 AM, Steve Loughran <ste...@hortonworks.com> wrote:
> >
> > > On 18 September 2013 12:53, Alejandro Abdelnur <t...@cloudera.com> wrote:
> > >
> > > > On Wed, Sep 18, 2013 at 11:29 AM, Steve Loughran <ste...@hortonworks.com> wrote:
> > > >
> > > > > I'm reluctant for this as while delaying the release, because we are
> > > > > going to find problems all the way up the stack -which will require a
> > > > > choreographed set of changes. Given the grief of the protobuf update, I
> > > > > don't want to go near that just before the final release.
> > > >
> > > > Well, I would use the exact same argument used for protobuf (which only
> > > > complication was getting protoc 2.5.0 in the jenkins boxes and communicate
> > > > developers to do the same, other than that we didn't hit any other issue
> > > > AFAIK) ...
> > >
> > > protobuf was traumatic at build time, as I recall because it was neither
> > > forwards or backwards compatible. Those of us trying to build different
> > > branches had to choose which version to have on the path, or set up scripts
> > > to do the switching. HBase needed rebuilding, so did other things. And I
> > > still have the pain of downloading and installing protoc on all Linux VMs I
> > > build up going forward, until apt-get and yum have protoc 2.5 artifacts.
> > >
> > > This means it was very painful for developer, added a lot of late breaking
> > > pain to the developers, but it had one key feature that gave it an edge: it
> > > was immediately obvious where you had a problem as things didn't compile or
> > > classload without linkage problems. No latent bugs, unless protobuf 2.5 has
> > > them internally -for which we have to rely on google's release testing to
> > > have found.
> > >
> > > That is a lot simpler to regression test than adding any new feature to
> > > HDFS and seeing what breaks -as that is something that only surfaces out in
> > > the field. Which is why I think it's too late in the 2.1 release timetable
> > > to add symlinks. We've had a 2.1-beta out there, we've got feedback. Fix
> > > those problems that are show stoppers, but don't add more stuff. Which is
> > > precisely why I have not been pushing in any of my recent changes. I may
> > > seem ruthless arguing against symlinks -but I'm not being inconsistent with
> > > my own commit history.
> > > The only two things I've put in branch-2.1 since beta-1 were a separate
> > > log for the Configuration deprecation warnings and a patch to the POM for
> > > a java7 build on OSX: and they weren't even my patches.
> > >
> > > -Steve
> > >
> > > (One of these days I should volunteer to be the release manager and it'll
> > > be obvious that Arun is being quite amenable to all the other developers)
> > >
> > > > IMO, it makes more sense to do this change during the beta rather than
> > > > when GA. That gives us more flexibility to iron out things if necessary.
> > >
> > > I'm arguing this change can go into the beta of the successor to 2.1 -not
> > > GA.
> >
> > What does "this change" refer to? Symlinks are already in 2.1, and the
> > existing semantics create problems for programs (eg see the pig
> > example in HADOOP-9912) that we need to resolve. I don't think do nothing
> > is an option for 2.2. GA.
> >
> > Thanks,
> > Eli
> >
> > > --
> > > CONFIDENTIALITY NOTICE
> > > NOTICE: This message is intended for the use of the individual or entity to
> > > which it is addressed and may contain information that is confidential,
> > > privileged and exempt from disclosure under applicable law. If the reader
> > > of this message is not the intended recipient, you are hereby notified that
> > > any printing, copying, dissemination, distribution, disclosure or
> > > forwarding of this communication is strictly prohibited. If you have
> > > received this communication in error, please contact the sender immediately
> > > and delete it from your system. Thank You.