On Tue, May 14, 2013 at 7:36 PM, Christopher <[email protected]> wrote:
> Benson-
>
> They produce different byte-code. That's why we're even considering
> this. ACCUMULO-1402 is the ticket under which our intent is to add
> classifiers, so that they can be distinguished.

whoops, missed that.

Then how do people succeed in just fixing up their dependencies and using it?

In any case, speaking as a Maven-maven, classifiers are absolutely,
positively, a cure worse than the disease. If you want the details
just ask.
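
To make the trade-off concrete: with classifiers, a downstream pom would pull
in the Hadoop2-flavored jar with something roughly like this (a sketch only;
the "hadoop2" classifier name is my assumption, not necessarily what
ACCUMULO-1402 settles on):

  <dependency>
    <groupId>org.apache.accumulo</groupId>
    <artifactId>accumulo-core</artifactId>
    <version>1.5.0</version>
    <!-- classifier name is hypothetical; ACCUMULO-1402 would define it -->
    <classifier>hadoop2</classifier>
  </dependency>

The immediate catch is that a classified attachment shares the main
artifact's POM, so both flavors advertise exactly the same dependency list,
which is the first of the reasons I'd call this a cure worse than the disease.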

>
> All-
>
> To Keith's point, I think perhaps all this concern is a non-issue...
> because as Keith points out, the dependencies in question are marked
> as "provided", and dependency resolution doesn't occur for provided
> dependencies anyway... so even if we leave off the profiles, we're in
> the same boat. Maybe not the boat we should be in... but certainly not
> a sinking one as I had first imagined. It's as afloat as it was
> before, when they were not in a profile, but still marked as
> "provided".
>
> --
> Christopher L Tubbs II
> http://gravatar.com/ctubbsii
>
>
> On Tue, May 14, 2013 at 7:09 PM, Benson Margulies <[email protected]> 
> wrote:
>> It just doesn't make very much sense to me to have two different GAVs
>> for the very same .class files, just to get different dependencies in
>> the poms. However, if someone really wanted that, I'd look to make
>> some scripting that created this downstream from the main build.
>>
>>
>> On Tue, May 14, 2013 at 6:16 PM, John Vines <[email protected]> wrote:
>>> They're the same currently. I was requesting separate gavs for hadoop 2.
>>> It's been on the mailing list and jira.
>>>
>>> Sent from my phone, please pardon the typos and brevity.
>>> On May 14, 2013 6:14 PM, "Keith Turner" <[email protected]> wrote:
>>>
>>>> On Tue, May 14, 2013 at 5:51 PM, Benson Margulies <[email protected]> wrote:
>>>>
>>>> > I am a maven developer, and I'm offering this advice based on my
>>>> > understanding of the reasons why that generic advice is offered.
>>>> >
>>>> > If you have different profiles that _build different results_ but all
>>>> > deliver the same GAV, you have chaos.
>>>> >
>>>>
>>>> What GAV are we currently producing for hadoop 1 and hadoop 2?
>>>>
>>>>
>>>> >
>>>> > If you have different profiles that test against different versions of
>>>> > dependencies, but all deliver the same byte code at the end of the
>>>> > day, you don't have chaos.
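
(For illustration, the harmless flavor looks roughly like this: a profile
that only changes which dependency version the build is verified against,
with the released classes staying identical; the property name here is made
up:

  <profile>
    <id>hadoop-2</id>
    <properties>
      <!-- hypothetical property controlling the version built/tested against -->
      <hadoop.version>2.0.4-alpha</hadoop.version>
    </properties>
  </profile>

Since the published byte code and GAV stay the same, consumers never need to
know the profile exists.)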
>>>> >
>>>> >
>>>> >
>>>> > On Tue, May 14, 2013 at 5:48 PM, Christopher <[email protected]> wrote:
>>>> > > I think it's interesting that Option 4 seems to be most preferred...
>>>> > > because it's the *only* option that is explicitly advised against by
>>>> > > the Maven developers (from the information I've read). I can see its
>>>> > > appeal, but I really don't think we should introduce an explicit
>>>> > > problem for users (one that applies even to users on the Hadoop
>>>> > > version we directly build against... not just those using Hadoop 2...
>>>> > > I don't know if that point was clear) just to partially support a
>>>> > > version of Hadoop that is still alpha and has never had a stable
>>>> > > release.
>>>> > >
>>>> > > BTW, Option 4 was how I had achieved a solution for ACCUMULO-1402,
>>>> > > but I am reluctant to apply that patch with this issue outstanding,
>>>> > > as it may exacerbate the problem.
>>>> > >
>>>> > > Another implication for Option 4 (the current "solution") is for
>>>> > > 1.6.0, with the planned accumulo-maven-plugin... because it means that
>>>> > > the accumulo-maven-plugin will need to be configured like this:
>>>> > > <plugin>
>>>> > >   <groupId>org.apache.accumulo</groupId>
>>>> > >   <artifactId>accumulo-maven-plugin</artifactId>
>>>> > >   <dependencies>
>>>> > >    ... all the required hadoop 1 dependencies to make the plugin work,
>>>> > > even though this version only works against hadoop 1 anyway...
>>>> > >   </dependencies>
>>>> > >   ...
>>>> > > </plugin>
>>>> > >
>>>> > > --
>>>> > > Christopher L Tubbs II
>>>> > > http://gravatar.com/ctubbsii
>>>> > >
>>>> > >
>>>> > > On Tue, May 14, 2013 at 5:42 PM, Christopher <[email protected]> wrote:
>>>> > >> I think Option 2 is the best solution for "waiting until we have the
>>>> > >> time to solve the problem correctly", as it ensures that transitive
>>>> > >> dependencies work for the stable version of Hadoop, and using Hadoop2
>>>> > >> becomes a simple documentation matter of how to apply the patch and
>>>> > >> rebuild. Option 4 doesn't wait... it explicitly introduces a problem
>>>> > >> for users.
>>>> > >>
>>>> > >> Option 1 is how I'm tentatively thinking about fixing it properly in
>>>> > >> 1.6.0.
>>>> > >>
>>>> > >>
>>>> > >> --
>>>> > >> Christopher L Tubbs II
>>>> > >> http://gravatar.com/ctubbsii
>>>> > >>
>>>> > >>
>>>> > >> On Tue, May 14, 2013 at 4:56 PM, John Vines <[email protected]> wrote:
>>>> > >>> I'm an advocate of option 4. You say that it's ignoring the problem,
>>>> > >>> whereas I think it's waiting until we have the time to solve the
>>>> > >>> problem correctly. Your reasoning for this is standardizing on maven
>>>> > >>> conventions, but the other options, while more 'correct' from a maven
>>>> > >>> standpoint, are a larger headache for our user base and ourselves. In
>>>> > >>> either case, we're going to be breaking some sort of convention, and
>>>> > >>> while that's not good, we should be doing the one that's less bad for
>>>> > >>> US. The important thing here, now, is that the poms work, and we
>>>> > >>> should go with the method that leaves the least work for our end
>>>> > >>> users to utilize them.
>>>> > >>>
>>>> > >>> I do agree that 1. is the correct option in the long run. More
>>>> > >>> specifically, I think it boils down to having a single module
>>>> > >>> compatibility layer, which is how hbase deals with this issue. But
>>>> > >>> like you said, we don't have the time to engineer a proper solution.
>>>> > >>> So let sleeping dogs lie and we can revamp the whole system for 1.5.1
>>>> > >>> or 1.6.0 when we have the cycles to do it right.
>>>> > >>>
>>>> > >>>
>>>> > >>> On Tue, May 14, 2013 at 4:40 PM, Christopher <[email protected]> wrote:
>>>> > >>>
>>>> > >>>> So, I've run into a problem with ACCUMULO-1402 that requires a
>>>> > >>>> larger discussion about how Accumulo 1.5.0 should support Hadoop2.
>>>> > >>>>
>>>> > >>>> The problem is basically that profiles should not contain
>>>> > >>>> dependencies, because profiles don't get activated transitively. A
>>>> > >>>> slide deck by the Maven developers points this out as a bad
>>>> > >>>> practice... yet it's a practice we rely on for our current
>>>> > >>>> implementation of Hadoop2 support
>>>> > >>>> (http://www.slideshare.net/aheritier/geneva-jug-30th-march-2010-maven
>>>> > >>>> slide 80).
>>>> > >>>>
>>>> > >>>> What this means is that even if we go through the work of
>>>> > >>>> publishing binary artifacts compiled against Hadoop2, neither our
>>>> > >>>> Hadoop1 binaries nor our Hadoop2 binaries will be able to
>>>> > >>>> transitively resolve any dependencies defined in profiles. This has
>>>> > >>>> significant implications for user code that depends on Accumulo
>>>> > >>>> Maven artifacts. Every user will essentially have to explicitly add
>>>> > >>>> Hadoop dependencies for every Accumulo artifact that depends on
>>>> > >>>> Hadoop, whether directly or transitively (they'll have to peek into
>>>> > >>>> the profiles in our POMs and copy/paste the profile into their
>>>> > >>>> project). This becomes more complicated when we consider how users
>>>> > >>>> will try to use things like Instamo.
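
(Concretely, every downstream pom would need to carry something like the
following itself; the artifact and version here are placeholders for
whichever Hadoop line the user actually runs against:

  <dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-client</artifactId>
    <!-- placeholder version; copied from the matching profile in our poms -->
    <version>1.0.4</version>
  </dependency>

instead of getting it transitively from the Accumulo artifacts.)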
>>>> > >>>>
>>>> > >>>> There are workarounds, but none of them are really pleasant.
>>>> > >>>>
>>>> > >>>> 1. The best way to support both major Hadoop APIs is to have
>>>> > >>>> separate modules with separate dependencies directly in the POM.
>>>> > >>>> This is a fair amount of work, and in my opinion, would be too
>>>> > >>>> disruptive for 1.5.0. This solution also gets us separate binaries
>>>> > >>>> for separate supported versions, which is useful.
>>>> > >>>>
>>>> > >>>> 2. A second option, and the preferred one I think for 1.5.0, is to
>>>> > >>>> put a Hadoop2 patch in the branch's contrib directory
>>>> > >>>> (branches/1.5/contrib) that patches the POM files to support
>>>> > >>>> building against Hadoop2. (Acknowledgement to Keith for suggesting
>>>> > >>>> this solution.)
>>>> > >>>>
>>>> > >>>> 3. A third option is to fork Accumulo, and maintain two separate
>>>> > >>>> builds (a more traditional technique). This adds a merging
>>>> > >>>> nightmare for features/patches, but gets around some reflection
>>>> > >>>> hacks that we may have been motivated to do in the past. I'm not a
>>>> > >>>> fan of this option, particularly because I don't want to replicate
>>>> > >>>> the fork nightmare that has been the history of early Hadoop itself.
>>>> > >>>>
>>>> > >>>> 4. The last option is to do nothing and to continue to build with
>>>> > >>>> the separate profiles as we are, and make users discover and specify
>>>> > >>>> transitive dependencies entirely on their own. I think this is the
>>>> > >>>> worst option, as it essentially amounts to "ignore the problem".
>>>> > >>>>
>>>> > >>>> At the very least, it does not seem reasonable to complete
>>>> > >>>> ACCUMULO-1402 for 1.5.0, given the complexity of this issue.
>>>> > >>>>
>>>> > >>>> Thoughts? Discussion? Vote on option?
>>>> > >>>>
>>>> > >>>> --
>>>> > >>>> Christopher L Tubbs II
>>>> > >>>> http://gravatar.com/ctubbsii
>>>> > >>>>
>>>> >
>>>>
