Avro has a compile-time ('provided') dependency on Hadoop MR APIs in
avro-mapred and trevni-avro, both of which use the new MR APIs that
changed incompatibly between Hadoop 1 and 2. We introduced separate
profiles so we could produce separate binary artifacts for avro-mapred
and trevni-avro for Hadoop 1 and 2. Users set a classifier to select
the one they want to use, and in the absence of a classifier the
Hadoop 1 artifact is used.

Hadoop 2 has been the stable Hadoop release for a while now [1], and I
think most people are using Hadoop 2 based clusters these days so we
should at least change the default to Hadoop 2.

If we only built against Hadoop 2 then the avro-mapred and trevni-avro
JARs would not work on Hadoop 1 clusters, but I think that would be OK
for Avro 1.8. It would be good to remove classifiers since they are
easily missed by users, and don't work well with transitive
dependencies [2].

The Avro tools JAR has always only included Hadoop 1 classes (actually
0.20.205.0). It's a bug that you still can't use the tools JAR against
a Hadoop 2 cluster - i.e. that we don't provide a Hadoop 2 artifact
for this. AVRO-1567 is another bug. Both would be fixed by using
Hadoop 2 dependencies.

Cheers,
Tom

[1] http://www.us.apache.org/dist/hadoop/common/stable/
[2] https://github.com/Parquet/parquet-mr/pull/32#issuecomment-17283008

On Fri, Aug 22, 2014 at 10:17 PM, Doug Cutting <[email protected]> wrote:

> I'm not proposing dropping Hadoop 1.x APIs, since most (all?) of those are
> still present in 2.x. Rather I'm proposing we replace Hadoop 1.x
> dependencies with Hadoop 2.x, no longer building releases compiled against
> 1.x and no longer testing against 1.x. Currently we build jars compiled
> against both 1.x and 2.x, but most testing (e.g., Jenkins) is only done
> against 1.x.
>
> The specific problem is that the Hadoop 1.x runtime uses a Sun-specific
> class that causes tests to fail under IBM's JVM, while the Hadoop 2.x
> runtime does not. I don't propose we make any code changes, rather just
> update poms to avoid this runtime problem.
>
> An alternative is to add profiles for different Hadoop versions to poms of
> all modules that depend on Hadoop, and to perform Jenkins testing against
> both profiles. The former creates a lot of duplication in the poms, making
> them harder to maintain. The latter adds maintenance costs to keep Jenkins
> running. I'm not convinced the benefit is worth the effort. Do we think
> folks using Hadoop 1.x will update to Avro 1.8?
>
> Doug
>
>
> On Fri, Aug 22, 2014 at 12:09 PM, Sean Busbey <[email protected]> wrote:
>
>> AVRO-1567 is attempting to get Avro working well with the IBM JVM and some
>> of our dependency on Hadoop is causing them pain.
>>
>> Specifically, there's some location where we rely on Hadoop 1 core for a
>> method that internally uses Sun JVM specific code. In Hadoop 2's client the
>> issue is fixed.
>>
>> Doug mentioned the possibility that we simply drop Hadoop 1 support for 1.8
>> and rely on the presence of a fix in the Hadoop 2 version.
>>
>> What do folks think?
>>
>> Personally, I'm -0. As an alternative, I think we could change 1.8 to
>> default the tools artifact to Hadoop 2 without expressly dropping Hadoop 1
>> support.
>>
>> Are there other compelling reasons to drop Hadoop 1 APIs?
>>
>> --
>> Sean
>>
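
For anyone following the thread who has not used the classifier
mechanism discussed above, a downstream pom.xml would select the
Hadoop 2 build of avro-mapred roughly as in the fragment below. This is
only a sketch: the version number is an assumed placeholder, and the
classifier value "hadoop2" is, as far as I know, the one published for
the 1.7.x line.

```xml
<!-- Sketch of a consumer dependency; the version is an assumed placeholder. -->
<dependency>
  <groupId>org.apache.avro</groupId>
  <artifactId>avro-mapred</artifactId>
  <version>1.7.7</version>
  <!-- Leave this element out and Maven resolves the default artifact,
       which is the Hadoop 1 build at the time of this thread. -->
  <classifier>hadoop2</classifier>
</dependency>
```

The classifier applies only to the dependency it is declared on, so a
project that pulls in avro-mapred transitively through another library
gets whichever build that library declared. That is the transitive
dependency problem mentioned in [2].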
