Hi Folks,
I'm kicking off this overdue thread to get a good understanding of exactly
what's going on with the Hadoop Shims. The documentation is lacking at the
moment, so I am putting time into rectifying this.
My humble beginnings are below.

Scenario - Upgrade Nutch 2.3.1-SNAPSHOT to Gora 0.6
Jira Issue - https://issues.apache.org/jira/browse/NUTCH-1946
Observations - From my initial analysis of the current state of the Shims:

   - gora-shims-distribution relies upon gora-shims-hadoop,
   gora-shims-hadoop1 and gora-shims-hadoop2
   - gora-shims-hadoop provides a parent for gora-shims-hadoop1 and
   gora-shims-hadoop2, however it also has direct dependencies upon the
   following:
      - org.apache.hadoop:hadoop-client:jar:2.5.2:compile
      - org.apache.hadoop:hadoop-hdfs:jar:2.5.2:compile
      - org.apache.hadoop:hadoop-mapreduce-client-app:jar:2.5.2:compile
      - org.apache.hadoop:hadoop-yarn-api:jar:2.5.2:compile
      - org.apache.hadoop:hadoop-mapreduce-client-core:jar:2.5.2:compile
      - org.apache.hadoop:hadoop-annotations:jar:2.5.2:compile

   - As stated above, both gora-shims-hadoop1 and gora-shims-hadoop2 depend
   upon gora-shims-hadoop with the difference being that gora-shims-hadoop1
   then defines hadoop 1.X dependencies.

Problems - I understand that we have upgraded to Hadoop 2.5.2 by default.
This is great. What I am failing to grasp, however, is exactly how we
provide guidance on upgrading to Gora 0.6 without also upgrading Hadoop
from 1.2.X --> 2.5.X.

Bear in mind that gora-core depends upon gora-shims-hadoop, so the
Hadoop 2.5.2 dependencies are automatically fetched transitively
whenever we wish to upgrade the gora-core dependency from 0.5 --> 0.6.

I am going to experiment with using a bunch of exclusions in my pom.xml
under the gora-shims-hadoop dependency, e.g. exclude all of the above
Hadoop dependencies, then explicitly add the gora-shims-hadoop1 dependency.
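
To make the experiment concrete, here is a rough, untested sketch of what
those exclusions might look like in a pom.xml. The artifact names come from
the list above; the org.apache.gora groupId and the 0.6 version are my
assumptions, and I have not confirmed that this is enough to keep every
Hadoop 2.5.2 artifact off the classpath.

```xml
<!-- Sketch only: groupId/version are assumptions, not verified against the 0.6 release. -->
<dependencies>
  <!-- Declare the shim parent directly so the Hadoop 2.5.2 artifacts can be excluded. -->
  <dependency>
    <groupId>org.apache.gora</groupId>
    <artifactId>gora-shims-hadoop</artifactId>
    <version>0.6</version>
    <exclusions>
      <exclusion>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-client</artifactId>
      </exclusion>
      <exclusion>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-hdfs</artifactId>
      </exclusion>
      <exclusion>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-mapreduce-client-app</artifactId>
      </exclusion>
      <exclusion>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-yarn-api</artifactId>
      </exclusion>
      <exclusion>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-mapreduce-client-core</artifactId>
      </exclusion>
      <exclusion>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-annotations</artifactId>
      </exclusion>
    </exclusions>
  </dependency>
  <!-- Then pull in the Hadoop 1.X flavour of the shims explicitly. -->
  <dependency>
    <groupId>org.apache.gora</groupId>
    <artifactId>gora-shims-hadoop1</artifactId>
    <version>0.6</version>
  </dependency>
</dependencies>
```

Running mvn dependency:tree afterwards should show whether any 2.5.2
artifacts still sneak in through another path (e.g. via gora-core itself).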

What makes this worse is that I cannot create profiles for this upgrade,
as I would be able to in a Maven project, because I am working with
Ant + Ivy.
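
For the Ant + Ivy case, the equivalent would presumably be per-dependency
exclude elements in ivy.xml. The following is only a sketch of that idea:
the org/rev values and the conf mapping are assumptions on my part, and I
have not yet verified the resolved classpath.

```xml
<!-- Sketch only: org, rev and conf values are assumptions. -->
<dependencies>
  <!-- Shim parent, with the Hadoop 2.5.2 modules listed earlier excluded. -->
  <dependency org="org.apache.gora" name="gora-shims-hadoop" rev="0.6" conf="*->default">
    <exclude org="org.apache.hadoop" module="hadoop-client"/>
    <exclude org="org.apache.hadoop" module="hadoop-hdfs"/>
    <exclude org="org.apache.hadoop" module="hadoop-mapreduce-client-app"/>
    <exclude org="org.apache.hadoop" module="hadoop-yarn-api"/>
    <exclude org="org.apache.hadoop" module="hadoop-mapreduce-client-core"/>
    <exclude org="org.apache.hadoop" module="hadoop-annotations"/>
  </dependency>
  <!-- Hadoop 1.X flavour of the shims, added explicitly. -->
  <dependency org="org.apache.gora" name="gora-shims-hadoop1" rev="0.6" conf="*->default"/>
</dependencies>
```

Since gora-core and gora-shims-hadoop1 both reach gora-shims-hadoop
transitively, the same excludes may well need repeating on those
dependencies too; the Ivy resolve report should make that clear.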

Any thoughts would be very much appreciated. Essentially, whatever we
discuss here creates the foundation for the Gora Shims documentation.



