Re: Workings of Hadoop Shims

2015-02-24 Thread Henry Saputra
The gora-shims-distribution module has optional dependencies on Hadoop 2,
which should be OK.

Lewis, could you try updating gora-core/pom.xml to set optional to
true for the hadoop-client dependency:

<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-client</artifactId>
  <optional>true</optional>
</dependency>
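This suggestion relies on Maven's rule that optional dependencies are not passed on transitively: a project depending on gora-core would no longer receive hadoop-client automatically and would have to declare the Hadoop artifacts it wants. A minimal sketch of the consumer side, assuming gora-core 0.6 is released with hadoop-client marked optional as proposed; the choice of hadoop-client 2.5.2 is illustrative, and a project staying on Hadoop 1.x would declare org.apache.hadoop:hadoop-core (e.g. 1.2.1) instead:

```xml
<!-- Sketch: downstream project, assuming gora-core 0.6 marks
     hadoop-client as <optional>true</optional>. Optional dependencies
     are not inherited transitively, so Hadoop must be declared here. -->
<dependencies>
  <dependency>
    <groupId>org.apache.gora</groupId>
    <artifactId>gora-core</artifactId>
    <version>0.6</version>
  </dependency>
  <!-- The consumer picks its own Hadoop line; 2.5.2 is illustrative. -->
  <dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-client</artifactId>
    <version>2.5.2</version>
  </dependency>
</dependencies>
```

The trade-off is that every consumer must now choose a Hadoop version explicitly, which is arguably the point of a shims layer.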

- Henry


On Sun, Feb 22, 2015 at 3:52 PM, Lewis John Mcgibbney
lewis.mcgibb...@gmail.com wrote:
 Hi Folks,
 I'm kicking off this overdue thread to get a good understanding of exactly
 what's going on with the Hadoop Shims. The documentation is lacking at the
 moment and I am therefore putting time into rectifying this.
 My humble beginnings are in progress at:
 http://gora.apache.org/current/gora-shims.html

 Scenario - Upgrade Nutch 2.3.1-SNAPSHOT to Gora 0.6
 Jira Issue - https://issues.apache.org/jira/browse/NUTCH-1946
 Observations - From my initial analysis of the current state of the Shims,
 here are some observations:

- gora-shims-distribution relies upon gora-shims-hadoop,
gora-shims-hadoop1 and gora-shims-hadoop2.
- gora-shims-hadoop provides a parent for gora-shims-hadoop1 and
gora-shims-hadoop2; however, it also has direct dependencies upon the
following:
   - org.apache.hadoop:hadoop-client:jar:2.5.2:compile
      - org.apache.hadoop:hadoop-hdfs:jar:2.5.2:compile
      - org.apache.hadoop:hadoop-mapreduce-client-app:jar:2.5.2:compile
      - org.apache.hadoop:hadoop-yarn-api:jar:2.5.2:compile
      - org.apache.hadoop:hadoop-mapreduce-client-core:jar:2.5.2:compile
      - org.apache.hadoop:hadoop-mapreduce-client-jobclient:jar:2.5.2:compile
      - org.apache.hadoop:hadoop-annotations:jar:2.5.2:compile


- As stated above, both gora-shims-hadoop1 and gora-shims-hadoop2 depend
upon gora-shims-hadoop, the difference being that gora-shims-hadoop1
then defines Hadoop 1.x dependencies.

 Problems - I understand that we have upgraded to Hadoop 2.5.2 by default.
 This is great. What I am failing to grasp, however, is exactly how we
 guide users who want to upgrade to Gora 0.6 without also upgrading from
 Hadoop 1.2.x to 2.5.x.

 Bear in mind that gora-core depends upon gora-shims-hadoop, so the
 Hadoop 2.5.2 dependencies are fetched transitively whenever we upgrade
 the gora-core dependency from 0.5 to 0.6.

 I am going to experiment with a set of exclusions in my pom.xml under the
 gora-shims-hadoop dependency, e.g. excluding all of the Hadoop dependencies
 above, then explicitly adding the gora-shims-hadoop1 dependency.
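A minimal Maven sketch of this exclusion approach, assuming org.apache.gora:gora-shims-hadoop1:0.6 is published and that hadoop-client is the single entry point for the Hadoop 2.x artifacts listed above (they sit in its subtree, so excluding it drops them on that path). Maven exclusions apply per dependency path, so the sketch places the exclusion on gora-core (which reaches gora-shims-hadoop transitively) and repeats it on gora-shims-hadoop1, which also depends on gora-shims-hadoop. Pinning hadoop-core 1.2.1 is an assumed target version:

```xml
<!-- Sketch only: Gora 0.6 on Hadoop 1.x. Assumes gora-shims-hadoop1 0.6
     is published; hadoop-core 1.2.1 is an illustrative 1.2.x version. -->
<dependencies>
  <dependency>
    <groupId>org.apache.gora</groupId>
    <artifactId>gora-core</artifactId>
    <version>0.6</version>
    <exclusions>
      <!-- Removes hadoop-client and its 2.5.2 subtree on this path. -->
      <exclusion>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-client</artifactId>
      </exclusion>
    </exclusions>
  </dependency>
  <dependency>
    <groupId>org.apache.gora</groupId>
    <artifactId>gora-shims-hadoop1</artifactId>
    <version>0.6</version>
    <exclusions>
      <!-- Same exclusion: this path also reaches gora-shims-hadoop. -->
      <exclusion>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-client</artifactId>
      </exclusion>
    </exclusions>
  </dependency>
  <!-- Pin the Hadoop 1.x artifact explicitly. -->
  <dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-core</artifactId>
    <version>1.2.1</version>
  </dependency>
</dependencies>
```

Running `mvn dependency:tree` afterwards is a quick way to confirm that no 2.5.2 artifacts remain on the classpath.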

 What makes this worse is that I cannot create profiles for this upgrade,
 as I could in a Maven project, because I am working with Ant + Ivy.
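Ivy has no direct equivalent of Maven profiles, but the same exclusions can be written per dependency in ivy.xml using Ivy's `<exclude>` element. A sketch under the same assumptions (a published gora-shims-hadoop1 0.6, hadoop-core 1.2.1 as the target); the `*->default` conf mapping is a common Ivy convention and may need adjusting to the project's own configurations:

```xml
<!-- Sketch of an ivy.xml <dependencies> section; coordinates and the
     conf mapping are assumptions, not taken from Nutch's real ivy.xml. -->
<dependencies>
  <dependency org="org.apache.gora" name="gora-core" rev="0.6"
              conf="*->default">
    <!-- Drop the Hadoop 2.x client pulled in via gora-shims-hadoop. -->
    <exclude org="org.apache.hadoop" module="hadoop-client"/>
  </dependency>
  <dependency org="org.apache.gora" name="gora-shims-hadoop1" rev="0.6"
              conf="*->default">
    <exclude org="org.apache.hadoop" module="hadoop-client"/>
  </dependency>
  <!-- Explicit Hadoop 1.x dependency. -->
  <dependency org="org.apache.hadoop" name="hadoop-core" rev="1.2.1"
              conf="*->default"/>
</dependencies>
```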

 Any thoughts would be very much appreciated. Essentially, whatever we
 discuss here will form the foundation of the Gora Shims documentation.

 Thanks

 Lewis

 --
 *Lewis*


Re: Workings of Hadoop Shims

2015-02-22 Thread Henry Saputra
Thanks for starting the discussion, Lewis.

I am reviewing the changes and trying to unravel the dependencies and
figure out why the interface mismatch is causing the stack error in the
Nutch upgrade.

- Henry
