Re: Workings of Hadoop Shims

2015-02-24 Thread Henry Saputra
gora-shims-distribution has optional dependencies on Hadoop 2,
which should be OK.

Lewis, could you try updating gora-core/pom.xml to set optional to
true for the hadoop-client dependency:

<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-client</artifactId>
  <optional>true</optional>
</dependency>

- Henry



Workings of Hadoop Shims

2015-02-22 Thread Lewis John Mcgibbney
Hi Folks,
I'm kicking off this overdue thread to get a good understanding of exactly
what's going on with the Hadoop Shims. The documentation is lacking at the
moment, so I am putting time into rectifying this.
My humble beginnings are in progress at
http://gora.apache.org/current/gora-shims.html

Scenario - Upgrade Nutch 2.3.1-SNAPSHOT to Gora 0.6
Jira Issue - https://issues.apache.org/jira/browse/NUTCH-1946
Observations - From my initial analysis of the current state of the Shims,
I have observed the following:

   - gora-shims-distribution relies upon gora-shims-hadoop,
   gora-shims-hadoop1 and gora-shims-hadoop2
   - gora-shims-hadoop provides a parent for gora-shims-hadoop1 and
   gora-shims-hadoop2; however, it also has direct dependencies upon the
   following:
      - org.apache.hadoop:hadoop-client:jar:2.5.2:compile
      - org.apache.hadoop:hadoop-hdfs:jar:2.5.2:compile
      - org.apache.hadoop:hadoop-mapreduce-client-app:jar:2.5.2:compile
      - org.apache.hadoop:hadoop-yarn-api:jar:2.5.2:compile
      - org.apache.hadoop:hadoop-mapreduce-client-core:jar:2.5.2:compile
      - org.apache.hadoop:hadoop-mapreduce-client-jobclient:jar:2.5.2:compile
      - org.apache.hadoop:hadoop-annotations:jar:2.5.2:compile
   - As stated above, both gora-shims-hadoop1 and gora-shims-hadoop2 depend
   upon gora-shims-hadoop, the difference being that gora-shims-hadoop1
   then defines Hadoop 1.X dependencies.

Problems - I understand that we have upgraded to Hadoop 2.5.2 by default.
This is great. What I am failing to grasp, however, is exactly how we
provide guidance on upgrading to Gora 0.6 without also upgrading from
Hadoop 1.2.X to 2.5.X.

Bear in mind that gora-core depends upon gora-shims-hadoop, so the
Hadoop 2.5.2 dependencies are fetched transitively whenever we wish to
upgrade the gora-core dependency from 0.5 to 0.6.

I am going to experiment with using a bunch of exclusions in my pom.xml
under the gora-shims-hadoop dependency, e.g. excluding all of the Hadoop
dependencies listed above, then explicitly adding the gora-shims-hadoop1
dependency.
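
A minimal, untested sketch of that experiment might look like the
following. The org.apache.gora groupId and the 0.6 version are assumed
here for illustration, and the wildcard exclusion requires Maven 3.2.1
or later; on older versions each Hadoop artifact listed above would need
its own exclusion entry.

```xml
<!-- Sketch only: coordinates and versions are assumptions, not verified
     against the released Gora 0.6 poms. -->
<dependency>
  <groupId>org.apache.gora</groupId>
  <artifactId>gora-shims-hadoop</artifactId>
  <version>0.6</version>
  <exclusions>
    <!-- Drop every transitive Hadoop 2.5.2 artifact pulled in here -->
    <exclusion>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>*</artifactId>
    </exclusion>
  </exclusions>
</dependency>

<!-- Explicitly add the Hadoop 1.X flavour of the shims -->
<dependency>
  <groupId>org.apache.gora</groupId>
  <artifactId>gora-shims-hadoop1</artifactId>
  <version>0.6</version>
</dependency>
```

Whether this is enough to avoid the Hadoop 2 classes on the classpath
depends on which other Gora modules also pull them in, so the resolved
dependency tree would still need checking.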

What makes this worse is that I cannot create profiles for this upgrade,
as I could in a Maven project, because I am working with Ant + Ivy.
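
For the Ant + Ivy case, a rough equivalent could be expressed in ivy.xml
with a per-dependency exclude. This is an untested sketch: the
organisation, revision and conf mapping below are assumptions, and the
exclude only applies to the dependency it is nested in, so the Hadoop 1.X
artifacts brought in by gora-shims-hadoop1 should not be affected.

```xml
<!-- Sketch only: org, rev and conf values are assumptions. -->
<dependency org="org.apache.gora" name="gora-core" rev="0.6"
            conf="*->default">
  <!-- Exclude all Hadoop artifacts reached transitively via gora-core -->
  <exclude org="org.apache.hadoop"/>
</dependency>

<!-- Explicitly add the Hadoop 1.X flavour of the shims -->
<dependency org="org.apache.gora" name="gora-shims-hadoop1" rev="0.6"
            conf="*->default"/>
```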

Any thoughts would be very much appreciated. Essentially, whatever we
discuss here will form the foundation for the Gora Shims documentation.

Thanks

Lewis

-- 
*Lewis*


Re: Workings of Hadoop Shims

2015-02-22 Thread Henry Saputra
Thanks for starting the discussion, Lewis.

I am reviewing the changes and trying to unravel the dependencies and
figure out why the interface mismatch is causing the stack error in the
Nutch upgrade.

- Henry
