Re: Workings of Hadoop Shims
gora-shims-distribution has optional dependencies on Hadoop 2, which should be OK.

Lewis, could you try updating gora-core/pom.xml to mark the hadoop-client dependency as optional:

    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-client</artifactId>
      <optional>true</optional>
    </dependency>

- Henry

On Sun, Feb 22, 2015 at 3:52 PM, Lewis John Mcgibbney lewis.mcgibb...@gmail.com wrote:
> [...]
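As a hedged illustration of what the optional flag suggested at the top of this message changes for a downstream build: Maven does not pass optional dependencies on transitively, so a consumer of gora-core would have to declare the Hadoop version it wants itself. The sketch below assumes a consumer that stays on Hadoop 1.2.1; the hadoop-core coordinates and the versions shown are illustrative assumptions, not taken from the Gora or Nutch builds.

```xml
<!-- Consumer pom.xml fragment (sketch, untested).
     Assumes gora-core marks hadoop-client as optional, so that dependency
     is no longer pulled in transitively through gora-core itself. -->
<dependency>
  <groupId>org.apache.gora</groupId>
  <artifactId>gora-core</artifactId>
  <version>0.6</version>
</dependency>

<!-- Assumption: the consumer stays on Hadoop 1.x and declares it explicitly. -->
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-core</artifactId>
  <version>1.2.1</version>
</dependency>
```

This only covers the hadoop-client dependency declared in gora-core itself. The Hadoop 2.5.2 artifacts that gora-shims-hadoop declares directly would still arrive through that module unless they are handled separately.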
Workings of Hadoop Shims
Hi Folks,

I'm kicking off this overdue thread to get a good understanding of exactly what's going on with the Hadoop Shims. The documentation is lacking at the moment, so I am putting time into rectifying this. My humble beginnings are in progress at
http://gora.apache.org/current/gora-shims.html

Scenario - Upgrade Nutch 2.3.1-SNAPSHOT to Gora 0.6
Jira Issue - https://issues.apache.org/jira/browse/NUTCH-1946

Observations - From my initial analysis of the current state of the Shims:

- gora-shims-distribution relies upon gora-shims-hadoop, gora-shims-hadoop1 and gora-shims-hadoop2.
- gora-shims-hadoop provides a parent for gora-shims-hadoop1 and gora-shims-hadoop2; however, it also has direct dependencies upon the following:
  - org.apache.hadoop:hadoop-client:jar:2.5.2:compile
  - org.apache.hadoop:hadoop-hdfs:jar:2.5.2:compile
  - org.apache.hadoop:hadoop-mapreduce-client-app:jar:2.5.2:compile
  - org.apache.hadoop:hadoop-yarn-api:jar:2.5.2:compile
  - org.apache.hadoop:hadoop-mapreduce-client-core:jar:2.5.2:compile
  - org.apache.hadoop:hadoop-mapreduce-client-jobclient:jar:2.5.2:compile
  - org.apache.hadoop:hadoop-annotations:jar:2.5.2:compile
- As stated above, both gora-shims-hadoop1 and gora-shims-hadoop2 depend upon gora-shims-hadoop, the difference being that gora-shims-hadoop1 then defines Hadoop 1.X dependencies.

Problems - I understand that we have upgraded to Hadoop 2.5.2 by default. This is great. What I am failing to grasp, however, is exactly how we provide guidance on upgrading to Gora 0.6 without also upgrading Hadoop from 1.2.X to 2.5.X. Bear in mind that gora-core depends upon gora-shims-hadoop, so the Hadoop 2.5.2 dependencies are fetched transitively whenever we upgrade the gora-core dependency from 0.5 to 0.6.

I am going to experiment with a set of exclusions in my pom.xml under the gora-shims-hadoop dependency, e.g. exclude all of the Hadoop dependencies listed above, then explicitly add the gora-shims-hadoop1 dependency. What makes this worse is that I cannot create profiles for this upgrade, as I would be able to in a Maven project, because I am working with Ant + Ivy.

Any thoughts would be very much appreciated. Essentially, whatever we discuss here creates the foundation for the Gora Shims documentation.

Thanks
Lewis

--
*Lewis*
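The exclusion experiment described above could look roughly like the following pom.xml fragment. This is an untested sketch, not a confirmed fix: it assumes the shim modules are published under the org.apache.gora groupId at version 0.6, and it declares gora-shims-hadoop directly so that the exclusions attach to it.

```xml
<!-- pom.xml fragment (sketch, untested).
     Assumption: groupId org.apache.gora and version 0.6 for both shim modules. -->
<dependency>
  <groupId>org.apache.gora</groupId>
  <artifactId>gora-shims-hadoop</artifactId>
  <version>0.6</version>
  <exclusions>
    <!-- The Hadoop 2.5.2 artifacts listed in the observations above. -->
    <exclusion>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-client</artifactId>
    </exclusion>
    <exclusion>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-hdfs</artifactId>
    </exclusion>
    <exclusion>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-mapreduce-client-app</artifactId>
    </exclusion>
    <exclusion>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-yarn-api</artifactId>
    </exclusion>
    <exclusion>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-mapreduce-client-core</artifactId>
    </exclusion>
    <exclusion>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-mapreduce-client-jobclient</artifactId>
    </exclusion>
    <exclusion>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-annotations</artifactId>
    </exclusion>
  </exclusions>
</dependency>

<!-- Explicitly add the Hadoop 1 shim, which defines the Hadoop 1.X dependencies. -->
<dependency>
  <groupId>org.apache.gora</groupId>
  <artifactId>gora-shims-hadoop1</artifactId>
  <version>0.6</version>
</dependency>
```

For an Ant + Ivy build, the closest counterpart is an exclude element nested inside the dependency element in ivy.xml. Whether either form is enough to keep every Hadoop 2.5.2 artifact off the classpath would need to be checked against the resolved dependency report.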
Re: Workings of Hadoop Shims
Thanks for starting the discussion, Lewis.

I am reviewing the changes, trying to unravel the dependencies and figure out why the interface mismatch is causing the stack error in the Nutch upgrade.

- Henry

On Sun, Feb 22, 2015 at 3:52 PM, Lewis John Mcgibbney lewis.mcgibb...@gmail.com wrote:
> [...]