Hi, libthrift is a dependency of cassandra-thrift, as listed here: http://mvnrepository.com/artifact/org.apache.cassandra/cassandra-thrift/0.8.1
During the Nutch build, you have to manually tweak the Ivy configuration depending on your choice of Gora store, in this case Cassandra. Basically you need to add all the dependencies listed there:
http://svn.apache.org/viewvc/incubator/gora/trunk/gora-cassandra/ivy/ivy.xml?view=markup

Try adding the following dependencies to $NUTCH_HOME/ivy/ivy.xml and then rebuild Nutch (see attached patch):

    <dependency org="org.apache.gora" name="gora-cassandra" rev="0.2-incubating" conf="*->compile"/>
    <dependency org="org.apache.cassandra" name="cassandra-thrift" rev="0.8.1"/>
    <dependency org="com.ecyrd.speed4j" name="speed4j" rev="0.9" conf="*->*,!javadoc,!sources"/>
    <dependency org="com.github.stephenc.high-scale-lib" name="high-scale-lib" rev="1.1.2" conf="*->*,!javadoc,!sources"/>
    <dependency org="com.google.collections" name="google-collections" rev="1.0" conf="*->*,!javadoc,!sources"/>
    <dependency org="com.google.guava" name="guava" rev="r09" conf="*->*,!javadoc,!sources"/>

    $ ant clean
    $ ant

In your case libthrift should now be downloaded by Ivy and then bundled into the nutch-2.0-dev.job file.

I'm not sure how apache-cassandra and hector got included in your classpath...
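To find out which jar a class is really loaded from at runtime, a small diagnostic like the one below might help. It is only a sketch: the class name WhichJar and the method describe are made up for illustration and are not part of Nutch, Gora or Hadoop. It uses only standard JDK calls (Class.forName and ProtectionDomain.getCodeSource), and the two default class names are the ones from the stack traces in this thread.

```java
import java.net.URL;
import java.security.CodeSource;

// Hypothetical helper (not part of Nutch or Gora): reports the jar or
// directory that a class is loaded from on the current classpath.
class WhichJar {

    // Returns "<class name> -> <location>", or a "not found" / "failed to load" note.
    static String describe(String className) {
        try {
            // initialize=false: locate the class without running its static initializers
            Class<?> c = Class.forName(className, false, WhichJar.class.getClassLoader());
            CodeSource src = c.getProtectionDomain().getCodeSource();
            URL where = (src == null) ? null : src.getLocation();
            // JDK core classes usually have no code source, hence the null check
            return className + " -> " + (where == null ? "JDK runtime (no code source)" : where.toString());
        } catch (ClassNotFoundException e) {
            return className + " -> not found on classpath";
        } catch (LinkageError e) {
            return className + " -> failed to load: " + e;
        }
    }

    public static void main(String[] args) {
        // Defaults are the two classes named in the errors quoted in this thread
        String[] names = args.length > 0 ? args : new String[] {
            "org.apache.thrift.meta_data.FieldValueMetaData",
            "me.prettyprint.hector.api.Serializer"
        };
        for (String name : names) {
            System.out.println(describe(name));
        }
    }
}
```

Compile it and run it with the same classpath as the failing command. If FieldValueMetaData resolves to a jar other than the libthrift you expect, that would point to a conflicting Thrift version being picked up first.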
Somehow we need to resolve as well:

    <dependency org="org.apache.cassandra" name="apache-cassandra" rev="0.8.1"/>
    <dependency org="me.prettyprint" name="hector" rev="0.8.0-1"/>

I don't think these 2 jars are in the default Maven repository, so they won't be downloaded; that's why they were commented out in the Gora Cassandra Ivy config (gora/trunk/gora-cassandra/ivy/ivy.xml).

Since the hector jar is not found in my case, I get:

~/java/workspace/Nutch/trunk/runtime/deploy$ bin/nutch inject ~/java/workspace/Nutch/seeds
11/08/01 14:18:42 INFO crawl.InjectorJob: InjectorJob: starting
11/08/01 14:18:42 INFO crawl.InjectorJob: InjectorJob: urlDir: /home/alex/java/workspace/Nutch/seeds
11/08/01 14:18:42 INFO security.Groups: Group mapping impl=org.apache.hadoop.security.ShellBasedUnixGroupsMapping; cacheTimeout=300000
11/08/01 14:18:42 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
11/08/01 14:18:42 ERROR crawl.InjectorJob: InjectorJob: org.apache.gora.util.GoraException: java.lang.reflect.InvocationTargetException
    at org.apache.gora.store.DataStoreFactory.createDataStore(DataStoreFactory.java:110)
    at org.apache.gora.store.DataStoreFactory.createDataStore(DataStoreFactory.java:93)
    at org.apache.nutch.storage.StorageUtils.createWebStore(StorageUtils.java:59)
    at org.apache.nutch.crawl.InjectorJob.run(InjectorJob.java:243)
    at org.apache.nutch.crawl.InjectorJob.inject(InjectorJob.java:268)
    at org.apache.nutch.crawl.InjectorJob.run(InjectorJob.java:282)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:69)
    at org.apache.nutch.crawl.InjectorJob.main(InjectorJob.java:292)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:192)
Caused by: java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
    at org.apache.gora.util.ReflectionUtils.newInstance(ReflectionUtils.java:76)
    at org.apache.gora.store.DataStoreFactory.createDataStore(DataStoreFactory.java:102)
    ... 12 more
Caused by: java.lang.NoClassDefFoundError: me/prettyprint/hector/api/Serializer
    at org.apache.gora.cassandra.store.CassandraStore.<init>(CassandraStore.java:60)
    ... 18 more
Caused by: java.lang.ClassNotFoundException: me.prettyprint.hector.api.Serializer
    at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
    ... 19 more

On Mon, Aug 1, 2011 at 11:59 AM, Tom Davidson <tdavid...@covario.com> wrote:

> Hi All,
>
> I am kind of at my wit’s end here, so I am hoping someone here can help. I
> am trying to use Nutch2 and Cassandra and I have been successful using the
> runtime/local build. I am using the Cloudera CDH3 on CentOS 5 and I do not
> want to contaminate my hadoop install by dropping in a bunch of Nutch jars,
> etc. So I am trying to use the nutch-2-dev.job jar. When I try to use the
> nutch2-dev.job jar, I get the error below. I have double and triple checked
> the classpath and the included jars and the only jar that contains
> FieldValueMetaData is the libthrift-0.6.1.jar which has the method that is
> claimed to be missing. Any ideas?
>
> Thanks,
> Tom
>
> [tdavidson@nadevsan06 ~]$ bin/nutch inject urls
>
> /opt/jdk1.6.0_21/bin/java -Dproc_jar -Xmx1000m -Dhadoop.log.dir=/usr/lib/hadoop-0.20/logs -Dhadoop.log.file=hadoop.log -Dhadoop.home.dir=/usr/lib/hadoop-0.20 -Dhadoop.id.str=tdavidson -Dhadoop.root.logger=INFO,console -Djava.library.path=/usr/lib/hadoop-0.20/lib/native/Linux-amd64-64 -Dhadoop.policy.file=hadoop-policy.xml -classpath /usr/lib/hadoop-0.20/conf:/opt/jdk1.6.0_21/lib/tools.jar:/usr/lib/hadoop-0.20:/usr/lib/hadoop-0.20/hadoop-core-0.20.2-cdh3u1.jar:/usr/lib/hadoop-0.20/lib/ant-contrib-1.0b3.jar:/usr/lib/hadoop-0.20/lib/aspectjrt-1.6.5.jar:/usr/lib/hadoop-0.20/lib/aspectjtools-1.6.5.jar:/usr/lib/hadoop-0.20/lib/commons-cli-1.2.jar:/usr/lib/hadoop-0.20/lib/commons-codec-1.4.jar:/usr/lib/hadoop-0.20/lib/commons-daemon-1.0.1.jar:/usr/lib/hadoop-0.20/lib/commons-el-1.0.jar:/usr/lib/hadoop-0.20/lib/commons-httpclient-3.0.1.jar:/usr/lib/hadoop-0.20/lib/commons-logging-1.0.4.jar:/usr/lib/hadoop-0.20/lib/commons-logging-api-1.0.4.jar:/usr/lib/hadoop-0.20/lib/commons-net-1.4.1.jar:/usr/lib/hadoop-0.20/lib/core-3.1.1.jar:/usr/lib/hadoop-0.20/lib/hadoop-fairscheduler-0.20.2-cdh3u1.jar:/usr/lib/hadoop-0.20/lib/hsqldb-1.8.0.10.jar:/usr/lib/hadoop-0.20/lib/hue-plugins-1.2.0-cdh3u1.jar:/usr/lib/hadoop-0.20/lib/jackson-core-asl-1.5.2.jar:/usr/lib/hadoop-0.20/lib/jackson-mapper-asl-1.5.2.jar:/usr/lib/hadoop-0.20/lib/jasper-compiler-5.5.12.jar:/usr/lib/hadoop-0.20/lib/jasper-runtime-5.5.12.jar:/usr/lib/hadoop-0.20/lib/jets3t-0.6.1.jar:/usr/lib/hadoop-0.20/lib/jetty-6.1.26.jar:/usr/lib/hadoop-0.20/lib/jetty-servlet-tester-6.1.26.jar:/usr/lib/hadoop-0.20/lib/jetty-util-6.1.26.jar:/usr/lib/hadoop-0.20/lib/jsch-0.1.42.jar:/usr/lib/hadoop-0.20/lib/junit-4.5.jar:/usr/lib/hadoop-0.20/lib/kfs-0.2.2.jar:/usr/lib/hadoop-0.20/lib/log4j-1.2.15.jar:/usr/lib/hadoop-0.20/lib/mockito-all-1.8.2.jar:/usr/lib/hadoop-0.20/lib/oro-2.0.8.jar:/usr/lib/hadoop-0.20/lib/servlet-api-2.5-20081211.jar:/usr/lib/hadoop-0.20/lib/servlet-api-2.5-6.1.14.jar:/usr/lib/hadoop-0.20/lib/slf4j-api-1.4.3.jar:/usr/lib/hadoop-0.20/lib/slf4j-log4j12-1.4.3.jar:/usr/lib/hadoop-0.20/lib/xmlenc-0.52.jar:/usr/lib/hadoop-0.20/lib/jsp-2.1/jsp-2.1.jar:/usr/lib/hadoop-0.20/lib/jsp-2.1/jsp-api-2.1.jar org.apache.hadoop.util.RunJar /home/SEMDIRECTOR/tdavidson/nutch-2.job org.apache.nutch.crawl.InjectorJob urls
>
> 11/08/01 11:51:54 INFO crawl.InjectorJob: InjectorJob: starting
> 11/08/01 11:51:54 INFO crawl.InjectorJob: InjectorJob: urlDir: urls
> 11/08/01 11:51:55 INFO connection.CassandraHostRetryService: Downed Host Retry service started with queue size -1 and retry delay 10s
> 11/08/01 11:51:55 INFO service.JmxMonitor: Registering JMX me.prettyprint.cassandra.service_Test Cluster:ServiceType=hector,MonitorType=hector
> 11/08/01 11:51:55 ERROR crawl.InjectorJob: InjectorJob: org.apache.gora.util.GoraException: java.lang.reflect.InvocationTargetException
>     at org.apache.gora.store.DataStoreFactory.createDataStore(DataStoreFactory.java:110)
>     at org.apache.gora.store.DataStoreFactory.createDataStore(DataStoreFactory.java:93)
>     at org.apache.nutch.storage.StorageUtils.createWebStore(StorageUtils.java:59)
>     at org.apache.nutch.crawl.InjectorJob.run(InjectorJob.java:243)
>     at org.apache.nutch.crawl.InjectorJob.inject(InjectorJob.java:268)
>     at org.apache.nutch.crawl.InjectorJob.run(InjectorJob.java:282)
>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>     at org.apache.nutch.crawl.InjectorJob.main(InjectorJob.java:292)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>     at java.lang.reflect.Method.invoke(Method.java:597)
>     at org.apache.hadoop.util.RunJar.main(RunJar.java:186)
> Caused by: java.lang.reflect.InvocationTargetException
>     at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>     at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
>     at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
>     at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
>     at org.apache.gora.util.ReflectionUtils.newInstance(ReflectionUtils.java:76)
>     at org.apache.gora.store.DataStoreFactory.createDataStore(DataStoreFactory.java:102)
>     ... 12 more
> Caused by: java.lang.NoSuchMethodError: org.apache.thrift.meta_data.FieldValueMetaData.<init>(BZ)V
>     at org.apache.cassandra.thrift.CfDef.<clinit>(CfDef.java:299)
>     at org.apache.cassandra.thrift.KsDef.read(KsDef.java:753)
>     at org.apache.cassandra.thrift.Cassandra$describe_keyspace_result.read(Cassandra.java:24338)
>     at org.apache.cassandra.thrift.Cassandra$Client.recv_describe_keyspace(Cassandra.java:1371)
>     at org.apache.cassandra.thrift.Cassandra$Client.describe_keyspace(Cassandra.java:1346)
>     at me.prettyprint.cassandra.service.AbstractCluster$4.execute(AbstractCluster.java:192)
>     at me.prettyprint.cassandra.service.AbstractCluster$4.execute(AbstractCluster.java:187)
>     at me.prettyprint.cassandra.service.Operation.executeAndSetResult(Operation.java:101)
>     at me.prettyprint.cassandra.connection.HConnectionManager.operateWithFailover(HConnectionManager.java:232)
>     at me.prettyprint.cassandra.service.AbstractCluster.describeKeyspace(AbstractCluster.java:201)
>     at org.apache.gora.cassandra.store.CassandraClient.checkKeyspace(CassandraClient.java:82)
>     at org.apache.gora.cassandra.store.CassandraClient.init(CassandraClient.java:69)
>     at org.apache.gora.cassandra.store.CassandraStore.<init>(CassandraStore.java:68)
>     ... 18 more

Index: ivy/ivy.xml
===================================================================
--- ivy/ivy.xml (revision 1145734)
+++ ivy/ivy.xml (working copy)
@@ -32,7 +32,7 @@
   <dependencies>
     <dependency org="org.apache.solr" name="solr-solrj" rev="3.1.0" conf="*->default" />
-    <dependency org="org.slf4j" name="slf4j-log4j12" rev="1.5.5" conf="*->master" />
+    <dependency org="org.slf4j" name="slf4j-log4j12" rev="1.6.1" conf="*->master" />
     <dependency org="commons-lang" name="commons-lang" rev="2.4" conf="*->default" />
@@ -93,18 +93,13 @@
     <dependency org="org.hsqldb" name="hsqldb" rev="2.0.0" conf="*->default"/>
     <dependency org="org.jdom" name="jdom" rev="1.1" conf="test->default"/>
-    <dependency org="org.apache.gora" name="gora-sql" rev="0.2-incubating" conf="*->compile"/>
+<!--
+    <dependency org="org.apache.gora" name="gora-sql" rev="0.2-incubating" conf="*->compile"/>
+-->
     <dependency org="org.restlet.jse" name="org.restlet" rev="2.0.5" conf="*->default"/>
     <dependency org="org.restlet.jse" name="org.restlet.ext.jackson" rev="2.0.5" conf="*->default"/>
 <!--
-    Uncomment this to use MySQL as database with SQL as Gora store.
--->
-<!--
-    <dependency org="mysql" name="mysql-connector-java" rev="5.1.13" conf="*->default"/>
--->
-
-<!--
     Uncomment this to use HBase as Gora backend. Then manually add hbase-0.20.6 jar to the lib directory.
 -->
 <!--
@@ -114,6 +109,14 @@
     </dependency>
 -->
+    <dependency org="org.apache.gora" name="gora-cassandra" rev="0.2-incubating" conf="*->compile"/>
+    <dependency org="org.apache.cassandra" name="cassandra-thrift" rev="0.8.1"/>
+    <dependency org="com.ecyrd.speed4j" name="speed4j" rev="0.9" conf="*->*,!javadoc,!sources"/>
+    <dependency org="com.github.stephenc.high-scale-lib" name="high-scale-lib" rev="1.1.2" conf="*->*,!javadoc,!sources"/>
+    <dependency org="com.google.collections" name="google-collections" rev="1.0" conf="*->*,!javadoc,!sources"/>
+    <dependency org="com.google.guava" name="guava" rev="r09" conf="*->*,!javadoc,!sources"/>
+
+
     <!--global exclusion-->
     <exclude module="ant" />
     <exclude module="jmxtools" />