I did something similar to what you describe below to add the Cassandra
dependencies. Note that I am getting NoSuchMethodErrors, not
ClassNotFoundExceptions. Can you add the hector jars to your Nutch job jar and
see what you get? I think I am one step ahead of you. BTW, I just added this
line to get the hector dependency:
<dependency org="me.prettyprint" name="hector-core" rev="0.8.0-2"
conf="*->default"/>
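A NoSuchMethodError on a constructor usually means the class was found, but in a version whose signatures differ from what the caller was compiled against. The descriptor `<init>(BZ)V` in the trace denotes a constructor taking `(byte, boolean)`. As a rough diagnostic sketch (the `WhichJar` class and its method names are hypothetical, not part of Nutch, Gora, or Thrift), something like the following could report which jar a class is actually loaded from and which constructors that copy declares, assuming it is run with the same classpath the job sees:

```java
import java.lang.reflect.Constructor;
import java.security.CodeSource;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only.
public class WhichJar {

    /** Returns where the named class was loaded from, as a URL string. */
    static String locate(String className) throws ClassNotFoundException {
        // initialize=false: look the class up without running its static initializers
        Class<?> cls = Class.forName(className, false, WhichJar.class.getClassLoader());
        CodeSource src = cls.getProtectionDomain().getCodeSource();
        // JDK bootstrap classes typically report no code source
        if (src == null || src.getLocation() == null) {
            return "(no code source; likely a JDK bootstrap class)";
        }
        return src.getLocation().toString();
    }

    /** Lists the declared constructors of the named class, one signature per entry. */
    static List<String> constructors(String className) throws ClassNotFoundException {
        Class<?> cls = Class.forName(className, false, WhichJar.class.getClassLoader());
        List<String> out = new ArrayList<String>();
        for (Constructor<?> c : cls.getDeclaredConstructors()) {
            out.add(c.toString());
        }
        return out;
    }

    public static void main(String[] args) throws Exception {
        // Default to the class named in the NoSuchMethodError; override via args[0]
        String name = args.length > 0
                ? args[0]
                : "org.apache.thrift.meta_data.FieldValueMetaData";
        System.out.println(name + " loaded from " + locate(name));
        for (String sig : constructors(name)) {
            System.out.println("  " + sig);
        }
    }
}
```

If the reported location is not the libthrift jar you expect, or no `(byte,boolean)` constructor is listed, a different libthrift version is probably being picked up first. How to reproduce the job's exact classpath depends on the Hadoop setup, so treat the result as a hint rather than proof.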
-----Original Message-----
From: Alexis [mailto:[email protected]]
Sent: Monday, August 01, 2011 2:28 PM
To: [email protected]
Subject: Re: Nutch 2 and Cassandra
Hi, libthrift is a dependency of cassandra-thrift, as listed here:
http://mvnrepository.com/artifact/org.apache.cassandra/cassandra-thrift/0.8.1
During the Nutch build, you have to manually tweak the Ivy configuration
depending on your choice of Gora store, in this case Cassandra.
Basically you need to add all the dependencies listed here:
http://svn.apache.org/viewvc/incubator/gora/trunk/gora-cassandra/ivy/ivy.xml?view=markup
Try adding the following dependencies to $NUTCH_HOME/ivy/ivy.xml and then
rebuilding Nutch (see attached patch):
<dependency org="org.apache.gora" name="gora-cassandra"
rev="0.2-incubating" conf="*->compile"/>
<dependency org="org.apache.cassandra" name="cassandra-thrift"
rev="0.8.1"/>
<dependency org="com.ecyrd.speed4j" name="speed4j" rev="0.9"
conf="*->*,!javadoc,!sources"/>
<dependency org="com.github.stephenc.high-scale-lib"
name="high-scale-lib" rev="1.1.2" conf="*->*,!javadoc,!sources"/>
<dependency org="com.google.collections" name="google-collections"
rev="1.0" conf="*->*,!javadoc,!sources"/>
<dependency org="com.google.guava" name="guava" rev="r09"
conf="*->*,!javadoc,!sources"/>
$ ant clean
$ ant
In your case libthrift should now be downloaded by Ivy and then bundled into
the nutch-2.0-dev.job file. I'm not sure how apache-cassandra and hector got
included in your classpath...
Somehow we also need to resolve:
<dependency org="org.apache.cassandra" name="apache-cassandra"
rev="0.8.1"/>
<dependency org="me.prettyprint" name="hector" rev="0.8.0-1"/>
I don't think these two jars are in the default Maven repository, so they
won't be downloaded; that's why they were commented out in the Gora Cassandra
Ivy config (gora/trunk/gora-cassandra/ivy/ivy.xml).
Since the hector jar is not found in my case, I get:
~/java/workspace/Nutch/trunk/runtime/deploy$ bin/nutch inject ~/java/workspace/Nutch/seeds
11/08/01 14:18:42 INFO crawl.InjectorJob: InjectorJob: starting
11/08/01 14:18:42 INFO crawl.InjectorJob: InjectorJob: urlDir: /home/alex/java/workspace/Nutch/seeds
11/08/01 14:18:42 INFO security.Groups: Group mapping impl=org.apache.hadoop.security.ShellBasedUnixGroupsMapping; cacheTimeout=300000
11/08/01 14:18:42 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
11/08/01 14:18:42 ERROR crawl.InjectorJob: InjectorJob: org.apache.gora.util.GoraException: java.lang.reflect.InvocationTargetException
    at org.apache.gora.store.DataStoreFactory.createDataStore(DataStoreFactory.java:110)
    at org.apache.gora.store.DataStoreFactory.createDataStore(DataStoreFactory.java:93)
    at org.apache.nutch.storage.StorageUtils.createWebStore(StorageUtils.java:59)
    at org.apache.nutch.crawl.InjectorJob.run(InjectorJob.java:243)
    at org.apache.nutch.crawl.InjectorJob.inject(InjectorJob.java:268)
    at org.apache.nutch.crawl.InjectorJob.run(InjectorJob.java:282)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:69)
    at org.apache.nutch.crawl.InjectorJob.main(InjectorJob.java:292)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:192)
Caused by: java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
    at org.apache.gora.util.ReflectionUtils.newInstance(ReflectionUtils.java:76)
    at org.apache.gora.store.DataStoreFactory.createDataStore(DataStoreFactory.java:102)
    ... 12 more
Caused by: java.lang.NoClassDefFoundError: me/prettyprint/hector/api/Serializer
    at org.apache.gora.cassandra.store.CassandraStore.<init>(CassandraStore.java:60)
    ... 18 more
Caused by: java.lang.ClassNotFoundException: me.prettyprint.hector.api.Serializer
    at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
    ... 19 more
On Mon, Aug 1, 2011 at 11:59 AM, Tom Davidson <[email protected]> wrote:
> Hi All,
>
> I am kind of at my wit's end here, so I am hoping someone here can
> help. I am trying to use Nutch 2 with Cassandra, and I have been
> successful using the runtime/local build. I am using Cloudera CDH3
> on CentOS 5 and I do not want to contaminate my Hadoop install by
> dropping in a bunch of Nutch jars, etc., so I am trying to use the
> nutch-2-dev.job jar instead. When I do, I get the error below. I have
> double- and triple-checked the classpath and the included jars, and
> the only jar that contains FieldValueMetaData is libthrift-0.6.1.jar,
> which does have the method that is reported as missing.
> Any ideas?
>
> Thanks,
>
> Tom
>
> [tdavidson@nadevsan06 ~]$ bin/nutch inject urls
>
> /opt/jdk1.6.0_21/bin/java -Dproc_jar -Xmx1000m
> -Dhadoop.log.dir=/usr/lib/hadoop-0.20/logs
> -Dhadoop.log.file=hadoop.log -Dhadoop.home.dir=/usr/lib/hadoop-0.20
> -Dhadoop.id.str=tdavidson -Dhadoop.root.logger=INFO,console
> -Djava.library.path=/usr/lib/hadoop-0.20/lib/native/Linux-amd64-64
> -Dhadoop.policy.file=hadoop-policy.xml -classpath
> /usr/lib/hadoop-0.20/conf:/opt/jdk1.6.0_21/lib/tools.jar:/usr/lib/hadoop-0.20:/usr/lib/hadoop-0.20/hadoop-core-0.20.2-cdh3u1.jar:/usr/lib/hadoop-0.20/lib/ant-contrib-1.0b3.jar:/usr/lib/hadoop-0.20/lib/aspectjrt-1.6.5.jar:/usr/lib/hadoop-0.20/lib/aspectjtools-1.6.5.jar:/usr/lib/hadoop-0.20/lib/commons-cli-1.2.jar:/usr/lib/hadoop-0.20/lib/commons-codec-1.4.jar:/usr/lib/hadoop-0.20/lib/commons-daemon-1.0.1.jar:/usr/lib/hadoop-0.20/lib/commons-el-1.0.jar:/usr/lib/hadoop-0.20/lib/commons-httpclient-3.0.1.jar:/usr/lib/hadoop-0.20/lib/commons-logging-1.0.4.jar:/usr/lib/hadoop-0.20/lib/commons-logging-api-1.0.4.jar:/usr/lib/hadoop-0.20/lib/commons-net-1.4.1.jar:/usr/lib/hadoop-0.20/lib/core-3.1.1.jar:/usr/lib/hadoop-0.20/lib/hadoop-fairscheduler-0.20.2-cdh3u1.jar:/usr/lib/hadoop-0.20/lib/hsqldb-1.8.0.10.jar:/usr/lib/hadoop-0.20/lib/hue-plugins-1.2.0-cdh3u1.jar:/usr/lib/hadoop-0.20/lib/jackson-core-asl-1.5.2.jar:/usr/lib/hadoop-0.20/lib/jackson-mapper-asl-1.5.2.jar:/usr/lib/hadoop-0.20/lib/jasper-compiler-5.5.12.jar:/usr/lib/hadoop-0.20/lib/jasper-runtime-5.5.12.jar:/usr/lib/hadoop-0.20/lib/jets3t-0.6.1.jar:/usr/lib/hadoop-0.20/lib/jetty-6.1.26.jar:/usr/lib/hadoop-0.20/lib/jetty-servlet-tester-6.1.26.jar:/usr/lib/hadoop-0.20/lib/jetty-util-6.1.26.jar:/usr/lib/hadoop-0.20/lib/jsch-0.1.42.jar:/usr/lib/hadoop-0.20/lib/junit-4.5.jar:/usr/lib/hadoop-0.20/lib/kfs-0.2.2.jar:/usr/lib/hadoop-0.20/lib/log4j-1.2.15.jar:/usr/lib/hadoop-0.20/lib/mockito-all-1.8.2.jar:/usr/lib/hadoop-0.20/lib/oro-2.0.8.jar:/usr/lib/hadoop-0.20/lib/servlet-api-2.5-20081211.jar:/usr/lib/hadoop-0.20/lib/servlet-api-2.5-6.1.14.jar:/usr/lib/hadoop-0.20/lib/slf4j-api-1.4.3.jar:/usr/lib/hadoop-0.20/lib/slf4j-log4j12-1.4.3.jar:/usr/lib/hadoop-0.20/lib/xmlenc-0.52.jar:/usr/lib/hadoop-0.20/lib/jsp-2.1/jsp-2.1.jar:/usr/lib/hadoop-0.20/lib/jsp-2.1/jsp-api-2.1.jar
> org.apache.hadoop.util.RunJar
> /home/SEMDIRECTOR/tdavidson/nutch-2.job
> org.apache.nutch.crawl.InjectorJob urls
>
> 11/08/01 11:51:54 INFO crawl.InjectorJob: InjectorJob: starting
> 11/08/01 11:51:54 INFO crawl.InjectorJob: InjectorJob: urlDir: urls
> 11/08/01 11:51:55 INFO connection.CassandraHostRetryService: Downed Host Retry service started with queue size -1 and retry delay 10s
> 11/08/01 11:51:55 INFO service.JmxMonitor: Registering JMX me.prettyprint.cassandra.service_Test Cluster:ServiceType=hector,MonitorType=hector
> 11/08/01 11:51:55 ERROR crawl.InjectorJob: InjectorJob: org.apache.gora.util.GoraException: java.lang.reflect.InvocationTargetException
>     at org.apache.gora.store.DataStoreFactory.createDataStore(DataStoreFactory.java:110)
>     at org.apache.gora.store.DataStoreFactory.createDataStore(DataStoreFactory.java:93)
>     at org.apache.nutch.storage.StorageUtils.createWebStore(StorageUtils.java:59)
>     at org.apache.nutch.crawl.InjectorJob.run(InjectorJob.java:243)
>     at org.apache.nutch.crawl.InjectorJob.inject(InjectorJob.java:268)
>     at org.apache.nutch.crawl.InjectorJob.run(InjectorJob.java:282)
>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>     at org.apache.nutch.crawl.InjectorJob.main(InjectorJob.java:292)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>     at java.lang.reflect.Method.invoke(Method.java:597)
>     at org.apache.hadoop.util.RunJar.main(RunJar.java:186)
> Caused by: java.lang.reflect.InvocationTargetException
>     at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>     at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
>     at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
>     at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
>     at org.apache.gora.util.ReflectionUtils.newInstance(ReflectionUtils.java:76)
>     at org.apache.gora.store.DataStoreFactory.createDataStore(DataStoreFactory.java:102)
>     ... 12 more
> Caused by: java.lang.NoSuchMethodError: org.apache.thrift.meta_data.FieldValueMetaData.<init>(BZ)V
>     at org.apache.cassandra.thrift.CfDef.<clinit>(CfDef.java:299)
>     at org.apache.cassandra.thrift.KsDef.read(KsDef.java:753)
>     at org.apache.cassandra.thrift.Cassandra$describe_keyspace_result.read(Cassandra.java:24338)
>     at org.apache.cassandra.thrift.Cassandra$Client.recv_describe_keyspace(Cassandra.java:1371)
>     at org.apache.cassandra.thrift.Cassandra$Client.describe_keyspace(Cassandra.java:1346)
>     at me.prettyprint.cassandra.service.AbstractCluster$4.execute(AbstractCluster.java:192)
>     at me.prettyprint.cassandra.service.AbstractCluster$4.execute(AbstractCluster.java:187)
>     at me.prettyprint.cassandra.service.Operation.executeAndSetResult(Operation.java:101)
>     at me.prettyprint.cassandra.connection.HConnectionManager.operateWithFailover(HConnectionManager.java:232)
>     at me.prettyprint.cassandra.service.AbstractCluster.describeKeyspace(AbstractCluster.java:201)
>     at org.apache.gora.cassandra.store.CassandraClient.checkKeyspace(CassandraClient.java:82)
>     at org.apache.gora.cassandra.store.CassandraClient.init(CassandraClient.java:69)
>     at org.apache.gora.cassandra.store.CassandraStore.<init>(CassandraStore.java:68)
>     ... 18 more