I did something similar to what is shown below to add the Cassandra
dependencies. Note that I am getting NoSuchMethodErrors, not
ClassNotFoundExceptions. Can you add the hector jars to your Nutch job jar and
see what you get? I think I am one step ahead of you. BTW, I just added this
line to get the hector dependency:

        <dependency org="me.prettyprint" name="hector-core" rev="0.8.0-2"
                    conf="*->default"/>
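
A NoSuchMethodError, unlike a ClassNotFoundException, usually means the class
was found but in a version that lacks the expected method, so it may help to
print which jar actually supplied the class. Below is a minimal sketch, not
anything from Nutch or Gora: the class name ClassOrigin and the method locate
are made up for illustration, and it only gives a meaningful answer when run
with the same classpath as the failing job.

```java
import java.security.CodeSource;

// Hypothetical helper (not part of Nutch, Gora or Hector): reports where the
// JVM loaded a class from, to spot an unexpected copy of a jar.
class ClassOrigin {

    static String locate(String className) {
        try {
            // initialize=false so that static initializers are not run
            Class<?> c = Class.forName(className, false,
                    ClassOrigin.class.getClassLoader());
            CodeSource src = c.getProtectionDomain().getCodeSource();
            if (src == null || src.getLocation() == null) {
                return "no code source (for example a JDK bootstrap class)";
            }
            return src.getLocation().toString();
        } catch (ClassNotFoundException e) {
            return "not found: " + e;
        } catch (LinkageError e) {
            // covers NoClassDefFoundError and similar loading problems
            return "not found: " + e;
        }
    }

    public static void main(String[] args) {
        String name = args.length > 0 ? args[0]
                : "org.apache.thrift.meta_data.FieldValueMetaData";
        System.out.println(name + " -> " + locate(name));
    }
}
```

If the printed location is not the libthrift jar you expect, an older copy is
probably ahead of it on the classpath.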

-----Original Message-----
From: Alexis [mailto:[email protected]] 
Sent: Monday, August 01, 2011 2:28 PM
To: [email protected]
Subject: Re: Nutch 2 and Cassandra

Hi, libthrift is a dependency of cassandra-thrift, as listed here:
http://mvnrepository.com/artifact/org.apache.cassandra/cassandra-thrift/0.8.1

During the Nutch build, you have to manually tweak the Ivy configuration
depending on your choice of Gora store, in this case Cassandra.
Basically you need to add all the dependencies listed here:
http://svn.apache.org/viewvc/incubator/gora/trunk/gora-cassandra/ivy/ivy.xml?view=markup

Let's try to add to $NUTCH_HOME/ivy/ivy.xml the following dependencies and then 
let's rebuild Nutch (see attached patch):
        <dependency org="org.apache.gora" name="gora-cassandra"
                    rev="0.2-incubating" conf="*->compile"/>
        <dependency org="org.apache.cassandra" name="cassandra-thrift"
                    rev="0.8.1"/>
        <dependency org="com.ecyrd.speed4j" name="speed4j" rev="0.9"
                    conf="*->*,!javadoc,!sources"/>
        <dependency org="com.github.stephenc.high-scale-lib"
                    name="high-scale-lib" rev="1.1.2"
                    conf="*->*,!javadoc,!sources"/>
        <dependency org="com.google.collections" name="google-collections"
                    rev="1.0" conf="*->*,!javadoc,!sources"/>
        <dependency org="com.google.guava" name="guava" rev="r09"
                    conf="*->*,!javadoc,!sources"/>

$ ant clean
$ ant

In your case libthrift should now be downloaded by Ivy and then bundled into 
the nutch-2.0-dev.job file. I'm not sure how apache-cassandra and hector got 
included in your classpath...
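
To check what actually ended up in the job file, something like the following
could list every bundled jar that contains a given class. This is only a
sketch under assumptions: the name JobJarScanner is made up, and it assumes
the job file is an ordinary zip archive with its dependencies stored as nested
jars (typically under lib/).

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipInputStream;

// Hypothetical helper: lists the jars nested inside a job archive that
// contain a given class file.
class JobJarScanner {

    static List<String> jarsContaining(String jobFile, String className)
            throws IOException {
        String wanted = className.replace('.', '/') + ".class";
        List<String> hits = new ArrayList<String>();
        ZipFile job = new ZipFile(jobFile);
        try {
            Enumeration<? extends ZipEntry> entries = job.entries();
            while (entries.hasMoreElements()) {
                ZipEntry entry = entries.nextElement();
                if (entry.getName().equals(wanted)) {
                    // class stored directly in the job file, not in a nested jar
                    hits.add("(top level of job file)");
                } else if (entry.getName().endsWith(".jar")
                        && nestedJarHas(job.getInputStream(entry), wanted)) {
                    hits.add(entry.getName());
                }
            }
        } finally {
            job.close();
        }
        return hits;
    }

    // Streams through one nested jar and looks for the wanted entry name.
    private static boolean nestedJarHas(InputStream in, String wanted)
            throws IOException {
        ZipInputStream zin = new ZipInputStream(in);
        try {
            for (ZipEntry e = zin.getNextEntry(); e != null; e = zin.getNextEntry()) {
                if (e.getName().equals(wanted)) {
                    return true;
                }
            }
            return false;
        } finally {
            zin.close();
        }
    }

    public static void main(String[] args) throws IOException {
        // args[0]: path to the job file, args[1]: fully qualified class name
        for (String jar : jarsContaining(args[0], args[1])) {
            System.out.println(jar);
        }
    }
}
```

For example, running it with the job file and
org.apache.thrift.meta_data.FieldValueMetaData as arguments should print the
bundled libthrift jar. No output would mean the class was never bundled, and
more than one line would point to duplicate copies.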

We also need to resolve these somehow:
        <dependency org="org.apache.cassandra" name="apache-cassandra"
                    rev="0.8.1"/>
        <dependency org="me.prettyprint" name="hector" rev="0.8.0-1"/>

I don't think these 2 jars are in the default Maven repository, so they won't
be downloaded. That's why they were commented out in the Gora Cassandra Ivy
config (gora/trunk/gora-cassandra/ivy/ivy.xml).


Since the hector jar is not found in my case, I get:
~/java/workspace/Nutch/trunk/runtime/deploy$ bin/nutch inject ~/java/workspace/Nutch/seeds
11/08/01 14:18:42 INFO crawl.InjectorJob: InjectorJob: starting
11/08/01 14:18:42 INFO crawl.InjectorJob: InjectorJob: urlDir: /home/alex/java/workspace/Nutch/seeds
11/08/01 14:18:42 INFO security.Groups: Group mapping impl=org.apache.hadoop.security.ShellBasedUnixGroupsMapping; cacheTimeout=300000
11/08/01 14:18:42 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
11/08/01 14:18:42 ERROR crawl.InjectorJob: InjectorJob: org.apache.gora.util.GoraException: java.lang.reflect.InvocationTargetException
        at org.apache.gora.store.DataStoreFactory.createDataStore(DataStoreFactory.java:110)
        at org.apache.gora.store.DataStoreFactory.createDataStore(DataStoreFactory.java:93)
        at org.apache.nutch.storage.StorageUtils.createWebStore(StorageUtils.java:59)
        at org.apache.nutch.crawl.InjectorJob.run(InjectorJob.java:243)
        at org.apache.nutch.crawl.InjectorJob.inject(InjectorJob.java:268)
        at org.apache.nutch.crawl.InjectorJob.run(InjectorJob.java:282)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:69)
        at org.apache.nutch.crawl.InjectorJob.main(InjectorJob.java:292)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:192)
Caused by: java.lang.reflect.InvocationTargetException
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
        at org.apache.gora.util.ReflectionUtils.newInstance(ReflectionUtils.java:76)
        at org.apache.gora.store.DataStoreFactory.createDataStore(DataStoreFactory.java:102)
        ... 12 more
Caused by: java.lang.NoClassDefFoundError: me/prettyprint/hector/api/Serializer
        at org.apache.gora.cassandra.store.CassandraStore.<init>(CassandraStore.java:60)
        ... 18 more
Caused by: java.lang.ClassNotFoundException: me.prettyprint.hector.api.Serializer
        at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
        ... 19 more




On Mon, Aug 1, 2011 at 11:59 AM, Tom Davidson <[email protected]> wrote:
> Hi All,
>
>
>
> I am kind of at my wit's end here, so I am hoping someone here can 
> help.  I am trying to use Nutch2 and Cassandra and I have been 
> successful using the runtime/local build. I am using Cloudera CDH3 
> on CentOS 5 and I do not want to contaminate my Hadoop install by 
> dropping in a bunch of Nutch jars, etc. So I am trying to use the 
> nutch-2-dev.job jar. When I try to use the nutch2-dev.job jar, I get 
> the error below.  I have double and triple checked the classpath and 
> the included jars and the only jar that contains FieldValueMetaData is 
> the libthrift-0.6.1.jar which has the method that is claimed to be missing. 
> Any ideas?
>
>
>
> Thanks,
>
> Tom
>
> [tdavidson@nadevsan06 ~]$ bin/nutch inject urls
>
> /opt/jdk1.6.0_21/bin/java -Dproc_jar -Xmx1000m 
> -Dhadoop.log.dir=/usr/lib/hadoop-0.20/logs 
> -Dhadoop.log.file=hadoop.log -Dhadoop.home.dir=/usr/lib/hadoop-0.20 
> -Dhadoop.id.str=tdavidson -Dhadoop.root.logger=INFO,console
> -Djava.library.path=/usr/lib/hadoop-0.20/lib/native/Linux-amd64-64
> -Dhadoop.policy.file=hadoop-policy.xml -classpath 
> /usr/lib/hadoop-0.20/conf:/opt/jdk1.6.0_21/lib/tools.jar:/usr/lib/hado
> op-0.20:/usr/lib/hadoop-0.20/hadoop-core-0.20.2-cdh3u1.jar:/usr/lib/ha
> doop-0.20/lib/ant-contrib-1.0b3.jar:/usr/lib/hadoop-0.20/lib/aspectjrt
> -1.6.5.jar:/usr/lib/hadoop-0.20/lib/aspectjtools-1.6.5.jar:/usr/lib/ha
> doop-0.20/lib/commons-cli-1.2.jar:/usr/lib/hadoop-0.20/lib/commons-cod
> ec-1.4.jar:/usr/lib/hadoop-0.20/lib/commons-daemon-1.0.1.jar:/usr/lib/
> hadoop-0.20/lib/commons-el-1.0.jar:/usr/lib/hadoop-0.20/lib/commons-ht
> tpclient-3.0.1.jar:/usr/lib/hadoop-0.20/lib/commons-logging-1.0.4.jar:
> /usr/lib/hadoop-0.20/lib/commons-logging-api-1.0.4.jar:/usr/lib/hadoop
> -0.20/lib/commons-net-1.4.1.jar:/usr/lib/hadoop-0.20/lib/core-3.1.1.ja
> r:/usr/lib/hadoop-0.20/lib/hadoop-fairscheduler-0.20.2-cdh3u1.jar:/usr
> /lib/hadoop-0.20/lib/hsqldb-1.8.0.10.jar:/usr/lib/hadoop-0.20/lib/hue-
> plugins-1.2.0-cdh3u1.jar:/usr/lib/hadoop-0.20/lib/jackson-core-asl-1.5
> .2.jar:/usr/lib/hadoop-0.20/lib/jackson-mapper-asl-1.5.2.jar:/usr/lib/
> hadoop-0.20/lib/jasper-compiler-5.5.12.jar:/usr/lib/hadoop-0.20/lib/ja
> sper-runtime-5.5.12.jar:/usr/lib/hadoop-0.20/lib/jets3t-0.6.1.jar:/usr
> /lib/hadoop-0.20/lib/jetty-6.1.26.jar:/usr/lib/hadoop-0.20/lib/jetty-s
> ervlet-tester-6.1.26.jar:/usr/lib/hadoop-0.20/lib/jetty-util-6.1.26.ja
> r:/usr/lib/hadoop-0.20/lib/jsch-0.1.42.jar:/usr/lib/hadoop-0.20/lib/ju
> nit-4.5.jar:/usr/lib/hadoop-0.20/lib/kfs-0.2.2.jar:/usr/lib/hadoop-0.2
> 0/lib/log4j-1.2.15.jar:/usr/lib/hadoop-0.20/lib/mockito-all-1.8.2.jar:
> /usr/lib/hadoop-0.20/lib/oro-2.0.8.jar:/usr/lib/hadoop-0.20/lib/servle
> t-api-2.5-20081211.jar:/usr/lib/hadoop-0.20/lib/servlet-api-2.5-6.1.14
> .jar:/usr/lib/hadoop-0.20/lib/slf4j-api-1.4.3.jar:/usr/lib/hadoop-0.20
> /lib/slf4j-log4j12-1.4.3.jar:/usr/lib/hadoop-0.20/lib/xmlenc-0.52.jar:
> /usr/lib/hadoop-0.20/lib/jsp-2.1/jsp-2.1.jar:/usr/lib/hadoop-0.20/lib/
> jsp-2.1/jsp-api-2.1.jar org.apache.hadoop.util.RunJar 
> /home/SEMDIRECTOR/tdavidson/nutch-2.job
> org.apache.nutch.crawl.InjectorJob urls
>
> 11/08/01 11:51:54 INFO crawl.InjectorJob: InjectorJob: starting
> 11/08/01 11:51:54 INFO crawl.InjectorJob: InjectorJob: urlDir: urls
> 11/08/01 11:51:55 INFO connection.CassandraHostRetryService: Downed Host Retry service started with queue size -1 and retry delay 10s
> 11/08/01 11:51:55 INFO service.JmxMonitor: Registering JMX me.prettyprint.cassandra.service_Test Cluster:ServiceType=hector,MonitorType=hector
> 11/08/01 11:51:55 ERROR crawl.InjectorJob: InjectorJob: org.apache.gora.util.GoraException: java.lang.reflect.InvocationTargetException
>         at org.apache.gora.store.DataStoreFactory.createDataStore(DataStoreFactory.java:110)
>         at org.apache.gora.store.DataStoreFactory.createDataStore(DataStoreFactory.java:93)
>         at org.apache.nutch.storage.StorageUtils.createWebStore(StorageUtils.java:59)
>         at org.apache.nutch.crawl.InjectorJob.run(InjectorJob.java:243)
>         at org.apache.nutch.crawl.InjectorJob.inject(InjectorJob.java:268)
>         at org.apache.nutch.crawl.InjectorJob.run(InjectorJob.java:282)
>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>         at org.apache.nutch.crawl.InjectorJob.main(InjectorJob.java:292)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>         at java.lang.reflect.Method.invoke(Method.java:597)
>         at org.apache.hadoop.util.RunJar.main(RunJar.java:186)
> Caused by: java.lang.reflect.InvocationTargetException
>         at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>         at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
>         at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
>         at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
>         at org.apache.gora.util.ReflectionUtils.newInstance(ReflectionUtils.java:76)
>         at org.apache.gora.store.DataStoreFactory.createDataStore(DataStoreFactory.java:102)
>         ... 12 more
> Caused by: java.lang.NoSuchMethodError: org.apache.thrift.meta_data.FieldValueMetaData.<init>(BZ)V
>         at org.apache.cassandra.thrift.CfDef.<clinit>(CfDef.java:299)
>         at org.apache.cassandra.thrift.KsDef.read(KsDef.java:753)
>         at org.apache.cassandra.thrift.Cassandra$describe_keyspace_result.read(Cassandra.java:24338)
>         at org.apache.cassandra.thrift.Cassandra$Client.recv_describe_keyspace(Cassandra.java:1371)
>         at org.apache.cassandra.thrift.Cassandra$Client.describe_keyspace(Cassandra.java:1346)
>         at me.prettyprint.cassandra.service.AbstractCluster$4.execute(AbstractCluster.java:192)
>         at me.prettyprint.cassandra.service.AbstractCluster$4.execute(AbstractCluster.java:187)
>         at me.prettyprint.cassandra.service.Operation.executeAndSetResult(Operation.java:101)
>         at me.prettyprint.cassandra.connection.HConnectionManager.operateWithFailover(HConnectionManager.java:232)
>         at me.prettyprint.cassandra.service.AbstractCluster.describeKeyspace(AbstractCluster.java:201)
>         at org.apache.gora.cassandra.store.CassandraClient.checkKeyspace(CassandraClient.java:82)
>         at org.apache.gora.cassandra.store.CassandraClient.init(CassandraClient.java:69)
>         at org.apache.gora.cassandra.store.CassandraStore.<init>(CassandraStore.java:68)
>         ... 18 more
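
The <init>(BZ)V in the NoSuchMethodError above is a JVM descriptor: <init> is
a constructor, B is byte, Z is boolean and V is a void return, so the code
was compiled against a FieldValueMetaData(byte, boolean) constructor. A
reflection check along these lines could confirm whether the class visible at
runtime really declares it. This is a sketch only: CtorCheck and
hasConstructor are made-up names, and it must run with the same classpath as
the failing job to tell you anything.

```java
// Hypothetical helper: checks whether the class visible on the current
// classpath declares a constructor with the given parameter types.
class CtorCheck {

    static boolean hasConstructor(String className, Class<?>... paramTypes) {
        try {
            // initialize=false so that static initializers are not run
            Class<?> c = Class.forName(className, false,
                    CtorCheck.class.getClassLoader());
            c.getDeclaredConstructor(paramTypes);
            return true;
        } catch (ClassNotFoundException e) {
            return false; // class is not on the classpath at all
        } catch (NoSuchMethodException e) {
            return false; // class found, but without this constructor
        } catch (LinkageError e) {
            return false; // class found, but could not be linked
        }
    }

    public static void main(String[] args) {
        // (BZ)V corresponds to the parameter list (byte, boolean)
        boolean present = hasConstructor(
                "org.apache.thrift.meta_data.FieldValueMetaData",
                byte.class, boolean.class);
        System.out.println("FieldValueMetaData(byte, boolean) present: " + present);
    }
}
```

If this prints false while the class itself can be loaded, the copy of the
class that wins on the classpath is probably not the one cassandra-thrift
was compiled against.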
