OK, this version of hector resolved properly. Thanks!

These are the logs:
~/java/workspace/Nutch/trunk/runtime/deploy$ bin/nutch inject
~/java/workspace/Nutch/seeds
11/08/01 15:17:45 INFO crawl.InjectorJob: InjectorJob: starting
11/08/01 15:17:45 INFO crawl.InjectorJob: InjectorJob: urlDir:
/home/alex/java/workspace/Nutch/seeds
11/08/01 15:17:45 INFO jvm.JvmMetrics: Initializing JVM Metrics with
processName=JobTracker, sessionId=
11/08/01 15:17:46 INFO connection.CassandraHostRetryService: Downed
Host Retry service started with queue size -1 and retry delay 10s
11/08/01 15:17:46 INFO service.JmxMonitor: Registering JMX
me.prettyprint.cassandra.service_Test
Cluster:ServiceType=hector,MonitorType=hector
11/08/01 15:17:47 INFO store.CassandraClient: Keyspace 'webpage' in
cluster 'Test Cluster' was created on host 'localhost'
11/08/01 15:17:48 INFO input.FileInputFormat: Total input paths to process : 1
11/08/01 15:17:49 INFO mapred.JobClient: Running job: job_local_0001
11/08/01 15:17:49 INFO input.FileInputFormat: Total input paths to process : 1
11/08/01 15:17:49 INFO mapreduce.GoraRecordWriter:
gora.buffer.write.limit = 10000
11/08/01 15:17:49 INFO plugin.PluginRepository: Plugins: looking in:
/tmp/hadoop-alex/hadoop-unjar8045717865743865180/plugins
11/08/01 15:17:49 INFO plugin.PluginRepository: Plugin Auto-activation
mode: [true]
11/08/01 15:17:49 INFO plugin.PluginRepository: Registered Plugins:
11/08/01 15:17:49 INFO plugin.PluginRepository:         the nutch core
extension points (nutch-extensionpoints)
11/08/01 15:17:49 INFO plugin.PluginRepository:         Basic URL
Normalizer (urlnormalizer-basic)
11/08/01 15:17:49 INFO plugin.PluginRepository:         Basic Indexing
Filter (index-basic)
11/08/01 15:17:49 INFO plugin.PluginRepository:         Html Parse
Plug-in (parse-html)
11/08/01 15:17:49 INFO plugin.PluginRepository:         HTTP Framework
(lib-http)
11/08/01 15:17:49 INFO plugin.PluginRepository:         Pass-through
URL Normalizer (urlnormalizer-pass)
11/08/01 15:17:49 INFO plugin.PluginRepository:         Regex URL
Filter (urlfilter-regex)
11/08/01 15:17:49 INFO plugin.PluginRepository:         Http Protocol
Plug-in (protocol-http)
11/08/01 15:17:49 INFO plugin.PluginRepository:         Regex URL
Normalizer (urlnormalizer-regex)
11/08/01 15:17:49 INFO plugin.PluginRepository:         Tika Parser
Plug-in (parse-tika)
11/08/01 15:17:49 INFO plugin.PluginRepository:         OPIC Scoring
Plug-in (scoring-opic)
11/08/01 15:17:49 INFO plugin.PluginRepository:         CyberNeko HTML
Parser (lib-nekohtml)
11/08/01 15:17:49 INFO plugin.PluginRepository:         Anchor
Indexing Filter (index-anchor)
11/08/01 15:17:49 INFO plugin.PluginRepository:         Regex URL
Filter Framework (lib-regex-filter)
11/08/01 15:17:49 INFO plugin.PluginRepository: Registered Extension-Points:
11/08/01 15:17:49 INFO plugin.PluginRepository:         Nutch URL
Normalizer (org.apache.nutch.net.URLNormalizer)
11/08/01 15:17:49 INFO plugin.PluginRepository:         Nutch Protocol
(org.apache.nutch.protocol.Protocol)
11/08/01 15:17:49 INFO plugin.PluginRepository:         Parse Filter
(org.apache.nutch.parse.ParseFilter)
11/08/01 15:17:49 INFO plugin.PluginRepository:         Nutch URL
Filter (org.apache.nutch.net.URLFilter)
11/08/01 15:17:49 INFO plugin.PluginRepository:         Nutch Indexing
Filter (org.apache.nutch.indexer.IndexingFilter)
11/08/01 15:17:49 INFO plugin.PluginRepository:         Nutch Content
Parser (org.apache.nutch.parse.Parser)
11/08/01 15:17:49 INFO plugin.PluginRepository:         Nutch Scoring
(org.apache.nutch.scoring.ScoringFilter)
11/08/01 15:17:50 INFO conf.Configuration: found resource
regex-normalize.xml at
file:/tmp/hadoop-alex/hadoop-unjar8045717865743865180/regex-normalize.xml
11/08/01 15:17:50 INFO conf.Configuration: found resource
regex-urlfilter.txt at
file:/tmp/hadoop-alex/hadoop-unjar8045717865743865180/regex-urlfilter.txt
11/08/01 15:17:50 INFO regex.RegexURLNormalizer: can't find rules for
scope 'inject', using default
11/08/01 15:17:50 INFO mapred.JobClient:  map 0% reduce 0%
11/08/01 15:17:51 INFO mapred.TaskRunner:
Task:attempt_local_0001_m_000000_0 is done. And is in the process of
commiting
11/08/01 15:17:51 INFO mapred.LocalJobRunner:
11/08/01 15:17:51 INFO mapred.TaskRunner: Task
'attempt_local_0001_m_000000_0' done.
11/08/01 15:17:52 INFO mapred.JobClient:  map 100% reduce 0%
11/08/01 15:17:52 INFO mapred.JobClient: Job complete: job_local_0001
11/08/01 15:17:52 INFO mapred.JobClient: Counters: 5
11/08/01 15:17:52 INFO mapred.JobClient:   FileSystemCounters
11/08/01 15:17:52 INFO mapred.JobClient:     FILE_BYTES_READ=44872735
11/08/01 15:17:52 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=45245279
11/08/01 15:17:52 INFO mapred.JobClient:   Map-Reduce Framework
11/08/01 15:17:52 INFO mapred.JobClient:     Map input records=3
11/08/01 15:17:52 INFO mapred.JobClient:     Spilled Records=0
11/08/01 15:17:52 INFO mapred.JobClient:     Map output records=3
11/08/01 15:17:52 INFO jvm.JvmMetrics: Cannot initialize JVM Metrics
with processName=JobTracker, sessionId= - already initialized
11/08/01 15:17:52 INFO crawl.InjectorJob: InjectorJob: finished



This is what was added to ivy/ivy.xml:

+       <dependency org="org.apache.gora" name="gora-cassandra" rev="0.2-incubating" conf="*->compile"/>
+       <dependency org="org.apache.cassandra" name="cassandra-thrift" rev="0.8.1"/>
+       <dependency org="com.ecyrd.speed4j" name="speed4j" rev="0.9" conf="*->*,!javadoc,!sources"/>
+       <dependency org="com.github.stephenc.high-scale-lib" name="high-scale-lib" rev="1.1.2" conf="*->*,!javadoc,!sources"/>
+       <dependency org="com.google.collections" name="google-collections" rev="1.0" conf="*->*,!javadoc,!sources"/>
+       <dependency org="com.google.guava" name="guava" rev="r09" conf="*->*,!javadoc,!sources"/>
+       <dependency org="org.apache.cassandra" name="apache-cassandra" rev="0.8.1"/>
+       <dependency org="me.prettyprint" name="hector-core" rev="0.8.0-2"/>
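
In case it helps with the NoSuchMethodError on FieldValueMetaData.<init>(BZ)V quoted below: (BZ)V is the JVM descriptor for a constructor taking (byte, boolean), so one thing that could be checked is which jar that class is really loaded from at runtime and whether that copy has such a constructor. Below is a minimal sketch using only standard Java reflection. The class name WhichJar and its two helper methods are made up for illustration; they are not part of Nutch, Gora, or Thrift, and I have not confirmed that this is the cause of the error.

```java
import java.security.CodeSource;

// Hypothetical diagnostic helper, not part of Nutch, Gora or Thrift.
public class WhichJar {

    /** Returns the location a class was loaded from, or a short note if unknown. */
    public static String locate(String className) {
        try {
            // initialize=false: only resolve the class, do not run static initializers
            Class<?> c = Class.forName(className, false, WhichJar.class.getClassLoader());
            CodeSource src = c.getProtectionDomain().getCodeSource();
            // JDK core classes typically report no code source
            return src == null ? "(bootstrap classpath)" : String.valueOf(src.getLocation());
        } catch (ClassNotFoundException e) {
            return "(not on classpath)";
        }
    }

    /** True if the class is found and declares a constructor with exactly these parameter types. */
    public static boolean hasConstructor(String className, Class<?>... paramTypes) {
        try {
            Class.forName(className, false, WhichJar.class.getClassLoader())
                 .getDeclaredConstructor(paramTypes);
            return true;
        } catch (ClassNotFoundException | NoSuchMethodException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Default to the class named in the NoSuchMethodError; any other name can be passed instead
        String name = args.length > 0 ? args[0]
                                      : "org.apache.thrift.meta_data.FieldValueMetaData";
        System.out.println(name + " loaded from: " + locate(name));
        // (BZ)V == constructor taking (byte, boolean)
        System.out.println("has (byte, boolean) constructor: "
                + hasConstructor(name, byte.class, boolean.class));
    }
}
```

To be meaningful it has to run with the same classpath as the failing job. If the reported location is not the libthrift jar you expect, an older copy of the class is probably being picked up first.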



On Mon, Aug 1, 2011 at 2:55 PM, Tom Davidson <tdavid...@covario.com> wrote:
> I did something similar to below to add the Cassandra dependencies. Note that 
> I am getting NoSuchMethodErrors not ClassNotFoundExceptions. Can you add the 
> hector jars to your nutch job jar and see what you get? I think I am one step 
> ahead of you. BTW, I just added this line to get the hector dependency:
>
>        <dependency org="me.prettyprint" name="hector-core" rev="0.8.0-2" 
> conf="*->default"/>
>
> -----Original Message-----
> From: Alexis [mailto:alexis.detregl...@gmail.com]
> Sent: Monday, August 01, 2011 2:28 PM
> To: dev@nutch.apache.org
> Subject: Re: Nutch 2 and Cassandra
>
> Hi, libthrift is a dependency of cassandra-thrift, as listed here:
> http://mvnrepository.com/artifact/org.apache.cassandra/cassandra-thrift/0.8.1
>
> During Nutch build, you have to manually tweak the Ivy configuration 
> depending on your choice of the Gora store, in this case Cassandra.
> Basically you need to add all the dependencies listed there:
> http://svn.apache.org/viewvc/incubator/gora/trunk/gora-cassandra/ivy/ivy.xml?view=markup
>
> Let's try to add to $NUTCH_HOME/ivy/ivy.xml the following dependencies and 
> then let's rebuild Nutch (see attached patch):
>        <dependency org="org.apache.gora" name="gora-cassandra"
> rev="0.2-incubating" conf="*->compile"/>
>        <dependency org="org.apache.cassandra" name="cassandra-thrift" 
> rev="0.8.1"/>
>        <dependency org="com.ecyrd.speed4j" name="speed4j" rev="0.9"
> conf="*->*,!javadoc,!sources"/>
>        <dependency org="com.github.stephenc.high-scale-lib"
> name="high-scale-lib" rev="1.1.2" conf="*->*,!javadoc,!sources"/>
>        <dependency org="com.google.collections" name="google-collections"
> rev="1.0" conf="*->*,!javadoc,!sources"/>
>        <dependency org="com.google.guava" name="guava" rev="r09"
> conf="*->*,!javadoc,!sources"/>
>
> $ ant clean
> $ ant
>
> In your case libthrift should now be downloaded by Ivy and then bundled into 
> the nutch-2.0-dev.job file. I'm not sure how apache-cassandra and hector got 
> included in your classpath...
>
> Somehow we need to resolve as well:
>        <dependency org="org.apache.cassandra" name="apache-cassandra"
> rev="0.8.1"/>
>        <dependency org="me.prettyprint" name="hector" rev="0.8.0-1"/>
>
> I don't think these two jars are in the default Maven repository, so 
> they won't be downloaded; that's why they were commented out in the Gora 
> Cassandra Ivy config (gora/trunk/gora-cassandra/ivy/ivy.xml)
>
>
> Since hector jar is not found in my case I get:
> ~/java/workspace/Nutch/trunk/runtime/deploy$ bin/nutch inject 
> ~/java/workspace/Nutch/seeds
> 11/08/01 14:18:42 INFO crawl.InjectorJob: InjectorJob: starting
> 11/08/01 14:18:42 INFO crawl.InjectorJob: InjectorJob: urlDir:
> /home/alex/java/workspace/Nutch/seeds
> 11/08/01 14:18:42 INFO security.Groups: Group mapping 
> impl=org.apache.hadoop.security.ShellBasedUnixGroupsMapping;
> cacheTimeout=300000
> 11/08/01 14:18:42 INFO jvm.JvmMetrics: Initializing JVM Metrics with 
> processName=JobTracker, sessionId=
> 11/08/01 14:18:42 ERROR crawl.InjectorJob: InjectorJob:
> org.apache.gora.util.GoraException:
> java.lang.reflect.InvocationTargetException
>        at 
> org.apache.gora.store.DataStoreFactory.createDataStore(DataStoreFactory.java:110)
>        at 
> org.apache.gora.store.DataStoreFactory.createDataStore(DataStoreFactory.java:93)
>        at 
> org.apache.nutch.storage.StorageUtils.createWebStore(StorageUtils.java:59)
>        at org.apache.nutch.crawl.InjectorJob.run(InjectorJob.java:243)
>        at org.apache.nutch.crawl.InjectorJob.inject(InjectorJob.java:268)
>        at org.apache.nutch.crawl.InjectorJob.run(InjectorJob.java:282)
>        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:69)
>        at org.apache.nutch.crawl.InjectorJob.main(InjectorJob.java:292)
>        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>        at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>        at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>        at java.lang.reflect.Method.invoke(Method.java:597)
>        at org.apache.hadoop.util.RunJar.main(RunJar.java:192)
> Caused by: java.lang.reflect.InvocationTargetException
>        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native 
> Method)
>        at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
>        at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
>        at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
>        at 
> org.apache.gora.util.ReflectionUtils.newInstance(ReflectionUtils.java:76)
>        at 
> org.apache.gora.store.DataStoreFactory.createDataStore(DataStoreFactory.java:102)
>        ... 12 more
> Caused by: java.lang.NoClassDefFoundError: 
> me/prettyprint/hector/api/Serializer
>        at 
> org.apache.gora.cassandra.store.CassandraStore.<init>(CassandraStore.java:60)
>        ... 18 more
> Caused by: java.lang.ClassNotFoundException:
> me.prettyprint.hector.api.Serializer
>        at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
>        at java.security.AccessController.doPrivileged(Native Method)
>        at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
>        at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
>        at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
>        ... 19 more
>
>
>
>
> On Mon, Aug 1, 2011 at 11:59 AM, Tom Davidson <tdavid...@covario.com> wrote:
>> Hi All,
>>
>>
>>
>> I am kind of at my wit's end here, so I am hoping someone here can
>> help.  I am trying to use Nutch2 and Cassandra and I have been
>> successful using the runtime/local build. I am using the Cloudera CDH3
>> on CentOS 5 and I do not want to contaminate my hadoop install by
>> dropping in a bunch of Nutch jars, etc. So I am trying to use the
>> nutch-2-dev.job jar. When I try to use the nutch2-dev.job jar, I get
>> the error below.  I have double and triple checked the classpath and
>> the included jars and the only jar that contains FieldValueMetaData is
>> the libthrift-0.6.1.jar which has the method that is claimed to be missing. 
>> Any ideas?
>>
>>
>>
>> Thanks,
>>
>> Tom
>>
>>
>>
>>
>>
>>
>>
>>
>>
>> [tdavidson@nadevsan06 ~]$ bin/nutch inject urls
>>
>> /opt/jdk1.6.0_21/bin/java -Dproc_jar -Xmx1000m
>> -Dhadoop.log.dir=/usr/lib/hadoop-0.20/logs
>> -Dhadoop.log.file=hadoop.log -Dhadoop.home.dir=/usr/lib/hadoop-0.20
>> -Dhadoop.id.str=tdavidson -Dhadoop.root.logger=INFO,console
>> -Djava.library.path=/usr/lib/hadoop-0.20/lib/native/Linux-amd64-64
>> -Dhadoop.policy.file=hadoop-policy.xml -classpath
>> /usr/lib/hadoop-0.20/conf:/opt/jdk1.6.0_21/lib/tools.jar:/usr/lib/hadoop-0.20:/usr/lib/hadoop-0.20/hadoop-core-0.20.2-cdh3u1.jar:/usr/lib/hadoop-0.20/lib/ant-contrib-1.0b3.jar:/usr/lib/hadoop-0.20/lib/aspectjrt-1.6.5.jar:/usr/lib/hadoop-0.20/lib/aspectjtools-1.6.5.jar:/usr/lib/hadoop-0.20/lib/commons-cli-1.2.jar:/usr/lib/hadoop-0.20/lib/commons-codec-1.4.jar:/usr/lib/hadoop-0.20/lib/commons-daemon-1.0.1.jar:/usr/lib/hadoop-0.20/lib/commons-el-1.0.jar:/usr/lib/hadoop-0.20/lib/commons-httpclient-3.0.1.jar:/usr/lib/hadoop-0.20/lib/commons-logging-1.0.4.jar:/usr/lib/hadoop-0.20/lib/commons-logging-api-1.0.4.jar:/usr/lib/hadoop-0.20/lib/commons-net-1.4.1.jar:/usr/lib/hadoop-0.20/lib/core-3.1.1.jar:/usr/lib/hadoop-0.20/lib/hadoop-fairscheduler-0.20.2-cdh3u1.jar:/usr/lib/hadoop-0.20/lib/hsqldb-1.8.0.10.jar:/usr/lib/hadoop-0.20/lib/hue-plugins-1.2.0-cdh3u1.jar:/usr/lib/hadoop-0.20/lib/jackson-core-asl-1.5.2.jar:/usr/lib/hadoop-0.20/lib/jackson-mapper-asl-1.5.2.jar:/usr/lib/hadoop-0.20/lib/jasper-compiler-5.5.12.jar:/usr/lib/hadoop-0.20/lib/jasper-runtime-5.5.12.jar:/usr/lib/hadoop-0.20/lib/jets3t-0.6.1.jar:/usr/lib/hadoop-0.20/lib/jetty-6.1.26.jar:/usr/lib/hadoop-0.20/lib/jetty-servlet-tester-6.1.26.jar:/usr/lib/hadoop-0.20/lib/jetty-util-6.1.26.jar:/usr/lib/hadoop-0.20/lib/jsch-0.1.42.jar:/usr/lib/hadoop-0.20/lib/junit-4.5.jar:/usr/lib/hadoop-0.20/lib/kfs-0.2.2.jar:/usr/lib/hadoop-0.20/lib/log4j-1.2.15.jar:/usr/lib/hadoop-0.20/lib/mockito-all-1.8.2.jar:/usr/lib/hadoop-0.20/lib/oro-2.0.8.jar:/usr/lib/hadoop-0.20/lib/servlet-api-2.5-20081211.jar:/usr/lib/hadoop-0.20/lib/servlet-api-2.5-6.1.14.jar:/usr/lib/hadoop-0.20/lib/slf4j-api-1.4.3.jar:/usr/lib/hadoop-0.20/lib/slf4j-log4j12-1.4.3.jar:/usr/lib/hadoop-0.20/lib/xmlenc-0.52.jar:/usr/lib/hadoop-0.20/lib/jsp-2.1/jsp-2.1.jar:/usr/lib/hadoop-0.20/lib/jsp-2.1/jsp-api-2.1.jar org.apache.hadoop.util.RunJar
>> /home/SEMDIRECTOR/tdavidson/nutch-2.job
>> org.apache.nutch.crawl.InjectorJob urls
>>
>> 11/08/01 11:51:54 INFO crawl.InjectorJob: InjectorJob: starting
>> 11/08/01 11:51:54 INFO crawl.InjectorJob: InjectorJob: urlDir: urls
>> 11/08/01 11:51:55 INFO connection.CassandraHostRetryService: Downed Host Retry service started with queue size -1 and retry delay 10s
>> 11/08/01 11:51:55 INFO service.JmxMonitor: Registering JMX me.prettyprint.cassandra.service_Test Cluster:ServiceType=hector,MonitorType=hector
>> 11/08/01 11:51:55 ERROR crawl.InjectorJob: InjectorJob: org.apache.gora.util.GoraException: java.lang.reflect.InvocationTargetException
>>         at org.apache.gora.store.DataStoreFactory.createDataStore(DataStoreFactory.java:110)
>>         at org.apache.gora.store.DataStoreFactory.createDataStore(DataStoreFactory.java:93)
>>         at org.apache.nutch.storage.StorageUtils.createWebStore(StorageUtils.java:59)
>>         at org.apache.nutch.crawl.InjectorJob.run(InjectorJob.java:243)
>>         at org.apache.nutch.crawl.InjectorJob.inject(InjectorJob.java:268)
>>         at org.apache.nutch.crawl.InjectorJob.run(InjectorJob.java:282)
>>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>>         at org.apache.nutch.crawl.InjectorJob.main(InjectorJob.java:292)
>>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>         at java.lang.reflect.Method.invoke(Method.java:597)
>>         at org.apache.hadoop.util.RunJar.main(RunJar.java:186)
>> Caused by: java.lang.reflect.InvocationTargetException
>>         at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>>         at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
>>         at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
>>         at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
>>         at org.apache.gora.util.ReflectionUtils.newInstance(ReflectionUtils.java:76)
>>         at org.apache.gora.store.DataStoreFactory.createDataStore(DataStoreFactory.java:102)
>>         ... 12 more
>> Caused by: java.lang.NoSuchMethodError: org.apache.thrift.meta_data.FieldValueMetaData.<init>(BZ)V
>>         at org.apache.cassandra.thrift.CfDef.<clinit>(CfDef.java:299)
>>         at org.apache.cassandra.thrift.KsDef.read(KsDef.java:753)
>>         at org.apache.cassandra.thrift.Cassandra$describe_keyspace_result.read(Cassandra.java:24338)
>>         at org.apache.cassandra.thrift.Cassandra$Client.recv_describe_keyspace(Cassandra.java:1371)
>>         at org.apache.cassandra.thrift.Cassandra$Client.describe_keyspace(Cassandra.java:1346)
>>         at me.prettyprint.cassandra.service.AbstractCluster$4.execute(AbstractCluster.java:192)
>>         at me.prettyprint.cassandra.service.AbstractCluster$4.execute(AbstractCluster.java:187)
>>         at me.prettyprint.cassandra.service.Operation.executeAndSetResult(Operation.java:101)
>>         at me.prettyprint.cassandra.connection.HConnectionManager.operateWithFailover(HConnectionManager.java:232)
>>         at me.prettyprint.cassandra.service.AbstractCluster.describeKeyspace(AbstractCluster.java:201)
>>         at org.apache.gora.cassandra.store.CassandraClient.checkKeyspace(CassandraClient.java:82)
>>         at org.apache.gora.cassandra.store.CassandraClient.init(CassandraClient.java:69)
>>         at org.apache.gora.cassandra.store.CassandraStore.<init>(CassandraStore.java:68)
>>         ... 18 more
>
