Ok this version of hector was properly resolved. Thanks! These are the logs:

~/java/workspace/Nutch/trunk/runtime/deploy$ bin/nutch inject ~/java/workspace/Nutch/seeds
11/08/01 15:17:45 INFO crawl.InjectorJob: InjectorJob: starting
11/08/01 15:17:45 INFO crawl.InjectorJob: InjectorJob: urlDir: /home/alex/java/workspace/Nutch/seeds
11/08/01 15:17:45 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
11/08/01 15:17:46 INFO connection.CassandraHostRetryService: Downed Host Retry service started with queue size -1 and retry delay 10s
11/08/01 15:17:46 INFO service.JmxMonitor: Registering JMX me.prettyprint.cassandra.service_Test Cluster:ServiceType=hector,MonitorType=hector
11/08/01 15:17:47 INFO store.CassandraClient: Keyspace 'webpage' in cluster 'Test Cluster' was created on host 'localhost'
11/08/01 15:17:48 INFO input.FileInputFormat: Total input paths to process : 1
11/08/01 15:17:49 INFO mapred.JobClient: Running job: job_local_0001
11/08/01 15:17:49 INFO input.FileInputFormat: Total input paths to process : 1
11/08/01 15:17:49 INFO mapreduce.GoraRecordWriter: gora.buffer.write.limit = 10000
11/08/01 15:17:49 INFO plugin.PluginRepository: Plugins: looking in: /tmp/hadoop-alex/hadoop-unjar8045717865743865180/plugins
11/08/01 15:17:49 INFO plugin.PluginRepository: Plugin Auto-activation mode: [true]
11/08/01 15:17:49 INFO plugin.PluginRepository: Registered Plugins:
11/08/01 15:17:49 INFO plugin.PluginRepository: the nutch core extension points (nutch-extensionpoints)
11/08/01 15:17:49 INFO plugin.PluginRepository: Basic URL Normalizer (urlnormalizer-basic)
11/08/01 15:17:49 INFO plugin.PluginRepository: Basic Indexing Filter (index-basic)
11/08/01 15:17:49 INFO plugin.PluginRepository: Html Parse Plug-in (parse-html)
11/08/01 15:17:49 INFO plugin.PluginRepository: HTTP Framework (lib-http)
11/08/01 15:17:49 INFO plugin.PluginRepository: Pass-through URL Normalizer (urlnormalizer-pass)
11/08/01 15:17:49 INFO plugin.PluginRepository: Regex URL Filter (urlfilter-regex)
11/08/01 15:17:49 INFO plugin.PluginRepository: Http Protocol Plug-in (protocol-http)
11/08/01 15:17:49 INFO plugin.PluginRepository: Regex URL Normalizer (urlnormalizer-regex)
11/08/01 15:17:49 INFO plugin.PluginRepository: Tika Parser Plug-in (parse-tika)
11/08/01 15:17:49 INFO plugin.PluginRepository: OPIC Scoring Plug-in (scoring-opic)
11/08/01 15:17:49 INFO plugin.PluginRepository: CyberNeko HTML Parser (lib-nekohtml)
11/08/01 15:17:49 INFO plugin.PluginRepository: Anchor Indexing Filter (index-anchor)
11/08/01 15:17:49 INFO plugin.PluginRepository: Regex URL Filter Framework (lib-regex-filter)
11/08/01 15:17:49 INFO plugin.PluginRepository: Registered Extension-Points:
11/08/01 15:17:49 INFO plugin.PluginRepository: Nutch URL Normalizer (org.apache.nutch.net.URLNormalizer)
11/08/01 15:17:49 INFO plugin.PluginRepository: Nutch Protocol (org.apache.nutch.protocol.Protocol)
11/08/01 15:17:49 INFO plugin.PluginRepository: Parse Filter (org.apache.nutch.parse.ParseFilter)
11/08/01 15:17:49 INFO plugin.PluginRepository: Nutch URL Filter (org.apache.nutch.net.URLFilter)
11/08/01 15:17:49 INFO plugin.PluginRepository: Nutch Indexing Filter (org.apache.nutch.indexer.IndexingFilter)
11/08/01 15:17:49 INFO plugin.PluginRepository: Nutch Content Parser (org.apache.nutch.parse.Parser)
11/08/01 15:17:49 INFO plugin.PluginRepository: Nutch Scoring (org.apache.nutch.scoring.ScoringFilter)
11/08/01 15:17:50 INFO conf.Configuration: found resource regex-normalize.xml at file:/tmp/hadoop-alex/hadoop-unjar8045717865743865180/regex-normalize.xml
11/08/01 15:17:50 INFO conf.Configuration: found resource regex-urlfilter.txt at file:/tmp/hadoop-alex/hadoop-unjar8045717865743865180/regex-urlfilter.txt
11/08/01 15:17:50 INFO regex.RegexURLNormalizer: can't find rules for scope 'inject', using default
11/08/01 15:17:50 INFO mapred.JobClient: map 0% reduce 0%
11/08/01 15:17:51 INFO mapred.TaskRunner: Task:attempt_local_0001_m_000000_0 is done. And is in the process of commiting
11/08/01 15:17:51 INFO mapred.LocalJobRunner:
11/08/01 15:17:51 INFO mapred.TaskRunner: Task 'attempt_local_0001_m_000000_0' done.
11/08/01 15:17:52 INFO mapred.JobClient: map 100% reduce 0%
11/08/01 15:17:52 INFO mapred.JobClient: Job complete: job_local_0001
11/08/01 15:17:52 INFO mapred.JobClient: Counters: 5
11/08/01 15:17:52 INFO mapred.JobClient: FileSystemCounters
11/08/01 15:17:52 INFO mapred.JobClient: FILE_BYTES_READ=44872735
11/08/01 15:17:52 INFO mapred.JobClient: FILE_BYTES_WRITTEN=45245279
11/08/01 15:17:52 INFO mapred.JobClient: Map-Reduce Framework
11/08/01 15:17:52 INFO mapred.JobClient: Map input records=3
11/08/01 15:17:52 INFO mapred.JobClient: Spilled Records=0
11/08/01 15:17:52 INFO mapred.JobClient: Map output records=3
11/08/01 15:17:52 INFO jvm.JvmMetrics: Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
11/08/01 15:17:52 INFO crawl.InjectorJob: InjectorJob: finished
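
If it helps anyone else, here is a rough Python sketch for confirming that the hector classes really ended up inside the job file. It assumes the .job file is an ordinary zip archive with the dependency jars stored inside it (for example under lib/); the function name is made up for illustration and is not part of Nutch or Gora.

```python
import io
import zipfile


def nested_jars_with_class(job_path, class_name):
    """Return the names of jars stored inside job_path that contain class_name.

    Assumption: job_path is a zip-format archive (such as a Hadoop job file)
    and its dependency jars are stored as regular entries ending in ".jar".
    """
    # A class like a.b.C is stored in a jar as the entry a/b/C.class
    entry = class_name.replace(".", "/") + ".class"
    hits = []
    with zipfile.ZipFile(job_path) as job:
        for name in job.namelist():
            if not name.endswith(".jar"):
                continue
            # Open the nested jar from memory without extracting it to disk
            with zipfile.ZipFile(io.BytesIO(job.read(name))) as inner:
                if entry in inner.namelist():
                    hits.append(name)
    return hits
```

Calling nested_jars_with_class on the built job file (nutch-2.0-dev.job in my case, adjust the path to your build) with "me.prettyprint.hector.api.Serializer" should list the hector jar if Ivy bundled it, and an empty list otherwise.
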
This is what was added to ivy/ivy.xml:

+ <dependency org="org.apache.gora" name="gora-cassandra" rev="0.2-incubating" conf="*->compile"/>
+ <dependency org="org.apache.cassandra" name="cassandra-thrift" rev="0.8.1"/>
+ <dependency org="com.ecyrd.speed4j" name="speed4j" rev="0.9" conf="*->*,!javadoc,!sources"/>
+ <dependency org="com.github.stephenc.high-scale-lib" name="high-scale-lib" rev="1.1.2" conf="*->*,!javadoc,!sources"/>
+ <dependency org="com.google.collections" name="google-collections" rev="1.0" conf="*->*,!javadoc,!sources"/>
+ <dependency org="com.google.guava" name="guava" rev="r09" conf="*->*,!javadoc,!sources"/>
+ <dependency org="org.apache.cassandra" name="apache-cassandra" rev="0.8.1"/>
+ <dependency org="me.prettyprint" name="hector-core" rev="0.8.0-2"/>

On Mon, Aug 1, 2011 at 2:55 PM, Tom Davidson <tdavid...@covario.com> wrote:
> I did something similar to below to add the Cassandra dependencies. Note that
> I am getting NoSuchMethodErrors not ClassNotFoundExceptions. Can you add the
> hector jars to your nutch job jar and see what you get? I think I am one step
> ahead of you. BTW, I just added this line to get the hector dependency:
>
> <dependency org="me.prettyprint" name="hector-core" rev="0.8.0-2" conf="*->default"/>
>
> -----Original Message-----
> From: Alexis [mailto:alexis.detregl...@gmail.com]
> Sent: Monday, August 01, 2011 2:28 PM
> To: dev@nutch.apache.org
> Subject: Re: Nutch 2 and Cassandra
>
> Hi, libthrift is a dependency of cassandra-thrift, as listed here:
> http://mvnrepository.com/artifact/org.apache.cassandra/cassandra-thrift/0.8.1
>
> During Nutch build, you have to manually tweak the Ivy configuration
> depending on your choice of the Gora store, in this case Cassandra.
> Basically you need to add all the dependencies listed there:
> http://svn.apache.org/viewvc/incubator/gora/trunk/gora-cassandra/ivy/ivy.xml?view=markup
>
> Let's try to add to $NUTCH_HOME/ivy/ivy.xml the following dependencies and
> then let's rebuild Nutch (see attached patch):
> <dependency org="org.apache.gora" name="gora-cassandra" rev="0.2-incubating" conf="*->compile"/>
> <dependency org="org.apache.cassandra" name="cassandra-thrift" rev="0.8.1"/>
> <dependency org="com.ecyrd.speed4j" name="speed4j" rev="0.9" conf="*->*,!javadoc,!sources"/>
> <dependency org="com.github.stephenc.high-scale-lib" name="high-scale-lib" rev="1.1.2" conf="*->*,!javadoc,!sources"/>
> <dependency org="com.google.collections" name="google-collections" rev="1.0" conf="*->*,!javadoc,!sources"/>
> <dependency org="com.google.guava" name="guava" rev="r09" conf="*->*,!javadoc,!sources"/>
>
> $ ant clean
> $ ant
>
> In your case libthrift should now be downloaded by Ivy and then bundled into
> the nutch-2.0-dev.job file. I'm not sure how apache-cassandra and hector got
> included in your classpath...
>
> Somehow we need to resolve as well:
> <dependency org="org.apache.cassandra" name="apache-cassandra" rev="0.8.1"/>
> <dependency org="me.prettyprint" name="hector" rev="0.8.0-1"/>
>
> I don't think these 2 jars are in the default maven repository so
> they won't be downloaded, that's why they were commented out in the Gora
> Cassandra Ivy config (gora/trunk/gora-cassandra/ivy/ivy.xml)
>
> Since hector jar is not found in my case I get:
> ~/java/workspace/Nutch/trunk/runtime/deploy$ bin/nutch inject ~/java/workspace/Nutch/seeds
> 11/08/01 14:18:42 INFO crawl.InjectorJob: InjectorJob: starting
> 11/08/01 14:18:42 INFO crawl.InjectorJob: InjectorJob: urlDir: /home/alex/java/workspace/Nutch/seeds
> 11/08/01 14:18:42 INFO security.Groups: Group mapping impl=org.apache.hadoop.security.ShellBasedUnixGroupsMapping; cacheTimeout=300000
> 11/08/01 14:18:42 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
> 11/08/01 14:18:42 ERROR crawl.InjectorJob: InjectorJob: org.apache.gora.util.GoraException: java.lang.reflect.InvocationTargetException
>     at org.apache.gora.store.DataStoreFactory.createDataStore(DataStoreFactory.java:110)
>     at org.apache.gora.store.DataStoreFactory.createDataStore(DataStoreFactory.java:93)
>     at org.apache.nutch.storage.StorageUtils.createWebStore(StorageUtils.java:59)
>     at org.apache.nutch.crawl.InjectorJob.run(InjectorJob.java:243)
>     at org.apache.nutch.crawl.InjectorJob.inject(InjectorJob.java:268)
>     at org.apache.nutch.crawl.InjectorJob.run(InjectorJob.java:282)
>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:69)
>     at org.apache.nutch.crawl.InjectorJob.main(InjectorJob.java:292)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>     at java.lang.reflect.Method.invoke(Method.java:597)
>     at org.apache.hadoop.util.RunJar.main(RunJar.java:192)
> Caused by: java.lang.reflect.InvocationTargetException
>     at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>     at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
>     at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
>     at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
>     at org.apache.gora.util.ReflectionUtils.newInstance(ReflectionUtils.java:76)
>     at org.apache.gora.store.DataStoreFactory.createDataStore(DataStoreFactory.java:102)
>     ... 12 more
> Caused by: java.lang.NoClassDefFoundError: me/prettyprint/hector/api/Serializer
>     at org.apache.gora.cassandra.store.CassandraStore.<init>(CassandraStore.java:60)
>     ... 18 more
> Caused by: java.lang.ClassNotFoundException: me.prettyprint.hector.api.Serializer
>     at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
>     at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
>     at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
>     ... 19 more
>
> On Mon, Aug 1, 2011 at 11:59 AM, Tom Davidson <tdavid...@covario.com> wrote:
>> Hi All,
>>
>> I am kind of at my wit's end here, so I am hoping someone here can
>> help. I am trying to use Nutch2 and Cassandra and I have been
>> successful using the runtime/local build. I am using the Cloudera CDH3
>> on CentOS 5 and I do not want to contaminate my hadoop install by
>> dropping in a bunch of Nutch jars, etc. So I am trying to use the
>> nutch-2-dev.job jar. When I try to use the nutch2-dev.job jar, I get
>> the error below. I have double and triple checked the classpath and
>> the included jars and the only jar that contains FieldValueMetaData is
>> the libthrift-0.6.1.jar which has the method that is claimed to be missing.
>> Any ideas?
>>
>> Thanks,
>>
>> Tom
>>
>> [tdavidson@nadevsan06 ~]$ bin/nutch inject urls
>>
>> /opt/jdk1.6.0_21/bin/java -Dproc_jar -Xmx1000m -Dhadoop.log.dir=/usr/lib/hadoop-0.20/logs -Dhadoop.log.file=hadoop.log -Dhadoop.home.dir=/usr/lib/hadoop-0.20 -Dhadoop.id.str=tdavidson -Dhadoop.root.logger=INFO,console -Djava.library.path=/usr/lib/hadoop-0.20/lib/native/Linux-amd64-64 -Dhadoop.policy.file=hadoop-policy.xml -classpath /usr/lib/hadoop-0.20/conf:/opt/jdk1.6.0_21/lib/tools.jar:/usr/lib/hadoop-0.20:/usr/lib/hadoop-0.20/hadoop-core-0.20.2-cdh3u1.jar:/usr/lib/hadoop-0.20/lib/ant-contrib-1.0b3.jar:/usr/lib/hadoop-0.20/lib/aspectjrt-1.6.5.jar:/usr/lib/hadoop-0.20/lib/aspectjtools-1.6.5.jar:/usr/lib/hadoop-0.20/lib/commons-cli-1.2.jar:/usr/lib/hadoop-0.20/lib/commons-codec-1.4.jar:/usr/lib/hadoop-0.20/lib/commons-daemon-1.0.1.jar:/usr/lib/hadoop-0.20/lib/commons-el-1.0.jar:/usr/lib/hadoop-0.20/lib/commons-httpclient-3.0.1.jar:/usr/lib/hadoop-0.20/lib/commons-logging-1.0.4.jar:/usr/lib/hadoop-0.20/lib/commons-logging-api-1.0.4.jar:/usr/lib/hadoop-0.20/lib/commons-net-1.4.1.jar:/usr/lib/hadoop-0.20/lib/core-3.1.1.jar:/usr/lib/hadoop-0.20/lib/hadoop-fairscheduler-0.20.2-cdh3u1.jar:/usr/lib/hadoop-0.20/lib/hsqldb-1.8.0.10.jar:/usr/lib/hadoop-0.20/lib/hue-plugins-1.2.0-cdh3u1.jar:/usr/lib/hadoop-0.20/lib/jackson-core-asl-1.5.2.jar:/usr/lib/hadoop-0.20/lib/jackson-mapper-asl-1.5.2.jar:/usr/lib/hadoop-0.20/lib/jasper-compiler-5.5.12.jar:/usr/lib/hadoop-0.20/lib/jasper-runtime-5.5.12.jar:/usr/lib/hadoop-0.20/lib/jets3t-0.6.1.jar:/usr/lib/hadoop-0.20/lib/jetty-6.1.26.jar:/usr/lib/hadoop-0.20/lib/jetty-servlet-tester-6.1.26.jar:/usr/lib/hadoop-0.20/lib/jetty-util-6.1.26.jar:/usr/lib/hadoop-0.20/lib/jsch-0.1.42.jar:/usr/lib/hadoop-0.20/lib/junit-4.5.jar:/usr/lib/hadoop-0.20/lib/kfs-0.2.2.jar:/usr/lib/hadoop-0.20/lib/log4j-1.2.15.jar:/usr/lib/hadoop-0.20/lib/mockito-all-1.8.2.jar:/usr/lib/hadoop-0.20/lib/oro-2.0.8.jar:/usr/lib/hadoop-0.20/lib/servlet-api-2.5-20081211.jar:/usr/lib/hadoop-0.20/lib/servlet-api-2.5-6.1.14.jar:/usr/lib/hadoop-0.20/lib/slf4j-api-1.4.3.jar:/usr/lib/hadoop-0.20/lib/slf4j-log4j12-1.4.3.jar:/usr/lib/hadoop-0.20/lib/xmlenc-0.52.jar:/usr/lib/hadoop-0.20/lib/jsp-2.1/jsp-2.1.jar:/usr/lib/hadoop-0.20/lib/jsp-2.1/jsp-api-2.1.jar org.apache.hadoop.util.RunJar /home/SEMDIRECTOR/tdavidson/nutch-2.job org.apache.nutch.crawl.InjectorJob urls
>>
>> 11/08/01 11:51:54 INFO crawl.InjectorJob: InjectorJob: starting
>> 11/08/01 11:51:54 INFO crawl.InjectorJob: InjectorJob: urlDir: urls
>> 11/08/01 11:51:55 INFO connection.CassandraHostRetryService: Downed Host Retry service started with queue size -1 and retry delay 10s
>> 11/08/01 11:51:55 INFO service.JmxMonitor: Registering JMX me.prettyprint.cassandra.service_Test Cluster:ServiceType=hector,MonitorType=hector
>> 11/08/01 11:51:55 ERROR crawl.InjectorJob: InjectorJob: org.apache.gora.util.GoraException: java.lang.reflect.InvocationTargetException
>>     at org.apache.gora.store.DataStoreFactory.createDataStore(DataStoreFactory.java:110)
>>     at org.apache.gora.store.DataStoreFactory.createDataStore(DataStoreFactory.java:93)
>>     at org.apache.nutch.storage.StorageUtils.createWebStore(StorageUtils.java:59)
>>     at org.apache.nutch.crawl.InjectorJob.run(InjectorJob.java:243)
>>     at org.apache.nutch.crawl.InjectorJob.inject(InjectorJob.java:268)
>>     at org.apache.nutch.crawl.InjectorJob.run(InjectorJob.java:282)
>>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>>     at org.apache.nutch.crawl.InjectorJob.main(InjectorJob.java:292)
>>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>     at java.lang.reflect.Method.invoke(Method.java:597)
>>     at org.apache.hadoop.util.RunJar.main(RunJar.java:186)
>> Caused by: java.lang.reflect.InvocationTargetException
>>     at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>>     at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
>>     at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
>>     at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
>>     at org.apache.gora.util.ReflectionUtils.newInstance(ReflectionUtils.java:76)
>>     at org.apache.gora.store.DataStoreFactory.createDataStore(DataStoreFactory.java:102)
>>     ... 12 more
>> Caused by: java.lang.NoSuchMethodError: org.apache.thrift.meta_data.FieldValueMetaData.<init>(BZ)V
>>     at org.apache.cassandra.thrift.CfDef.<clinit>(CfDef.java:299)
>>     at org.apache.cassandra.thrift.KsDef.read(KsDef.java:753)
>>     at org.apache.cassandra.thrift.Cassandra$describe_keyspace_result.read(Cassandra.java:24338)
>>     at org.apache.cassandra.thrift.Cassandra$Client.recv_describe_keyspace(Cassandra.java:1371)
>>     at org.apache.cassandra.thrift.Cassandra$Client.describe_keyspace(Cassandra.java:1346)
>>     at me.prettyprint.cassandra.service.AbstractCluster$4.execute(AbstractCluster.java:192)
>>     at me.prettyprint.cassandra.service.AbstractCluster$4.execute(AbstractCluster.java:187)
>>     at me.prettyprint.cassandra.service.Operation.executeAndSetResult(Operation.java:101)
>>     at me.prettyprint.cassandra.connection.HConnectionManager.operateWithFailover(HConnectionManager.java:232)
>>     at me.prettyprint.cassandra.service.AbstractCluster.describeKeyspace(AbstractCluster.java:201)
>>     at org.apache.gora.cassandra.store.CassandraClient.checkKeyspace(CassandraClient.java:82)
>>     at org.apache.gora.cassandra.store.CassandraClient.init(CassandraClient.java:69)
>>     at org.apache.gora.cassandra.store.CassandraStore.<init>(CassandraStore.java:68)
>>     ... 18 more
>
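
Regarding the NoSuchMethodError on FieldValueMetaData quoted above: a NoSuchMethodError often means a different version of the class was loaded than the one the caller was compiled against, so it may be worth ruling out a second copy of the class somewhere on the Hadoop classpath. Below is a rough Python sketch for that check. It is only an illustration: the function name is made up, it looks at plain .jar entries only (directories and missing paths are skipped), and I have not confirmed that this is the cause here.

```python
import os
import zipfile


def classpath_jars_with_class(classpath, class_name, sep=os.pathsep):
    """List the jars on a classpath string that contain class_name.

    Results keep classpath order, so the first hit is the copy a simple
    classloader would normally pick up first.
    """
    # A class like a.b.C is stored in a jar as the entry a/b/C.class
    entry = class_name.replace(".", "/") + ".class"
    hits = []
    for path in classpath.split(sep):
        # Only existing jar files are inspected; everything else is skipped
        if not path.endswith(".jar") or not os.path.isfile(path):
            continue
        with zipfile.ZipFile(path) as jar:
            if entry in jar.namelist():
                hits.append(path)
    return hits
```

Passing the -classpath value from the java command line in the quoted output together with "org.apache.thrift.meta_data.FieldValueMetaData" would show whether any jar outside the job file also provides that class. More than one hit would point to a version conflict.
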