Hi Sebastian,

Here is the complete log trace from the hadoop.log file:

2016-09-25 19:14:08,455 INFO fetcher.FetchItemQueues - Using queue mode : byHost
2016-09-25 19:14:08,455 INFO fetcher.Fetcher - Fetcher: threads: 50
2016-09-25 19:14:08,455 INFO fetcher.Fetcher - Fetcher: time-out divisor: 2
2016-09-25 19:14:08,459 INFO fetcher.QueueFeeder - QueueFeeder finished: total 3 records + hit by time limit :0
2016-09-25 19:14:08,559 INFO net.URLExemptionFilters - Found 0 extensions at point:'org.apache.nutch.net.URLExemptionFilter'
2016-09-25 19:14:08,570 INFO fetcher.FetcherThreadPublisher - Setting up publishers
2016-09-25 19:14:08,587 WARN mapred.LocalJobRunner - job_local1447446310_0001
java.lang.Exception: java.lang.ExceptionInInitializerError
    at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:522)
Caused by: java.lang.ExceptionInInitializerError
    at org.apache.kafka.clients.producer.KafkaProducer.<init>(KafkaProducer.java:188)
    at org.apache.nutch.publisher.kafka.KafkaPublisherImpl.setConfig(KafkaPublisherImpl.java:70)
    at org.apache.nutch.publisher.NutchPublishers.setConfig(NutchPublishers.java:44)
    at org.apache.nutch.fetcher.FetcherThreadPublisher.<init>(FetcherThreadPublisher.java:40)
    at org.apache.nutch.fetcher.FetcherThread.<init>(FetcherThread.java:174)
    at org.apache.nutch.fetcher.Fetcher.run(Fetcher.java:213)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
    at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.kafka.common.config.ConfigException: Invalid value org.apache.kafka.clients.producer.internals.DefaultPartitioner for configuration partitioner.class: Class org.apache.kafka.clients.producer.internals.DefaultPartitioner could not be found.
    at org.apache.kafka.common.config.ConfigDef.parseType(ConfigDef.java:672)
    at org.apache.kafka.common.config.ConfigDef.define(ConfigDef.java:110)
    at org.apache.kafka.common.config.ConfigDef.define(ConfigDef.java:132)
    at org.apache.kafka.common.config.ConfigDef.define(ConfigDef.java:171)
    at org.apache.kafka.common.config.ConfigDef.define(ConfigDef.java:333)
    at org.apache.kafka.common.config.ConfigDef.define(ConfigDef.java:346)
    at org.apache.kafka.clients.producer.ProducerConfig.<clinit>(ProducerConfig.java:222)
    ... 14 more
2016-09-25 19:14:09,346 ERROR fetcher.Fetcher - Fetcher: java.io.IOException: Job failed!
    at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:865)
    at org.apache.nutch.fetcher.Fetcher.fetch(Fetcher.java:484)
    at org.apache.nutch.fetcher.Fetcher.run(Fetcher.java:519)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
    at org.apache.nutch.fetcher.Fetcher.main(Fetcher.java:493)

Thanks for your help and comments on https://github.com/apache/nutch/pull/152.

On Sun, Sep 25, 2016 at 2:54 AM, Sebastian Nagel <wastl.na...@googlemail.com> wrote:

> Hi Sujen,
>
> could you send the complete stack trace? Just to be sure from where the
> error stems.
>
> > I looked at the code here
> > https://github.com/apache/nutch/blob/master/src/bin/nutch#L155-L164
> > and cannot understand the use of lines 161-163, if the plugins folder is
> > found add the home directory to the classpath ?
>
> In a local installation $NUTCH_HOME ("runtime/local") is added to the
> classpath because the folder "plugins" defined in the property
> "plugin.folders" is located here ("runtime/local/plugins"), see:
>
> <property>
>   <name>plugin.folders</name>
>   <value>plugins</value>
>   <description>Directories where nutch plugins are located. Each
>   element may be a relative or absolute path. If absolute, it is used
>   as is. If relative, it is searched for on the classpath.</description>
> </property>
>
> See also my comments on https://github.com/apache/nutch/pull/152
>
> Sebastian
>
>
> On 09/23/2016 12:06 AM, Sujen Shah wrote:
> > Thank you Sebastian for your response.
> >
> > I followed the steps as per your suggestion and added the required jars
> > under runtime in plugin.xml.
> > My code is at -
> > https://github.com/sujen1412/nutch/blob/kafka/src/plugin/publish-kafka/plugin.xml
> >
> > Now after compiling and running ./bin/crawl in local mode, the fetch job
> > fails due to
> >
> > Caused by: org.apache.kafka.common.config.ConfigException: Invalid value
> > org.apache.kafka.clients.producer.internals.DefaultPartitioner for
> > configuration partitioner.class:
> > Class org.apache.kafka.clients.producer.internals.DefaultPartitioner
> > could not be found.
> >
> > Am I missing something ?
> >
> > To find out the cause for this, I copied the jars from the
> > runtime/local/plugin/<some-plugin>/*.jar
> > to the runtime/local/lib directory, the code seems to work perfectly
> > fine, which may imply that the jars listed under the runtime tag in
> > plugin.xml are not getting added to classpath during runtime.
> >
> > I looked at the code here
> > https://github.com/apache/nutch/blob/master/src/bin/nutch#L155-L164
> > and cannot understand the use of lines 161-163, if the plugins folder is
> > found add the home directory to the classpath ?
> > Looking into various ways to set a classpath
> > (https://docs.oracle.com/javase/8/docs/technotes/tools/windows/classpath.html#A1100762),
> > it says that subdirectories are not searched recursively.
> >
> > Thanks once again for your help.
> >
> >
> > On Wed, Sep 14, 2016 at 12:10 AM, Sebastian Nagel <wastl.na...@googlemail.com> wrote:
> >
> > Hi Sujen,
> >
> > are the jars also listed in the plugin.xml?
> >
> > That's required. The plugin-specific ivy.xml is only used at compile time
> > to fetch the library and its dependencies and get the plugin compiled.
> >
> > At runtime all required libs have to be listed in the plugin.xml, e.g.,
> > https://github.com/apache/nutch/blob/master/src/plugin/parse-tika/plugin.xml
> >
> > This double work is not ideal and a frequent cause for errors but that's
> > how it works right now.
> >
> > Cheers,
> > Sebastian
> >
> >
> > On 09/12/2016 11:56 PM, Sujen Shah wrote:
> > > Hi Devs,
> > >
> > > I am facing issues in loading jars required for plugins while running
> > > Nutch in local mode.
> > >
> > > I am doing the following :
> > > 1. add a dependency in <some-plugin>/ivy.xml
> > > 2. ant clean runtime
> > >
> > > Now, when I print the classpath before running, the /bin/nutch script
> > > does not seem to be adding those jars on to the classpath and throws
> > > runtime exceptions. To mitigate this I added the dependency in the root
> > > ivy.xml.
> > >
> > > I don't know if I am missing something here or anyone else has faced the
> > > same issue and found a solution.
> > > For example -
> > > https://github.com/apache/nutch/tree/master/src/plugin/publish-rabbitmq,
> > > the dependency for amqp-client had to be added in the root ivy.xml as
> > > well for it to not throw runtime exceptions (ex - ClassNotFound)
> > >
> > > I have created a patch which modifies the ./bin/nutch script to load the
> > > plugin jars onto the classpath which is attached below. This patch
> > > eliminates the need to modify the root ivy.xml for plugin specific
> > > dependencies.
> > >
> > > I wanted to ask the devs first if there was already a solution before
> > > filing a JIRA issue. If not, I'll submit it through JIRA.
> > >
> > > Thank you for your help.
> > >
> > >
> > > Regards,
> > > Sujen Shah
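
For reference, the requirement discussed in this thread (every library a
plugin needs at runtime must be listed in its plugin.xml) can be sketched
as below. The layout follows the parse-tika plugin.xml linked in the
thread. The jar names and the version number are assumptions for
illustration only, not taken from the actual publish-kafka plugin; they
would have to match the jar files present in the plugin's own directory.

```xml
<!-- Sketch of the runtime section of a plugin.xml.
     Jar names and versions are hypothetical; use the file names
     actually present in the plugin's directory. -->
<runtime>
   <!-- The plugin's own jar, exporting its classes -->
   <library name="publish-kafka.jar">
      <export name="*"/>
   </library>
   <!-- Each third-party dependency gets its own library entry -->
   <library name="kafka-clients-0.10.0.0.jar"/>
</runtime>
```

Note that, as the log trace at the top of this message shows, listing the
jars was not sufficient in this case, so the fragment only illustrates the
plugin.xml requirement and is not a fix for the ConfigException.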