Re: Plugin dependencies do not get added to classpath while running Nutch in local mode

Sujen Shah Sun, 25 Sep 2016 19:17:50 -0700

Hi Sebastian,

Here is the complete stack trace from the hadoop.log file:

2016-09-25 19:14:08,455 INFO  fetcher.FetchItemQueues - Using queue mode : byHost
2016-09-25 19:14:08,455 INFO  fetcher.Fetcher - Fetcher: threads: 50
2016-09-25 19:14:08,455 INFO  fetcher.Fetcher - Fetcher: time-out divisor: 2
2016-09-25 19:14:08,459 INFO  fetcher.QueueFeeder - QueueFeeder finished: total 3 records + hit by time limit :0
2016-09-25 19:14:08,559 INFO  net.URLExemptionFilters - Found 0 extensions at point:'org.apache.nutch.net.URLExemptionFilter'
2016-09-25 19:14:08,570 INFO  fetcher.FetcherThreadPublisher - Setting up publishers
2016-09-25 19:14:08,587 WARN  mapred.LocalJobRunner - job_local1447446310_0001
java.lang.Exception: java.lang.ExceptionInInitializerError
	at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
	at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:522)
Caused by: java.lang.ExceptionInInitializerError
	at org.apache.kafka.clients.producer.KafkaProducer.<init>(KafkaProducer.java:188)
	at org.apache.nutch.publisher.kafka.KafkaPublisherImpl.setConfig(KafkaPublisherImpl.java:70)
	at org.apache.nutch.publisher.NutchPublishers.setConfig(NutchPublishers.java:44)
	at org.apache.nutch.fetcher.FetcherThreadPublisher.<init>(FetcherThreadPublisher.java:40)
	at org.apache.nutch.fetcher.FetcherThread.<init>(FetcherThread.java:174)
	at org.apache.nutch.fetcher.Fetcher.run(Fetcher.java:213)
	at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
	at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.kafka.common.config.ConfigException: Invalid value org.apache.kafka.clients.producer.internals.DefaultPartitioner for configuration partitioner.class: Class org.apache.kafka.clients.producer.internals.DefaultPartitioner could not be found.
	at org.apache.kafka.common.config.ConfigDef.parseType(ConfigDef.java:672)
	at org.apache.kafka.common.config.ConfigDef.define(ConfigDef.java:110)
	at org.apache.kafka.common.config.ConfigDef.define(ConfigDef.java:132)
	at org.apache.kafka.common.config.ConfigDef.define(ConfigDef.java:171)
	at org.apache.kafka.common.config.ConfigDef.define(ConfigDef.java:333)
	at org.apache.kafka.common.config.ConfigDef.define(ConfigDef.java:346)
	at org.apache.kafka.clients.producer.ProducerConfig.<clinit>(ProducerConfig.java:222)
	... 14 more
2016-09-25 19:14:09,346 ERROR fetcher.Fetcher - Fetcher: java.io.IOException: Job failed!
	at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:865)
	at org.apache.nutch.fetcher.Fetcher.fetch(Fetcher.java:484)
	at org.apache.nutch.fetcher.Fetcher.run(Fetcher.java:519)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
	at org.apache.nutch.fetcher.Fetcher.main(Fetcher.java:493)

Thanks for your help and comments on
https://github.com/apache/nutch/pull/152.
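
To illustrate the classpath behaviour discussed in the quoted messages below, here is a minimal standalone sketch using only the JDK. It is not Nutch code and does not model how Nutch loads the jars listed in plugin.xml. The class name ClasspathSketch, the temporary directory layout, and the file names demo-plugin, demo.jar and marker.txt are all made up for the demonstration. It shows two things: a relative folder name such as "plugins" can be resolved through a class loader when its parent directory is a classpath entry, and a jar sitting in a subdirectory of a classpath entry is not opened, so its contents stay invisible unless the jar itself is listed.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.jar.JarEntry;
import java.util.jar.JarOutputStream;

public class ClasspathSketch {

    /** Creates <tmp>/plugins/demo-plugin/demo.jar holding one entry, marker.txt (hypothetical names). */
    public static Path createFakeHome() {
        try {
            Path home = Files.createTempDirectory("fake-nutch-home");
            Path pluginDir = Files.createDirectories(home.resolve("plugins").resolve("demo-plugin"));
            try (OutputStream out = Files.newOutputStream(pluginDir.resolve("demo.jar"));
                 JarOutputStream jar = new JarOutputStream(out)) {
                jar.putNextEntry(new JarEntry("marker.txt"));
                jar.write("hello".getBytes(StandardCharsets.UTF_8));
                jar.closeEntry();
            }
            return home;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Looks up a resource through a class loader whose only entry is the given directory or jar. */
    public static URL lookup(Path classpathEntry, String resource) {
        // Parent is null so that only the given entry (plus the bootstrap loader) is consulted.
        try (URLClassLoader loader =
                 new URLClassLoader(new URL[] { classpathEntry.toUri().toURL() }, null)) {
            return loader.getResource(resource);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path home = createFakeHome();
        Path jar = home.resolve("plugins").resolve("demo-plugin").resolve("demo.jar");

        // The folder name resolves because its parent directory is the classpath entry.
        System.out.println("plugins folder found:        " + (lookup(home, "plugins") != null));
        // The jar below the directory entry is not opened, so its content is not visible.
        System.out.println("marker.txt via directory:    " + (lookup(home, "marker.txt") != null));
        // Listing the jar itself as an entry makes its content visible.
        System.out.println("marker.txt via jar entry:    " + (lookup(jar, "marker.txt") != null));
    }
}
```

This only mirrors the two statements made in the thread (the "plugin.folders" description and the note that classpath subdirectories are not searched recursively). Whether the Kafka client can see classes that live only in the plugin's own jars is a separate question that this sketch does not answer.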

On Sun, Sep 25, 2016 at 2:54 AM, Sebastian Nagel <[email protected]> wrote:

> Hi Sujen,
>
> could you send the complete stack trace? Just to be sure from where the
> error stems.
>
> > I looked at the code here
> > https://github.com/apache/nutch/blob/master/src/bin/nutch#L155-L164
> > and cannot understand the use of lines 161-163, if the plugins folder is
> > found add the home directory to the classpath?
>
> In a local installation $NUTCH_HOME ("runtime/local") is added to the
> classpath because the folder "plugins" defined in the property
> "plugin.folders" is located here ("runtime/local/plugins"), see:
>
> <property>
>   <name>plugin.folders</name>
>   <value>plugins</value>
>   <description>Directories where nutch plugins are located.  Each
>   element may be a relative or absolute path.  If absolute, it is used
>   as is.  If relative, it is searched for on the classpath.</description>
> </property>
>
> See also my comments on https://github.com/apache/nutch/pull/152
>
> Sebastian
>
>
> On 09/23/2016 12:06 AM, Sujen Shah wrote:
> > Thank you Sebastian for your response.
> >
> > I followed the steps as per your suggestion and added the required jars
> > under runtime in plugin.xml.
> > My code is at
> > https://github.com/sujen1412/nutch/blob/kafka/src/plugin/publish-kafka/plugin.xml
> >
> > Now after compiling and running ./bin/crawl in local mode, the fetch job
> > fails due to
> >
> > Caused by: org.apache.kafka.common.config.ConfigException: Invalid value
> > org.apache.kafka.clients.producer.internals.DefaultPartitioner for
> > configuration partitioner.class: Class
> > org.apache.kafka.clients.producer.internals.DefaultPartitioner could not
> > be found.
> >
> > Am I missing something ?
> >
> > To find out the cause, I copied the jars from
> > runtime/local/plugins/<some-plugin>/*.jar to the runtime/local/lib
> > directory. With that change the code works, which suggests that the jars
> > listed under the runtime tag in plugin.xml are not being added to the
> > classpath at runtime.
> >
> > I looked at the code here
> > https://github.com/apache/nutch/blob/master/src/bin/nutch#L155-L164
> > and cannot understand the use of lines 161-163, if the plugins folder is
> > found add the home directory to the classpath?
> > Looking into the various ways to set a classpath
> > (https://docs.oracle.com/javase/8/docs/technotes/tools/windows/classpath.html#A1100762),
> > the documentation says that subdirectories are not searched recursively.
> >
> > Thanks once again for your help.
> >
> >
> > On Wed, Sep 14, 2016 at 12:10 AM, Sebastian Nagel <[email protected]> wrote:
> >
> >     Hi Sujen,
> >
> >     are the jars also listed in the plugin.xml?
> >
> >     That's required. The plugin-specific ivy.xml is only used at compile
> >     time to fetch the library and its dependencies and get the plugin
> >     compiled.
> >
> >     At runtime all required libs have to be listed in the plugin.xml, e.g.,
> >     https://github.com/apache/nutch/blob/master/src/plugin/parse-tika/plugin.xml
> >
> >     This double work is not ideal and a frequent cause of errors but
> >     that's how it works right now.
> >
> >     Cheers,
> >     Sebastian
> >
> >
> >     On 09/12/2016 11:56 PM, Sujen Shah wrote:
> >     > Hi Devs,
> >     >
> >     > I am facing issues in loading jars required for plugins while
> >     > running Nutch in local mode.
> >     >
> >     > I am doing the following :
> >     > 1. add a dependency in <some-plugin>/ivy.xml
> >     > 2. ant clean runtime
> >     >
> >     > Now, when I print the classpath before running, the /bin/nutch
> >     > script does not seem to add those jars to the classpath, and the run
> >     > throws runtime exceptions. To mitigate this I added the dependency
> >     > in the root ivy.xml.
> >     >
> >     > I don't know if I am missing something here or anyone else has
> >     > faced the same issue and found a solution.
> >     > For example, for
> >     > https://github.com/apache/nutch/tree/master/src/plugin/publish-rabbitmq
> >     > the dependency for amqp-client had to be added in the root ivy.xml
> >     > as well for it to not throw runtime exceptions (e.g., ClassNotFound).
> >     >
> >     > I have created a patch, attached below, which modifies the
> >     > ./bin/nutch script to load the plugin jars onto the classpath. This
> >     > patch eliminates the need to modify the root ivy.xml for
> >     > plugin-specific dependencies.
> >     >
> >     > I wanted to ask the devs first if there was already a solution
> >     > before filing a JIRA issue. If not, I'll submit it through JIRA.
> >     >
> >     > Thank you for your help.
> >     >
> >     >
> >     > Regards,
> >     > Sujen Shah
> >
> >
>
>
