Thank you Sebastian for your response.

I followed the steps as per your suggestion and added the required jars
under runtime in plugin.xml. My code is at
https://github.com/sujen1412/nutch/blob/kafka/src/plugin/publish-kafka/plugin.xml

Now, after compiling and running ./bin/crawl in local mode, the fetch job
fails with:

Caused by: org.apache.kafka.common.config.ConfigException: Invalid value
org.apache.kafka.clients.producer.internals.DefaultPartitioner for
configuration partitioner.class: Class
org.apache.kafka.clients.producer.internals.DefaultPartitioner could not be found.

Am I missing something?

To narrow down the cause, I copied the jars from
runtime/local/plugin/<some-plugin>/*.jar into the runtime/local/lib
directory. With that change the code seems to work perfectly fine, which
may imply that the jars listed under the runtime tag in plugin.xml are not
being added to the classpath at runtime.
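
One possible reading of this observation, offered as an assumption and not
verified against the Nutch or Kafka sources: if the plugin jars are loaded by
a separate, plugin-specific class loader, then a by-name class lookup made
through a different loader would not see them, whereas jars in
runtime/local/lib sit on the main classpath. The standalone sketch below
(class and method names are illustrative and come from neither project) only
shows the general JVM behaviour that a class is not visible through a loader
that neither holds it nor delegates to a loader that does.

```java
import java.net.URL;
import java.net.URLClassLoader;

public class LoaderVisibility {

    /** Returns true if the named class can be resolved through the given loader. */
    public static boolean isVisible(String className, ClassLoader loader) {
        try {
            // initialize=false: resolve the class without running static initializers
            Class.forName(className, false, loader);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        String self = LoaderVisibility.class.getName();
        ClassLoader ownLoader = LoaderVisibility.class.getClassLoader();

        // No URLs and a null parent: this loader delegates only to the bootstrap loader,
        // so it stands in for "a loader that does not know about the plugin jars".
        try (URLClassLoader isolated = new URLClassLoader(new URL[0], null)) {
            System.out.println("via own loader:      " + isVisible(self, ownLoader));
            System.out.println("via isolated loader: " + isVisible(self, isolated));
        }
    }
}
```

Whether this is what actually happens with partitioner.class would need to be
checked in the code that resolves that setting.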

I looked at the code at
https://github.com/apache/nutch/blob/master/src/bin/nutch#L155-L164
and cannot understand the purpose of lines 161-163: if the plugins folder
is found, the home directory is added to the classpath?
Looking into the various ways to set a classpath (
https://docs.oracle.com/javase/8/docs/technotes/tools/windows/classpath.html#A1100762),
the documentation says that subdirectories are not searched recursively.
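
Because subdirectories are not searched, each jar would have to appear on the
classpath as its own entry. As a rough sketch of that idea (this is not the
patch mentioned in the quoted message, and the class and method names are
made up for illustration), the following walks a plugins directory
recursively and joins every *.jar it finds into a classpath string:

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class PluginClasspath {

    /** Returns the absolute paths of all *.jar files below root, searched recursively and sorted. */
    public static List<String> findJars(Path root) throws IOException {
        try (Stream<Path> paths = Files.walk(root)) {
            return paths
                .filter(Files::isRegularFile)
                .filter(p -> p.getFileName().toString().endsWith(".jar"))
                .map(p -> p.toAbsolutePath().toString())
                .sorted()
                .collect(Collectors.toList());
        }
    }

    /** Joins the jars below root with the platform's classpath separator. */
    public static String toClasspath(Path root) throws IOException {
        return String.join(File.pathSeparator, findJars(root));
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java PluginClasspath <plugins-directory>");
            return;
        }
        System.out.println(toClasspath(Paths.get(args[0])));
    }
}
```

The printed string could then be appended to the existing classpath by hand
to test whether the missing class is found.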

Thanks once again for your help.


On Wed, Sep 14, 2016 at 12:10 AM, Sebastian Nagel <
wastl.na...@googlemail.com> wrote:

> Hi Sujen,
>
> are the jars also listed in the plugin.xml?
>
> That's required. The plugin-specific ivy.xml is only used at compile time
> to fetch the library and its dependencies and get the plugin compiled.
>
> At runtime all required libs have to be listed in the plugin.xml, e.g.,
> https://github.com/apache/nutch/blob/master/src/plugin/parse-tika/plugin.xml
>
> This double work is not ideal and a frequent cause of errors, but that's
> how it works right now.
>
> Cheers,
> Sebastian
>
>
> On 09/12/2016 11:56 PM, Sujen Shah wrote:
> > Hi Devs,
> >
> > I am facing issues in loading jars required for plugins while running
> > Nutch in local mode.
> >
> > I am doing the following:
> > 1. add a dependency in <some-plugin>/ivy.xml
> > 2. ant clean runtime
> >
> > Now, when I print the classpath before running, the /bin/nutch script
> > does not seem to be adding those jars on to the classpath and throws
> > runtime exceptions. To mitigate this I added the dependency in the root
> > ivy.xml.
> >
> > I don't know if I am missing something here or if anyone else has faced
> > the same issue and found a solution.
> > For example, in
> > https://github.com/apache/nutch/tree/master/src/plugin/publish-rabbitmq
> > the dependency for amqp-client had to be added in the root ivy.xml as
> > well for it to not throw runtime exceptions (e.g., ClassNotFound).
> >
> > I have created a patch, attached below, which modifies the ./bin/nutch
> > script to load the plugin jars onto the classpath. This patch eliminates
> > the need to modify the root ivy.xml for plugin-specific dependencies.
> >
> > I wanted to ask the devs first if there was already a solution before
> > filing a JIRA issue. If not, I'll submit it through JIRA.
> >
> > Thank you for your help.
> >
> >
> > Regards,
> > Sujen Shah
>
>
