Hi Sujen,

Could you send the complete stack trace? Just to be sure where the error
stems from.

> I looked at the code here
> https://github.com/apache/nutch/blob/master/src/bin/nutch#L155-L164 and
> cannot understand the use of lines 161-163: if the plugins folder is found,
> add the home directory to the classpath?

In a local installation, $NUTCH_HOME ("runtime/local") is added to the classpath
because the folder "plugins" defined in the property "plugin.folders" is a
relative path and is located there ("runtime/local/plugins"). A relative path is
searched for on the classpath, see:

<property>
  <name>plugin.folders</name>
  <value>plugins</value>
  <description>Directories where nutch plugins are located.  Each
  element may be a relative or absolute path.  If absolute, it is used
  as is.  If relative, it is searched for on the classpath.</description>
</property>
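
To illustrate the point, here is a rough shell sketch of what this part of
bin/nutch does, as described above. It is not a copy of the script: the
temporary directory layout and the jar names (core.jar, dep.jar, some-plugin)
are made up for the demonstration, and the handling of lib/*.jar is my
assumption about how the rest of the classpath is built.

```shell
#!/bin/sh
# Sketch only, NOT the real bin/nutch: it mimics the local-mode logic
# described above in a throwaway directory. Layout and jar names are made up.
NUTCH_HOME="$(mktemp -d)"
mkdir -p "$NUTCH_HOME/lib" "$NUTCH_HOME/plugins/some-plugin"
touch "$NUTCH_HOME/lib/core.jar" "$NUTCH_HOME/plugins/some-plugin/dep.jar"

# Assumption: jars directly under lib/ are put on the classpath one by one.
CLASSPATH=""
for f in "$NUTCH_HOME"/lib/*.jar; do
  CLASSPATH="${CLASSPATH:+$CLASSPATH:}$f"
done

# If a plugins folder exists, add the home directory itself (not the jars),
# so that the relative name "plugins" can be found on the classpath.
if [ -d "$NUTCH_HOME/plugins" ]; then
  CLASSPATH="$NUTCH_HOME:$CLASSPATH"
fi

echo "$CLASSPATH"
```

The printed classpath starts with the home directory and contains the jar from
lib/, but not the jar inside plugins/. That is intended: the home directory is
on the classpath only so that the relative folder name "plugins" can be
resolved. The plugin jars themselves are the ones listed in the plugin's
plugin.xml.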

See also my comments on https://github.com/apache/nutch/pull/152

Sebastian


On 09/23/2016 12:06 AM, Sujen Shah wrote:
> Thank you Sebastian for your response. 
> 
> I followed the steps as per your suggestion and added the required jars under 
> runtime in plugin.xml.
> My code is at:
> https://github.com/sujen1412/nutch/blob/kafka/src/plugin/publish-kafka/plugin.xml
> 
> Now after compiling and running ./bin/crawl in local mode, the fetch job 
> fails due to 
> 
> Caused by: org.apache.kafka.common.config.ConfigException: Invalid value
> org.apache.kafka.clients.producer.internals.DefaultPartitioner for 
> configuration partitioner.class:
> Class org.apache.kafka.clients.producer.internals.DefaultPartitioner could 
> not be found.
> 
> Am I missing something?
> 
> To find the cause, I copied the jars from
> runtime/local/plugins/<some-plugin>/*.jar
> to the runtime/local/lib directory. With that, the code seems to work perfectly
> fine, which may imply that the jars listed under the runtime tag in plugin.xml
> are not being added to the classpath at runtime.
> 
> I looked at the code here
> https://github.com/apache/nutch/blob/master/src/bin/nutch#L155-L164 and
> cannot understand the use of lines 161-163: if the plugins folder is found,
> add the home directory to the classpath?
> Looking into the various ways to set a classpath
> (https://docs.oracle.com/javase/8/docs/technotes/tools/windows/classpath.html#A1100762),
> it says that subdirectories are not searched recursively.
> 
> Thanks once again for your help. 
> 
> 
> On Wed, Sep 14, 2016 at 12:10 AM, Sebastian Nagel <wastl.na...@googlemail.com> wrote:
> 
>     Hi Sujen,
> 
>     are the jars also listed in the plugin.xml?
> 
>     That's required. The plugin-specific ivy.xml is only used at compile time
>     to fetch the library and its dependencies and get the plugin compiled.
> 
>     At runtime all required libs have to be listed in the plugin.xml, e.g.,
>     https://github.com/apache/nutch/blob/master/src/plugin/parse-tika/plugin.xml
> 
>     This double work is not ideal and a frequent cause of errors, but that's
>     how it works right now.
> 
>     Cheers,
>     Sebastian
> 
> 
>     On 09/12/2016 11:56 PM, Sujen Shah wrote:
>     > Hi Devs,
>     >
>     > I am facing issues in loading jars required for plugins while running
>     > Nutch in local mode.
>     >
>     > I am doing the following :
>     > 1. add a dependency in <some-plugin>/ivy.xml
>     > 2. ant clean runtime
>     >
>     > Now, when I print the classpath before running, the /bin/nutch script
>     > does not seem to be adding those jars on to the classpath and throws
>     > runtime exceptions. To mitigate this I added the dependency in the
>     > root ivy.xml.
>     >
>     > I don't know if I am missing something here or if anyone else has faced
>     > the same issue and found a solution.
>     > For example, for
>     > https://github.com/apache/nutch/tree/master/src/plugin/publish-rabbitmq
>     > the dependency for amqp-client had to be added in the root ivy.xml as
>     > well for it to not throw runtime exceptions (e.g., ClassNotFound).
>     >
>     > I have created a patch which modifies the ./bin/nutch script to load
>     > the plugin jars onto the classpath, which is attached below. This patch
>     > eliminates the need to modify the root ivy.xml for plugin-specific
>     > dependencies.
>     >
>     > I wanted to ask the devs first if there was already a solution before
>     > filing a JIRA issue. If not, I'll submit it through JIRA.
>     >
>     > Thank you for your help.
>     >
>     >
>     > Regards,
>     > Sujen Shah
> 
> 
