Incompatible neko and xerces versions
-------------------------------------

                 Key: NUTCH-1253
                 URL: https://issues.apache.org/jira/browse/NUTCH-1253
             Project: Nutch
          Issue Type: Bug
    Affects Versions: 1.4
         Environment: Ubuntu 10.04
            Reporter: Dennis Spathis


The Nutch 1.4 distribution includes

 - nekohtml-0.9.5.jar (under .../runtime/local/plugins/lib-nekohtml)
 - xercesImpl-2.9.1.jar (under .../runtime/local/lib)

These two JAR versions appear to be incompatible. When the HtmlParser
(configured to use neko) is invoked during a local-mode crawl, the parse fails
with an AbstractMethodError. (Note: To see the AbstractMethodError, rebuild
the HtmlParser plugin and add a catch(Throwable) clause in the getParse method
to log the stack trace.)
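
A minimal sketch of that diagnostic follows. It is not the actual HtmlParser
code: the class name ThrowableLoggingSketch, the runAndDescribe method, and the
simulated failure are all hypothetical. It only illustrates why a
catch(Throwable) clause is needed: AbstractMethodError is a java.lang.Error,
so a handler that catches only Exception never sees it.

```java
import java.io.PrintWriter;
import java.io.StringWriter;

// Hypothetical stand-in for the parse call; not Nutch code.
class ThrowableLoggingSketch {

    /** Runs the call and returns a message describing how it ended. */
    static String runAndDescribe(Runnable parseCall) {
        try {
            parseCall.run();
            return "ok";
        } catch (Exception e) {
            // Ordinary exceptions end up here.
            return "Exception: " + stackTraceOf(e);
        } catch (Throwable t) {
            // Errors such as AbstractMethodError are only caught here.
            return "Throwable: " + stackTraceOf(t);
        }
    }

    /** Renders the full stack trace as a string suitable for a log line. */
    static String stackTraceOf(Throwable t) {
        StringWriter sw = new StringWriter();
        t.printStackTrace(new PrintWriter(sw, true));
        return sw.toString();
    }

    public static void main(String[] args) {
        // Simulate the failure mode described in this report.
        String msg = runAndDescribe(() -> {
            throw new AbstractMethodError("simulated");
        });
        System.out.println(msg);
    }
}
```

In a real plugin the message would go to the plugin's logger rather than
standard output.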

I found that substituting a later, compatible version of nekohtml (1.9.11)
fixes the problem.
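
To confirm which JAR a class is actually loaded from after such a substitution,
something like the following sketch may help. The class JarOriginSketch and its
originOf method are hypothetical, and the two class names passed in main are
assumptions about what the neko and xerces JARs contain. Nutch loads plugins
through their own class loaders, so the check would need to run inside the
plugin to be meaningful.

```java
import java.net.URL;
import java.security.CodeSource;

// Hypothetical diagnostic helper; not part of Nutch.
class JarOriginSketch {

    /** Returns the location a class was loaded from, or a note if unknown. */
    static String originOf(String className) {
        try {
            // Load without initializing, using this class's own class loader.
            Class<?> c = Class.forName(
                className, false, JarOriginSketch.class.getClassLoader());
            CodeSource src = c.getProtectionDomain().getCodeSource();
            URL loc = (src == null) ? null : src.getLocation();
            return (loc == null) ? "(bootstrap or unknown)" : loc.toString();
        } catch (ClassNotFoundException | LinkageError e) {
            return "(not loadable: " + e + ")";
        }
    }

    public static void main(String[] args) {
        // Assumed class names; adjust to the classes you want to trace.
        String[] names = {
            "org.cyberneko.html.parsers.DOMFragmentParser",
            "org.apache.xerces.parsers.DOMParser"
        };
        for (String name : names) {
            System.out.println(name + " <- " + originOf(name));
        }
    }
}
```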

Curiously, and in support of the above, the nekohtml plugin.xml file in
Nutch 1.4 contains the following:

<plugin
   id="lib-nekohtml"
   name="CyberNeko HTML Parser"
   version="1.9.11"
   provider-name="org.cyberneko">

   <runtime>
       <library name="nekohtml-0.9.5.jar">
           <export name="*"/>
       </library>
   </runtime>
</plugin>

Note the conflicting version numbers (version tag is "1.9.11" but the
specified library is "nekohtml-0.9.5.jar").
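
This kind of mismatch can be checked mechanically. The sketch below is only an
illustration: the class PluginVersionCheck and its versions method are
hypothetical, and it assumes the library name ends in "-<version>.jar". It
parses a plugin descriptor like the one above and compares the version
attribute with the version embedded in the JAR name.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical consistency check; not part of Nutch.
class PluginVersionCheck {

    // Assumes a JAR name of the form "<name>-<dotted version>.jar".
    private static final Pattern JAR_VERSION =
        Pattern.compile("-(\\d+(?:\\.\\d+)*)\\.jar$");

    /** Returns { declared plugin version, version in first library name }. */
    static String[] versions(String pluginXml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(
                    pluginXml.getBytes(StandardCharsets.UTF_8)));
            String declared = doc.getDocumentElement().getAttribute("version");
            Element lib = (Element) doc.getElementsByTagName("library").item(0);
            Matcher m = JAR_VERSION.matcher(lib.getAttribute("name"));
            String bundled = m.find() ? m.group(1) : "";
            return new String[] { declared, bundled };
        } catch (Exception e) {
            throw new IllegalArgumentException("could not read descriptor", e);
        }
    }

    public static void main(String[] args) {
        String xml =
            "<plugin id=\"lib-nekohtml\" name=\"CyberNeko HTML Parser\""
          + " version=\"1.9.11\" provider-name=\"org.cyberneko\">"
          + "<runtime><library name=\"nekohtml-0.9.5.jar\">"
          + "<export name=\"*\"/></library></runtime></plugin>";
        String[] v = versions(xml);
        System.out.println("declared=" + v[0] + " bundled=" + v[1]
            + " match=" + v[0].equals(v[1]));
    }
}
```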

Was the 0.9.5 version included by mistake? Was the intention rather to
include 1.9.11?


        
