Incompatible neko and xerces versions
-------------------------------------
Key: NUTCH-1253
URL: https://issues.apache.org/jira/browse/NUTCH-1253
Project: Nutch
Issue Type: Bug
Affects Versions: 1.4
Environment: Ubuntu 10.04
Reporter: Dennis Spathis
The Nutch 1.4 distribution includes:
- nekohtml-0.9.5.jar (under .../runtime/local/plugins/lib-nekohtml)
- xercesImpl-2.9.1.jar (under .../runtime/local/lib)
These two JARs appear to be incompatible versions. When the HtmlParser
(configured to use neko) is invoked during a local-mode crawl, the parse fails
with an AbstractMethodError. (Note: to see the AbstractMethodError, rebuild
the HtmlParser plugin with a catch (Throwable) clause added to the getParse
method so that the stack trace is logged.)
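
As a rough illustration of the catch (Throwable) idea, the sketch below wraps an arbitrary step and captures the stack trace of anything it throws. It is not Nutch code: the class ThrowableLoggingSketch and the method runAndCapture are hypothetical names, and the AbstractMethodError is simulated. The point is that AbstractMethodError is a java.lang.Error, so a handler that only catches Exception would not see it.

```java
import java.io.PrintWriter;
import java.io.StringWriter;

// Hypothetical helper, not part of Nutch: shows why catch (Throwable) is needed
// to observe an AbstractMethodError.
public class ThrowableLoggingSketch {

    /** Runs the step; returns the stack trace text if anything is thrown, otherwise null. */
    static String runAndCapture(Runnable parseStep) {
        try {
            parseStep.run();
            return null;
        } catch (Throwable t) {
            // Throwable rather than Exception: AbstractMethodError extends Error.
            StringWriter sw = new StringWriter();
            t.printStackTrace(new PrintWriter(sw, true));
            return sw.toString();
        }
    }

    public static void main(String[] args) {
        // Simulated failure standing in for the real parse call.
        String trace = runAndCapture(() -> {
            throw new AbstractMethodError("simulated");
        });
        System.err.println(trace);
    }
}
```

In a real plugin the captured trace would go to the plugin's logger rather than System.err.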
I found that substituting a later, compatible version of nekohtml (1.9.11)
fixes the problem.
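
To double-check which JAR actually supplies a given class after swapping versions, something like the following could be used. It is only a sketch: JarLocator and locate are hypothetical names, the two default class names are assumed to be present in the nekohtml and xerces JARs, and it only sees JARs on the classpath of the JVM it runs in (Nutch plugin classloaders are not taken into account).

```java
import java.security.CodeSource;

// Hypothetical diagnostic, not part of Nutch: prints where a class was loaded from.
public class JarLocator {

    /** Returns the location (typically a JAR URL) a class is loaded from, or a note if unavailable. */
    static String locate(String className) {
        try {
            // Load without initializing the class; only its origin is of interest.
            Class<?> cls = Class.forName(className, false, JarLocator.class.getClassLoader());
            CodeSource src = cls.getProtectionDomain().getCodeSource();
            if (src == null || src.getLocation() == null) {
                return "(bootstrap or unknown location)";
            }
            return src.getLocation().toString();
        } catch (ClassNotFoundException | LinkageError e) {
            return "(not loadable: " + e + ")";
        }
    }

    public static void main(String[] args) {
        // Default class names are assumptions; pass others as arguments if needed.
        String[] names = args.length > 0 ? args : new String[] {
            "org.cyberneko.html.HTMLConfiguration",
            "org.apache.xerces.parsers.AbstractSAXParser"
        };
        for (String name : names) {
            System.out.println(name + " -> " + locate(name));
        }
    }
}
```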
Curiously, and in support of the above, the nekohtml plugin.xml file in
Nutch 1.4 contains the following:
<plugin
   id="lib-nekohtml"
   name="CyberNeko HTML Parser"
   version="1.9.11"
   provider-name="org.cyberneko">
   <runtime>
      <library name="nekohtml-0.9.5.jar">
         <export name="*"/>
      </library>
   </runtime>
</plugin>
Note the conflicting version numbers (version tag is "1.9.11" but the
specified library is "nekohtml-0.9.5.jar").
Was the 0.9.5 version included by mistake? Was the intention rather to
include 1.9.11?
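
A mismatch like this could be flagged mechanically. The sketch below parses a plugin descriptor with the JDK's built-in XML parser and reports any library whose file name does not end in "-<version>.jar" for the plugin's declared version. The class PluginVersionCheck and the method mismatchedLibraries are hypothetical, and the name-version.jar naming convention is an assumption, so treat it as a heuristic rather than a rule that every plugin follows.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical check, not part of Nutch: compares the plugin version attribute
// with the version embedded in each library file name.
public class PluginVersionCheck {

    /** Returns library names that do not end in "-<declared version>.jar". */
    static List<String> mismatchedLibraries(String pluginXml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(
                new ByteArrayInputStream(pluginXml.getBytes(StandardCharsets.UTF_8)));
        Element plugin = doc.getDocumentElement();
        String declared = plugin.getAttribute("version");

        List<String> mismatches = new ArrayList<>();
        NodeList libraries = plugin.getElementsByTagName("library");
        for (int i = 0; i < libraries.getLength(); i++) {
            String name = ((Element) libraries.item(i)).getAttribute("name");
            // Assumed convention: <artifact>-<version>.jar
            if (!name.endsWith("-" + declared + ".jar")) {
                mismatches.add(name);
            }
        }
        return mismatches;
    }

    public static void main(String[] args) throws Exception {
        // Descriptor content copied from the plugin.xml quoted above.
        String xml = "<plugin id=\"lib-nekohtml\" name=\"CyberNeko HTML Parser\""
                + " version=\"1.9.11\" provider-name=\"org.cyberneko\">"
                + "<runtime><library name=\"nekohtml-0.9.5.jar\">"
                + "<export name=\"*\"/></library></runtime></plugin>";
        System.out.println(mismatchedLibraries(xml)); // prints [nekohtml-0.9.5.jar]
    }
}
```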