[ https://issues.apache.org/jira/browse/NUTCH-1253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13193042#comment-13193042 ]
Ferdy Galema commented on NUTCH-1253:
-------------------------------------

Hi,

Looking at the revision history, it seems that 3 years ago the library actually WAS updated to 1.9.11. A few months later it was reverted to 0.9.4, and later to 0.9.5, but the plugin version remained at 1.9.11. The fact that they bothered to change this version number in the first place is curious in itself, because most plugins simply remain at version 1.0 despite several changes. Not that it matters; I mention it only to show that this number has no real purpose.

As for the nekohtml jar, I am not sure why it is still at this specific version, or why it is the preferred setting. Digging through the issues or mailing lists might give you more information about this. It might also be worth looking into TagSoup.

I do find your AbstractMethodError curious, though. Are you sure it is caused by nekohtml and xerces? Can you provide a stacktrace?

> Incompatible neko and xerces versions
> -------------------------------------
>
>                 Key: NUTCH-1253
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1253
>             Project: Nutch
>          Issue Type: Bug
>    Affects Versions: 1.4
>         Environment: Ubuntu 10.04
>            Reporter: Dennis Spathis
>
> The Nutch 1.4 distribution includes
> - nekohtml-0.9.5.jar (under .../runtime/local/plugins/lib-nekohtml)
> - xercesImpl-2.9.1.jar (under .../runtime/local/lib)
> These two JARs appear to be incompatible versions. When the HtmlParser (configured to use neko) is invoked during a local-mode crawl, the parse fails due to an AbstractMethodError. (Note: To see the AbstractMethodError, rebuild the HtmlParser plugin and add a catch(Throwable) clause in the getParse method to log the stacktrace.)
> I found that substituting a later, compatible version of nekohtml (1.9.11) fixes the problem.
> Curiously, and in support of the above, the nekohtml plugin.xml file in Nutch 1.4 contains the following:
> <plugin
>    id="lib-nekohtml"
>    name="CyberNeko HTML Parser"
>    version="1.9.11"
>    provider-name="org.cyberneko">
>    <runtime>
>       <library name="nekohtml-0.9.5.jar">
>          <export name="*"/>
>       </library>
>    </runtime>
> </plugin>
> Note the conflicting version numbers (version tag is "1.9.11" but the specified library is "nekohtml-0.9.5.jar").
> Was the 0.9.5 version included by mistake? Was the intention rather to include 1.9.11?
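The reporter's suggestion of adding a catch(Throwable) clause to getParse to surface the AbstractMethodError can be sketched roughly as below. This is a minimal, self-contained illustration, not Nutch code: the class ParseDiagnostics and its methods runAndLog, stackTraceOf and locationOf are hypothetical names, and the real getParse call is stood in for by a lambda that deliberately throws. The locationOf helper, which reports the JAR a class was loaded from, is an added assumption aimed at the question of whether nekohtml and xerces are really involved.

```java
import java.io.PrintWriter;
import java.io.StringWriter;
import java.security.CodeSource;
import java.util.concurrent.Callable;

/** Hypothetical diagnostic helper (not part of Nutch). */
public final class ParseDiagnostics {

    private ParseDiagnostics() {}

    /** Renders a Throwable's full stack trace (including causes) as a String. */
    static String stackTraceOf(Throwable t) {
        StringWriter sw = new StringWriter();
        t.printStackTrace(new PrintWriter(sw, true));
        return sw.toString();
    }

    /**
     * Runs a parse step. If it throws anything, including an Error such as
     * AbstractMethodError that a plain catch (Exception e) would miss, the
     * stack trace is written to stderr and null is returned.
     */
    static <T> T runAndLog(String label, Callable<T> parseStep) {
        try {
            return parseStep.call();
        } catch (Throwable t) {
            System.err.println("Parse step '" + label + "' failed:");
            System.err.println(stackTraceOf(t));
            return null;
        }
    }

    /** Reports the JAR or directory a class was loaded from, if known. */
    static String locationOf(Class<?> cls) {
        CodeSource src = cls.getProtectionDomain().getCodeSource();
        if (src == null || src.getLocation() == null) {
            return "(no code source: likely a bootstrap/JDK class)";
        }
        return src.getLocation().toString();
    }

    public static void main(String[] args) {
        // Stand-in for the real parse call; throws an Error so that the
        // catch (Throwable) path is exercised.
        String result = runAndLog("demo", () -> {
            throw new AbstractMethodError("simulated linkage failure");
        });
        System.out.println("result = " + result);
        System.out.println("ParseDiagnostics loaded from: "
                + locationOf(ParseDiagnostics.class));
    }
}
```

Catching Throwable rather than Exception matters here because AbstractMethodError is a LinkageError, which is an Error and not an Exception, so an ordinary catch (Exception e) would let it propagate unlogged. Inside the plugin, passing the neko and Xerces parser classes (for example, assuming they are on the classpath, org.cyberneko.html.parsers.DOMFragmentParser and org.apache.xerces.parsers.DOMParser) to locationOf should show which JARs actually supplied them.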