[ 
https://issues.apache.org/jira/browse/NUTCH-1253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13248223#comment-13248223
 ] 

Ferdy Galema commented on NUTCH-1253:
-------------------------------------

Wow, this issue keeps getting more and more interesting. I just found out that 
the exception is CAUSED BY enabling trace logging. That is why it is so 
confusing. My previous statement that it does not affect nutchgora was wrong, 
it seems: the problem affects both trunk and nutchgora. Follow these steps to 
reproduce it:


ferdy@ftm:~/workspace/nutchtrunk/runtime/local$ bin/nutch parsechecker "http://www.iana.org/"
...
Version: 5
Status: success(1,0)
...


Now see what happens when I add the following line to log4j.properties. (Note 
that the comment by Dennis has a typo in this line.)
log4j.logger.org.apache.nutch.parse.html=TRACE,cmdstdout

ferdy@ftm:~/workspace/nutchtrunk/runtime/local$ bin/nutch parsechecker "http://www.iana.org/"
...
Version: 5
Status: failed(2,200): org.apache.nutch.parse.ParseException: Unable to 
successfully parse content
...

So this is very obscure. The only difference between the two runs is the log 
level, so it must be a trace logging statement that triggers the exception.
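
To illustrate how a log level alone could flip the parse result, here is a 
minimal sketch. It is not the actual Nutch code: java.util.logging stands in 
for log4j (FINEST playing the role of TRACE), and the names 
TraceSideEffectDemo, describeNode and parseAt are hypothetical. The assumption 
is that some call which fails with the incompatible library only gets executed 
inside a trace-guarded block.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical demo class, not part of Nutch.
class TraceSideEffectDemo {
    private static final Logger LOG = Logger.getLogger("demo.parse.html");

    // Stand-in for a library call that breaks when jar versions do not match.
    // The error is simulated here; the real failing call is unknown.
    static String describeNode() {
        throw new AbstractMethodError("simulated incompatible library method");
    }

    // Sets the log level, then runs a fake parse. The risky call is only
    // reached when the finest (trace-like) level is enabled.
    static String parseAt(Level level) {
        LOG.setLevel(level);
        try {
            if (LOG.isLoggable(Level.FINEST)) {
                LOG.finest("node: " + describeNode());
            }
            return "success";
        } catch (Throwable t) {
            // Catching Throwable makes Errors such as AbstractMethodError visible.
            return "failed: " + t.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        System.out.println(parseAt(Level.INFO));
        System.out.println(parseAt(Level.FINEST));
    }
}
```

With the level at INFO the guarded block is skipped and the parse succeeds. 
With the level at FINEST the guarded call runs and the parse fails, which 
would match the behaviour seen above.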
                
> Incompatible neko and xerces versions
> -------------------------------------
>
>                 Key: NUTCH-1253
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1253
>             Project: Nutch
>          Issue Type: Bug
>    Affects Versions: 1.4
>         Environment: Ubuntu 10.04
>            Reporter: Dennis Spathis
>         Attachments: NUTCH-1253-nutchgora.patch, NUTCH-1253.patch
>
>
> The Nutch 1.4 distribution includes
>  - nekohtml-0.9.5.jar (under .../runtime/local/plugins/lib-nekohtml)
>  - xercesImpl-2.9.1.jar (under .../runtime/local/lib)
> These two JARs appear to be incompatible versions. When the HtmlParser 
> (configured to use neko) is invoked during a local-mode crawl, the parse 
> fails due to an AbstractMethodError. (Note: To see the AbstractMethodError, 
> rebuild the HtmlParser plugin and add a
> catch(Throwable) clause in the getParse method to log the stacktrace.)
> I found that substituting a later, compatible version of nekohtml (1.9.11)
> fixes the problem.
> Curiously, and in support of the above, the nekohtml plugin.xml file in
> Nutch 1.4 contains the following:
> <plugin
>    id="lib-nekohtml"
>    name="CyberNeko HTML Parser"
>    version="1.9.11"
>    provider-name="org.cyberneko">
>    <runtime>
>        <library name="nekohtml-0.9.5.jar">
>            <export name="*"/>
>        </library>
>    </runtime>
> </plugin>
> Note the conflicting version numbers (version tag is "1.9.11" but the
> specified library is "nekohtml-0.9.5.jar").
> Was the 0.9.5 version included by mistake? Was the intention rather to
> include 1.9.11?


        
