[ 
https://issues.apache.org/jira/browse/ANY23-58?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13236501#comment-13236501
 ] 

Michele Mostarda commented on ANY23-58:
---------------------------------------

The issue has been reproduced.

It appears to be a Xerces-related problem. Investigating.

{code}
Mar 23, 2012 11:34:40 AM org.apache.any23.rdf.PopularPrefixes getPrefixes
INFO: Loading prefixes from /org/apache/any23/prefixes/prefixes.properties
Mar 23, 2012 11:34:40 AM org.apache.any23.extractor.SingleDocumentExtraction run
INFO: Processing http://bob.example.com/

java.lang.OutOfMemoryError: Java heap space
        at org.apache.xerces.dom.NodeImpl.cloneNode(Unknown Source)
        at org.apache.xerces.dom.ChildNode.cloneNode(Unknown Source)
        at org.apache.xerces.dom.ParentNode.cloneNode(Unknown Source)
        at org.apache.xerces.dom.ElementImpl.cloneNode(Unknown Source)
        at org.apache.xerces.dom.ParentNode.cloneNode(Unknown Source)
        at org.apache.xerces.dom.ElementImpl.cloneNode(Unknown Source)
        at org.apache.xerces.dom.ParentNode.cloneNode(Unknown Source)
        at org.apache.xerces.dom.ElementImpl.cloneNode(Unknown Source)
        at org.apache.html.dom.HTMLTableRowElementImpl.cloneNode(Unknown Source)
        at org.apache.xerces.dom.ParentNode.cloneNode(Unknown Source)
        at org.apache.xerces.dom.ElementImpl.cloneNode(Unknown Source)
        at org.apache.html.dom.HTMLTableSectionElementImpl.cloneNode(Unknown Source)
        at org.apache.xerces.dom.ParentNode.cloneNode(Unknown Source)
        at org.apache.xerces.dom.ElementImpl.cloneNode(Unknown Source)
        at org.apache.html.dom.HTMLTableElementImpl.cloneNode(Unknown Source)
        at org.apache.xerces.dom.ParentNode.cloneNode(Unknown Source)
        at org.apache.xerces.dom.ElementImpl.cloneNode(Unknown Source)
        at org.apache.xerces.dom.ParentNode.cloneNode(Unknown Source)
        at org.apache.xerces.dom.ElementImpl.cloneNode(Unknown Source)
        at org.apache.xerces.dom.ParentNode.cloneNode(Unknown Source)
        at org.apache.xerces.dom.ElementImpl.cloneNode(Unknown Source)
        at org.apache.xerces.dom.ParentNode.cloneNode(Unknown Source)
        at org.apache.xerces.dom.ElementImpl.cloneNode(Unknown Source)
        at org.apache.xerces.dom.ParentNode.cloneNode(Unknown Source)
        at org.apache.xerces.dom.ElementImpl.cloneNode(Unknown Source)
        at org.apache.xerces.dom.ParentNode.cloneNode(Unknown Source)
        at org.apache.xerces.dom.ElementImpl.cloneNode(Unknown Source)
        at org.apache.xerces.dom.ParentNode.cloneNode(Unknown Source)
        at org.apache.xerces.dom.ElementImpl.cloneNode(Unknown Source)
        at org.apache.xerces.dom.ParentNode.cloneNode(Unknown Source)
        at org.apache.xerces.dom.ElementImpl.cloneNode(Unknown Source)
        at org.apache.xerces.dom.ParentNode.cloneNode(Unknown Source)
{code}
                
> HCardExtractor infinite loop and memory exhaustion
> --------------------------------------------------
>
>                 Key: ANY23-58
>                 URL: https://issues.apache.org/jira/browse/ANY23-58
>             Project: Apache Any23
>          Issue Type: Bug
>    Affects Versions: 0.7.0
>         Environment: OpenJDK Runtime Environment (IcedTea6 1.11pre) 
> (6b23~pre11-0ubuntu1.11.10.2) on Ubuntu
>            Reporter: Hannes Mühleisen
>            Assignee: Michele Mostarda
>         Attachments: HCardFailTest.java, fail.html
>
>
> On some HTML files, the HCardExtractor enters an infinite loop in the method 
> fixIncludes(), specifically at the line 
> node.appendChild(header.cloneNode(true));, which leads to memory exhaustion. 
> A test case and an example HTML file are attached.
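
The root cause is not confirmed in this thread. As an illustration only, here is one way a line like node.appendChild(header.cloneNode(true)) can grow a tree without bound: if the element being included is the target itself (or one of its ancestors), every pass copies the copies made by earlier passes. The sketch below is not the Any23 code. The class and method names (CloneGrowthSketch, sizeAfterPasses, countNodes, isSameOrAncestor) are hypothetical, it uses the DOM implementation bundled with the JDK rather than the Xerces HTML DOM from the stack trace, and the loop is bounded so that it terminates.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

public class CloneGrowthSketch {

    /** Counts the given node plus all of its descendants. */
    public static int countNodes(Node node) {
        int count = 1;
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            count += countNodes(child);
        }
        return count;
    }

    /** Returns true if candidate is node itself or one of node's ancestors. */
    public static boolean isSameOrAncestor(Node candidate, Node node) {
        for (Node current = node; current != null; current = current.getParentNode()) {
            if (current == candidate) {
                return true;
            }
        }
        return false;
    }

    /**
     * Hypothetical stand-in for an include step: appends a deep clone of
     * 'header' to 'node' once per pass. Here 'node' is deliberately the
     * same element as 'header', so each unguarded pass clones everything
     * appended by the passes before it.
     */
    public static int sizeAfterPasses(int passes, boolean guarded) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
        Element header = doc.createElement("div");
        header.appendChild(doc.createElement("span"));
        doc.appendChild(header);

        Element node = header; // assumption: the include points back at its own container
        for (int i = 0; i < passes; i++) {
            if (guarded && isSameOrAncestor(header, node)) {
                continue; // skip self-referential includes instead of cloning them
            }
            node.appendChild(header.cloneNode(true));
        }
        return countNodes(header);
    }

    public static void main(String[] args) {
        for (int passes = 0; passes <= 10; passes++) {
            System.out.println(passes + " passes: unguarded=" + sizeAfterPasses(passes, false)
                    + " nodes, guarded=" + sizeAfterPasses(passes, true) + " nodes");
        }
    }
}
```

In the unguarded case the node count doubles on every pass, so a loop that never terminates would exhaust the heap inside cloneNode, which is at least consistent with the trace above. Whether fail.html actually triggers this pattern would need to be checked against the attached test case.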

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

