[ https://issues.apache.org/jira/browse/NUTCH-497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12508083 ]
Hudson commented on NUTCH-497:
------------------------------

Integrated in Nutch-Nightly #129 (See [http://lucene.zones.apache.org:8080/hudson/job/Nutch-Nightly/129/])

> Extreme Nested Tags causes StackOverflowException in DomContentUtils...Spider Trap
> ----------------------------------------------------------------------------------
>
>                 Key: NUTCH-497
>                 URL: https://issues.apache.org/jira/browse/NUTCH-497
>             Project: Nutch
>          Issue Type: Bug
>          Components: fetcher
>    Affects Versions: 0.8.1, 0.9.0, 1.0.0
>         Environment: all
>            Reporter: Dennis Kubes
>            Assignee: Dennis Kubes
>             Fix For: 1.0.0
>
>         Attachments: ExtremeNestedTags.patch, nested-tags-trap.patch, nested-tags-trap2.patch, nested-tags-trap3.patch
>
>
> Some webpages have a form of a spider trap that causes a StackOverflowException in DomContentUtils by having nested tags with thousands of layers deep. DomContentUtils when trying to get outlinks uses a recursive method to parse the html. With this type of nesting it errors out.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
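
To illustrate the failure mode described in the issue: a recursive DOM walk uses one stack frame per nesting level, so tags nested thousands of layers deep can exhaust the thread stack. Below is a minimal sketch of one common remedy, an iterative walk that keeps pending nodes in an explicit heap-allocated stack. The class IterativeOutlinkWalker and its methods collectHrefs and buildNested are hypothetical names chosen for this example; this is not the Nutch DomContentUtils code, and the attached patches may address the problem differently.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

// Hypothetical illustration only; not taken from Nutch or the attached patches.
final class IterativeOutlinkWalker {

    /** Collects href values of <a> elements using an explicit stack, not recursion. */
    static List<String> collectHrefs(Node root) {
        List<String> hrefs = new ArrayList<>();
        Deque<Node> pending = new ArrayDeque<>();
        pending.push(root);
        while (!pending.isEmpty()) {
            Node node = pending.pop();
            if (node.getNodeType() == Node.ELEMENT_NODE
                    && "a".equalsIgnoreCase(node.getNodeName())) {
                // getAttribute returns an empty string when the attribute is absent
                String href = ((Element) node).getAttribute("href");
                if (!href.isEmpty()) {
                    hrefs.add(href);
                }
            }
            // Push children last-to-first so they are popped in document order.
            for (Node c = node.getLastChild(); c != null; c = c.getPreviousSibling()) {
                pending.push(c);
            }
        }
        return hrefs;
    }

    /** Builds a document with one link wrapped in `depth` nested <div> elements. */
    static Document buildNested(int depth, String href) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
        Element inner = doc.createElement("a");
        inner.setAttribute("href", href);
        // Wrap from the inside out so each append attaches to a fresh, parentless node.
        for (int i = 0; i < depth; i++) {
            Element wrapper = doc.createElement("div");
            wrapper.appendChild(inner);
            inner = wrapper;
        }
        doc.appendChild(inner);
        return doc;
    }

    public static void main(String[] args) {
        Document doc = buildNested(50_000, "http://example.com/");
        System.out.println("links found: " + collectHrefs(doc).size());
    }
}
```

The call stack stays at constant depth however deep the markup nests; memory use grows with the number of pending nodes on the heap instead. A depth cap is another possible safeguard against this kind of spider trap, at the cost of ignoring links below the cap.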