[ https://issues.apache.org/jira/browse/NUTCH-709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12679549#action_12679549 ]
Julien Nioche commented on NUTCH-709:
-------------------------------------

Hi Tim,

Did you have a look at the logs to see which URL was causing the problem in the first place? Do you specify a custom max length for the content to be fetched? Martina's example above is 428.42 kB, which is far beyond the default max length. I am wondering whether the problem you found could be related to the fact that long documents are trimmed to the max length.

> JSParseFilter gets into an infinate loop and ets all the stack
> ---------------------------------------------------------------
>
>                 Key: NUTCH-709
>                 URL: https://issues.apache.org/jira/browse/NUTCH-709
>             Project: Nutch
>          Issue Type: Bug
>          Components: fetcher
>    Affects Versions: 1.0.0
>         Environment: Hadoop 0.19.0 running nutch trunk
>            Reporter: Tim Hawkins
>         Attachments: JSParseFilter.error.patch
>
>
> When crawling pages with separate fetch and parse, I see processes die
> because of stack overflow.
> Output is generally:
> java.lang.StackOverflowError
>         at org.apache.nutch.parse.js.JSParseFilter.walk(JSParseFilter.java:146)
>         at org.apache.nutch.parse.js.JSParseFilter.walk(JSParseFilter.java:148)
>         at org.apache.nutch.parse.js.JSParseFilter.walk(JSParseFilter.java:148)
>         at org.apache.nutch.parse.js.JSParseFilter.walk(JSParseFilter.java:148)
>         at org.apache.nutch.parse.js.JSParseFilter.walk(JSParseFilter.java:148)
>         at org.apache.nutch.parse.js.JSParseFilter.walk(JSParseFilter.java:148)
>         at org.apache.nutch.parse.js.JSParseFilter.walk(JSParseFilter.java:148)
>         at org.apache.nutch.parse.js.JSParseFilter.walk(JSParseFilter.java:148)
>         at org.apache.nutch.parse.js.JSParseFilter.walk(JSParseFilter.java:148)
>         at org.apache.nutch.parse.js.JSParseFilter.walk(JSParseFilter.java:148)
>         at org.apache.nutch.parse.js.JSParseFilter.walk(JSParseFilter.java:148)
>         at org.apache.nutch.parse.js.JSParseFilter.walk(JSParseFilter.java:148)
>         at org.apache.nutch.parse.js.JSParseFilter.walk(JSParseFilter.java:148)
>         at org.apache.nutch.parse.js.JSParseFilter.walk(JSParseFilter.java:148)
>         at org.apache.nutch.parse.js.JSParseFilter.walk(JSParseFilter.java:148)
>         at org.apache.nutch.parse.js.JSParseFilter.walk(JSParseFilter.java:148)
>         at org.apache.nutch.parse.js.JSParseFilter.walk(JSParseFilter.java:148)
>         at org.apache.nutch.parse.js.JSParseFilter.walk(JSParseFilter.java:148)
>         at org.apache.nutch.parse.js.JSParseFilter.walk(JSParseFilter.java:148)
> Inspection of the code shows that this is a recursive call to walk(.....)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
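
For reference on the recursive walk(.....) noted in the description: a recursive DOM traversal uses one stack frame per level of nesting, so a very deeply nested (or malformed, truncated) document can exhaust the thread stack without any true infinite loop. The sketch below shows one generic way around that, replacing the recursion with an explicit stack. It is not the Nutch code and not the attached patch; the class and method names (IterativeWalk, walk, deepChain) are hypothetical, and the real filter would do its script extraction where the comment indicates.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Node;

// Hypothetical illustration; not taken from JSParseFilter.
public class IterativeWalk {

    /** Visits every node under root in document order without recursion; returns the node count. */
    static int walk(Node root) {
        Deque<Node> pending = new ArrayDeque<>();
        pending.push(root);
        int visited = 0;
        while (!pending.isEmpty()) {
            Node node = pending.pop();
            visited++;
            // A real filter would inspect the node here (e.g. script elements).
            // Push children last-to-first so the first child is popped next.
            for (Node child = node.getLastChild(); child != null; child = child.getPreviousSibling()) {
                pending.push(child);
            }
        }
        return visited;
    }

    /** Builds a chain of `depth` nested elements, to simulate a deeply nested page. */
    static Node deepChain(int depth) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
        // Build from the innermost element outwards.
        Node current = doc.createElement("div");
        for (int i = 1; i < depth; i++) {
            Node parent = doc.createElement("div");
            parent.appendChild(current);
            current = parent;
        }
        return current;
    }

    public static void main(String[] args) {
        // Depth is bounded by heap here, not by the thread stack.
        System.out.println(walk(deepChain(20000)));
    }
}
```

With this shape the traversal depth is limited by heap rather than by the thread stack size. Whether that is the right fix for NUTCH-709 depends on what the offending URL actually looks like, which is why the logs matter.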