Timeout for Parser
------------------

        Key: NUTCH-696
        URL: https://issues.apache.org/jira/browse/NUTCH-696
    Project: Nutch
 Issue Type: Wish
 Components: fetcher
   Reporter: julien nioche
   Priority: Minor

I found that parsing sometimes crashes because of a problem with a specific document. This is a shame, as it blocks the rest of the segment and Hadoop ends up concluding that the node is not responding.

Would it make sense to have a timeout mechanism for parsing, so that if a document has not been parsed after a time t, it is simply treated as an exception and we can get on with the rest of the process?

Where do you think we should implement that, in ParseUtil?
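
To make the idea concrete, here is a minimal sketch of one way a time limit could be put around a parse call, using only java.util.concurrent. It is not existing Nutch code: the class ParseTimeoutSketch, the method parseWithTimeout and the exception ParseTimedOutException are hypothetical names, and the Callable simply stands in for whatever call ParseUtil would make to a parser.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class ParseTimeoutSketch {

    /** Hypothetical exception: signals that a document was not parsed within the limit. */
    public static class ParseTimedOutException extends Exception {
        public ParseTimedOutException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    /**
     * Runs parseTask on a separate daemon thread and waits at most timeoutMillis for it.
     * parseTask is an assumption: it stands in for the real call to a parser.
     */
    public static <T> T parseWithTimeout(Callable<T> parseTask, long timeoutMillis)
            throws ParseTimedOutException, ExecutionException, InterruptedException {
        // Daemon thread, so a parser that never returns cannot keep the JVM alive.
        ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "parse-with-timeout");
            t.setDaemon(true);
            return t;
        });
        Future<T> future = executor.submit(parseTask);
        try {
            // Blocks until the parse finishes or the time limit expires.
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            // Ask the parse thread to stop by interrupting it, then report the timeout.
            future.cancel(true);
            throw new ParseTimedOutException(
                    "parse did not finish within " + timeoutMillis + " ms", e);
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        // A parse that finishes in time returns its result.
        System.out.println(parseWithTimeout(() -> "parsed", 1000));

        // A parse that takes too long is reported as an exception.
        try {
            parseWithTimeout(() -> { Thread.sleep(5000); return "never"; }, 100);
        } catch (ParseTimedOutException e) {
            System.out.println("timed out: " + e.getMessage());
        }
    }
}
```

One caveat with this approach: cancel(true) only interrupts the parsing thread. A parser stuck in a loop that never checks for interruption would keep running in the background, although the caller is free to move on to the next document.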