[ https://issues.apache.org/jira/browse/NUTCH-696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12676658#action_12676658 ]
julien nioche commented on NUTCH-696:
-------------------------------------

I was thinking along the lines of your first option, i.e. do the parsing in a separate thread and kill it if we pass the timeout. I am not the most experienced person when it comes to threads in Java, so someone else will probably have a better idea.

> Timeout for Parser
> ------------------
>
>                 Key: NUTCH-696
>                 URL: https://issues.apache.org/jira/browse/NUTCH-696
>             Project: Nutch
>          Issue Type: Wish
>          Components: fetcher
>            Reporter: julien nioche
>            Priority: Minor
>
> I found that the parsing sometimes crashes due to a problem on a specific
> document, which is a bit of a shame as this blocks the rest of the segment
> and Hadoop ends up finding that the node does not respond. I was wondering
> about whether it would make sense to have a timeout mechanism for the parsing
> so that if a document is not parsed after a time t, it is simply treated as
> an exception and we can get on with the rest of the process.
> Does that make sense? Where do you think we should implement that, in
> ParseUtil?
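
For what it's worth, here is a rough sketch of the separate-thread idea using only java.util.concurrent. It is an illustration under assumptions, not a patch: `TimedParse` and `runWithTimeout` are hypothetical names that do not exist in Nutch, and the caller would have to wrap the real parse call in a `Callable`. One caveat: `Future.cancel(true)` only interrupts the worker thread. A parser stuck in a loop that never checks for interruption will keep running in the background, so this bounds how long we wait rather than truly killing the thread.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper (not part of Nutch): runs a task on a worker thread
// and gives up waiting once the timeout has elapsed.
final class TimedParse {

    private TimedParse() {}

    public static <T> T runWithTimeout(Callable<T> task, long timeoutMs)
            throws TimeoutException, ExecutionException, InterruptedException {
        // Daemon thread, so a worker that ignores interruption cannot keep the JVM alive.
        ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "parse-worker");
            t.setDaemon(true);
            return t;
        });
        Future<T> future = executor.submit(task);
        try {
            // Wait at most timeoutMs for the result.
            return future.get(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            // Ask the worker to stop; this is an interrupt, not a forced kill.
            future.cancel(true);
            throw e;
        } finally {
            // Release the executor; interrupts the worker if it is still running.
            executor.shutdownNow();
        }
    }
}
```

The caller would catch `TimeoutException` and treat the document as a failed parse, which is the "treated as an exception" behaviour asked for in the issue. Creating one executor per document keeps the sketch short; a real implementation would probably reuse a pool.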