Timeout for Parser
------------------

                 Key: NUTCH-696
                 URL: https://issues.apache.org/jira/browse/NUTCH-696
             Project: Nutch
          Issue Type: Wish
          Components: fetcher
            Reporter: julien nioche
            Priority: Minor


I have found that parsing sometimes crashes because of a problem with one 
specific document. This is a shame, because it blocks the rest of the segment 
and Hadoop eventually decides that the node is not responding. I was wondering 
whether it would make sense to have a timeout mechanism for parsing, so that a 
document that has not been parsed after a time t is simply treated as an 
exception and we can get on with the rest of the process.

Does that make sense? Where do you think we should implement that, in ParseUtil?
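
To make the idea concrete, here is a minimal sketch of one way such a timeout 
could work, using a plain java.util.concurrent Future. It is an illustration 
only, not existing Nutch code: the class TimedParse, the method 
parseWithTimeout and the exception ParseTimeoutException are hypothetical 
names, and the Callable stands in for whatever call actually parses a document.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical helper, not part of Nutch: runs a parse task with a time limit. */
public class TimedParse {

    /** Hypothetical exception signalling that a parse failed or ran out of time. */
    public static class ParseTimeoutException extends Exception {
        public ParseTimeoutException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    // Daemon threads, so a parse that never returns cannot keep the JVM alive.
    private static final ExecutorService POOL = Executors.newCachedThreadPool(r -> {
        Thread t = new Thread(r, "parse-with-timeout");
        t.setDaemon(true);
        return t;
    });

    /**
     * Runs parseTask on a worker thread and waits at most timeoutMillis for it.
     * A timeout or a failure inside the task is reported as ParseTimeoutException,
     * so the caller can skip the document and carry on with the segment.
     */
    public static <T> T parseWithTimeout(Callable<T> parseTask, long timeoutMillis)
            throws ParseTimeoutException {
        Future<T> future = POOL.submit(parseTask);
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            // cancel(true) only interrupts the worker; a parser that ignores
            // interrupts keeps running in the background.
            future.cancel(true);
            throw new ParseTimeoutException("parse exceeded " + timeoutMillis + " ms", e);
        } catch (ExecutionException e) {
            // The task itself threw; surface its cause.
            throw new ParseTimeoutException("parse failed", e.getCause());
        } catch (InterruptedException e) {
            future.cancel(true);
            Thread.currentThread().interrupt(); // preserve the interrupt status
            throw new ParseTimeoutException("interrupted while waiting for parse", e);
        }
    }
}
```

One caveat with this approach: Java cannot forcibly stop a thread, so a parser 
stuck in a tight loop that ignores interrupts would still occupy a thread 
after the timeout, even though the caller has moved on.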


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
