[ 
https://issues.apache.org/jira/browse/NUTCH-696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12675173#action_12675173
 ] 

Doğacan Güney commented on NUTCH-696:
-------------------------------------

This makes perfect sense, but I am not sure how to implement it. We could run 
parsing in a separate thread and kill that thread after a while, but IIRC 
forcibly stopping threads is frowned upon in Java. Alternatively, we could run 
parsing in a separate process on the same machine and kill the process. 
That would be cleaner, but it is more difficult to implement.
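
For the thread-based option, a rough sketch using java.util.concurrent might 
look like the following. The class name TimedParse and its call() method are 
made up for illustration and are not part of Nutch; the parse work is passed 
in as a plain Callable, so nothing here assumes a particular Nutch API. Note 
that this only interrupts the worker thread, it does not kill it.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper for illustration only; not an existing Nutch class.
final class TimedParse {
    private TimedParse() {}

    // Runs parseTask on a worker thread and waits at most `timeout` for it.
    // Throws TimeoutException if the task has not finished in time.
    static <T> T call(Callable<T> parseTask, long timeout, TimeUnit unit)
            throws TimeoutException, ExecutionException, InterruptedException {
        // Daemon thread, so a parser that ignores interrupts cannot keep
        // the JVM alive on its own.
        ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "parse-with-timeout");
            t.setDaemon(true);
            return t;
        });
        Future<T> future = executor.submit(parseTask);
        try {
            return future.get(timeout, unit);
        } catch (TimeoutException e) {
            // Requests interruption; the thread is not forcibly stopped.
            future.cancel(true);
            throw e;
        } finally {
            executor.shutdownNow();
        }
    }
}
```

A caller would wrap the parse call in a Callable, invoke something like 
TimedParse.call(task, 30, TimeUnit.SECONDS), and treat a TimeoutException as 
a failed parse for that document. The caveat is that a parser stuck in a 
tight loop that never checks for interruption keeps running in the 
background, which is the main argument for the separate-process approach.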

Do you have a suggestion on how to implement the timeout mechanism?

> Timeout for Parser
> ------------------
>
>                 Key: NUTCH-696
>                 URL: https://issues.apache.org/jira/browse/NUTCH-696
>             Project: Nutch
>          Issue Type: Wish
>          Components: fetcher
>            Reporter: julien nioche
>            Priority: Minor
>
> I found that the parsing sometimes crashes due to a problem on a specific 
> document, which is a bit of a shame as this blocks the rest of the segment 
> and Hadoop ends up finding that the node does not respond. I was wondering 
> about whether it would make sense to have a timeout mechanism for the parsing 
> so that if a document is not parsed after a time t, it is simply treated as 
> an exception and we can get on with the rest of the process.
> Does that make sense? Where do you think we should implement that, in 
> ParseUtil?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
