[ https://issues.apache.org/jira/browse/NUTCH-696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12885260#action_12885260 ]
Julien Nioche edited comment on NUTCH-696 at 7/5/10 11:13 AM:
--------------------------------------------------------------
+1: this is definitely useful. Hopefully the underlying parsers in Tika will
keep being improved to prevent loops and crashes, but having the parser timeout
on top would be great.
I suggest we mark it for 2.0 and 1.2.
> Timeout for Parser
> ------------------
>
> Key: NUTCH-696
> URL: https://issues.apache.org/jira/browse/NUTCH-696
> Project: Nutch
> Issue Type: Wish
> Components: fetcher
> Reporter: Julien Nioche
> Priority: Minor
> Attachments: timeout.patch
>
>
> I found that parsing sometimes crashes because of a problem with a specific
> document, which is a bit of a shame as this blocks the rest of the segment
> and Hadoop ends up finding that the node does not respond. I was wondering
> whether it would make sense to have a timeout mechanism for the parsing,
> so that if a document is not parsed after a time t, it is simply treated as
> an exception and we can get on with the rest of the process.
> Does that make sense? Where do you think we should implement that, in
> ParseUtil?
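
For illustration only, here is a minimal sketch of the kind of timeout the description asks for, using the standard java.util.concurrent classes. It is not the attached timeout.patch and does not use any Nutch API: the class name ParseTimeoutSketch, the method callWithTimeout and the choice to return an empty Optional on failure are all assumptions made for this example. The parse call is represented by a generic Callable.

```java
import java.util.Optional;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper, not part of Nutch: runs a task with an upper time bound.
final class ParseTimeoutSketch {

    // Returns the task's result, or Optional.empty() if the task timed out,
    // threw an exception, or the calling thread was interrupted.
    static <T> Optional<T> callWithTimeout(Callable<T> parseTask, long timeoutMillis) {
        // Daemon thread, so a parser stuck in a loop cannot keep the JVM alive.
        ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "parse-timeout-sketch");
            t.setDaemon(true);
            return t;
        });
        Future<T> future = executor.submit(parseTask);
        try {
            return Optional.ofNullable(future.get(timeoutMillis, TimeUnit.MILLISECONDS));
        } catch (TimeoutException e) {
            // Document not parsed after time t: give up and interrupt the worker.
            future.cancel(true);
            return Optional.empty();
        } catch (ExecutionException e) {
            // The task itself failed; handled the same way as a timeout here.
            return Optional.empty();
        } catch (InterruptedException e) {
            // Restore the interrupt flag for the caller and stop waiting.
            Thread.currentThread().interrupt();
            future.cancel(true);
            return Optional.empty();
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) {
        Optional<String> fast = callWithTimeout(() -> "parsed", 1000);
        Optional<String> slow = callWithTimeout(() -> {
            Thread.sleep(5000); // stands in for a parser that hangs
            return "too late";
        }, 100);
        System.out.println("fast task returned a result: " + fast.isPresent());
        System.out.println("slow task returned a result: " + slow.isPresent());
    }
}
```

One caveat with this approach: cancel(true) only interrupts the worker thread, so a parser spinning in a loop that never checks for interruption keeps consuming CPU even though the caller has moved on to the next document.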