[
https://issues.apache.org/jira/browse/HIVE-480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12718911#action_12718911
]
Joydeep Sen Sarma commented on HIVE-480:
----------------------------------------
One concern I have is that if the cluster goes down temporarily, the retries
will fail promptly and this fix would serve no purpose.
On the other hand, if the failure is due to a genuine problem with the job
(such as a bug in a user script or bad input), then we will retry
unnecessarily and cause excess load.
We need to think about how to distinguish these cases. In some cases
(an interactive CLI session), it may be better to leave the decision to the
user (give a prompt and ask whether they want to retry the job).
Ideally, we should be able to do something similar for a non-interactive
session as well, but that seems much more complicated (suspending and resuming
a query given a query id).
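To make the concern concrete, here is a minimal sketch (not Hive code; the error names and the `classify` callback are hypothetical, since really distinguishing a transient cluster outage from a bad user script would require inspecting JobTracker state or task diagnostics) of a retry driver that only retries failures classified as transient, and can optionally defer the decision to an interactive prompt:

```python
import time

# Hypothetical failure classes for illustration only.
TRANSIENT = {"ClusterUnavailable", "ConnectionRefused", "LostTracker"}


def run_with_retry(run_job, classify, max_retries=3, backoff_s=1.0, prompt=None):
    """Retry run_job only when classify() reports a transient failure.

    run_job  -- callable that raises RuntimeError on failure
    classify -- callable mapping an exception to 'transient' or 'permanent'
    prompt   -- optional callable (interactive CLI case) asked before each retry
    """
    attempt = 0
    while True:
        try:
            return run_job()
        except RuntimeError as e:
            if classify(e) == "permanent":
                raise  # e.g. user-script bug: retrying only adds cluster load
            attempt += 1
            if attempt > max_retries:
                raise  # cluster is still down; give up promptly
            if prompt is not None and not prompt(attempt):
                raise  # interactive user declined the retry
            # Back off so a recovering cluster is not hammered immediately.
            time.sleep(backoff_s * attempt)
```

The non-interactive suspend/resume idea from the comment would correspond to persisting the attempt state keyed by query id instead of looping in-process, which is indeed a much larger change.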
> allow option to retry map-reduce tasks
> --------------------------------------
>
> Key: HIVE-480
> URL: https://issues.apache.org/jira/browse/HIVE-480
> Project: Hadoop Hive
> Issue Type: New Feature
> Components: Query Processor
> Reporter: Joydeep Sen Sarma
> Attachments: HIVE-480.1.patch
>
>
> for long-running queries with multiple map-reduce jobs, this should help in
> dealing with transient cluster failures without having to re-run all
> the tasks.
> ideally, the entire plan could be serialized out and the actual process of
> executing the workflow left to a pluggable workflow execution engine
> (since this is a problem that has been solved many times already).