[ 
https://issues.apache.org/jira/browse/HIVE-480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12718911#action_12718911
 ] 

Joydeep Sen Sarma commented on HIVE-480:
----------------------------------------

one concern i have is that if the cluster goes down temporarily - the 
retries will fail promptly (while the cluster is still down) and this fix 
would serve no purpose.

on the other hand - if the failure is due to genuine problems with the job 
(like problems in user scripts or bad input etc.) - then we will retry the 
job unnecessarily and cause excess load.

we need to think about how to distinguish these cases. in some cases 
(interactive cli session) - it may be better to leave the decision to the 
user (give a prompt and ask the user whether they want to retry the job).

ideally - we should be able to do something like this for a non-interactive 
session as well - but that seems much more complicated (suspending and resuming 
a query given a queryid)
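
to make the cases above concrete, here is a rough sketch of one possible 
retry policy. it is not taken from the attached patch and is not hive code: 
the names run_with_retry, FailureKind, classify and ask_user are all 
hypothetical. how to actually classify a failure is the open question, so 
that part is left as a function supplied by the caller. the sketch assumes a 
backoff between retries (so that retries do not all fail promptly while the 
cluster is still down), no retry for failures classified as genuine, and a 
prompt in interactive sessions.

```python
import time
from enum import Enum


class FailureKind(Enum):
    """Hypothetical failure categories (assumption, not from Hive)."""
    TRANSIENT = "transient"  # e.g. cluster temporarily down
    GENUINE = "genuine"      # e.g. bad user script or bad input


def run_with_retry(run_job, classify, max_retries=3, base_delay=1.0,
                   interactive=False, ask_user=None, sleep=time.sleep):
    """Run run_job() and retry on failure according to a simple policy.

    run_job  -- callable that raises an exception on failure
    classify -- callable mapping an exception to a FailureKind
                (how to implement this is the open question)
    ask_user -- callable(exception) -> bool, used in interactive sessions
    Returns a tuple (succeeded, attempts).
    """
    attempts = 0
    while True:
        attempts += 1
        try:
            run_job()
            return True, attempts
        except Exception as err:
            retries_used = attempts - 1
            if retries_used >= max_retries:
                return False, attempts
            if interactive and ask_user is not None:
                # Interactive session: leave the decision to the user.
                if not ask_user(err):
                    return False, attempts
                continue  # the user chose to retry now, so do not wait
            if classify(err) is FailureKind.GENUINE:
                # Retrying would only add load to the cluster.
                return False, attempts
            # Exponential backoff gives a cluster that is temporarily
            # down some time to come back before the next attempt.
            sleep(base_delay * 2 ** retries_used)
```

a non-interactive caller would pass only run_job and classify. an 
interactive cli would pass interactive=True and an ask_user function that 
prompts on the terminal. suspending and resuming a query by queryid is not 
covered by this sketch.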

> allow option to retry map-reduce tasks
> --------------------------------------
>
>                 Key: HIVE-480
>                 URL: https://issues.apache.org/jira/browse/HIVE-480
>             Project: Hadoop Hive
>          Issue Type: New Feature
>          Components: Query Processor
>            Reporter: Joydeep Sen Sarma
>         Attachments: HIVE-480.1.patch
>
>
> for long running queries with multiple map-reduce jobs - this should help in 
> dealing with any transient cluster failures without having to re-run all 
> the tasks.
> ideally - the entire plan can be serialized out and the actual process of 
> executing the workflow can be left to a pluggable workflow execution engine 
> (since this is a problem that has been solved many times already).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
