[jira] [Commented] (LENS-743) Query failure retries for transient errors

Puneet Gupta (JIRA) Wed, 09 Dec 2015 04:21:08 -0800

    [ 
https://issues.apache.org/jira/browse/LENS-743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15048596#comment-15048596
 ]


Puneet Gupta commented on LENS-743:
-----------------------------------

Good question.
Adding to that:

I would want a quick retry in case the failure happens early on (what counts 
as "early on" is debatable). Say a person submitted a query, it was promoted 
to run only after 10 hours, and then it failed within a few seconds or minutes 
due to a transient failure.

But if a person submitted a query, it was promoted to run only after 10 hours, 
and then it failed after 10 more hours, should we re-run it immediately? From 
the user's perspective, yes. From the Lens perspective, I am not sure.

Another thing we need to consider: if we do a quick retry, the cause of the 
transient failure may still persist. Should we wait and then retry? How long 
should we wait? Should all new queries also wait (because they will fail 
anyway), say in case the failures are because the Hive server is clogged 
(excessive GC, etc.)?
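
To make the "should new queries also wait" question concrete, here is a rough 
sketch of a per-backend gate. It is only an illustration: BackendGate, its 
methods, the failure threshold and the cool-down are all hypothetical names and 
values, not anything that exists in Lens. After a number of consecutive 
transient failures, the gate holds queries for a cool-down period instead of 
letting them run into the same failure.

```java
/**
 * Hypothetical sketch, not a Lens class: holds queries for a backend that
 * keeps failing with transient errors (for example, a clogged Hive server).
 */
final class BackendGate {
  private final int failureThreshold;  // assumed tunable
  private final long coolDownMillis;   // assumed tunable
  private int consecutiveFailures;
  private long blockedUntilMillis;

  BackendGate(int failureThreshold, long coolDownMillis) {
    this.failureThreshold = failureThreshold;
    this.coolDownMillis = coolDownMillis;
  }

  /** Called when a query on this backend fails with a transient error. */
  synchronized void recordTransientFailure(long nowMillis) {
    consecutiveFailures++;
    if (consecutiveFailures >= failureThreshold) {
      // Start (or extend) the cool-down window.
      blockedUntilMillis = nowMillis + coolDownMillis;
    }
  }

  /** Called when a query on this backend succeeds; reopens the gate. */
  synchronized void recordSuccess() {
    consecutiveFailures = 0;
    blockedUntilMillis = 0;
  }

  /** True if new and retried queries should wait rather than run now. */
  synchronized boolean shouldHold(long nowMillis) {
    return nowMillis < blockedUntilMillis;
  }
}
```

The failure count is not reset when the cool-down expires, so a single further 
failure blocks the backend again right away; only a success reopens it fully. 
Whether that is the right behaviour is exactly the open question above.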

Another thought: maybe we should do the first re-run immediately (after some 
set wait time based on the type of error), and subsequent re-runs can have 
exponentially increasing wait times.
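
A minimal sketch of that schedule, assuming the wait doubles on each attempt 
and is capped at one hour. The class name, the error categories, the initial 
waits and the cap are all made up for illustration; none of them come from 
Lens.

```java
/** Hypothetical sketch of the proposed retry schedule; not a Lens class. */
final class RetryBackoff {

  /** Assumed error categories, each with its own initial wait. */
  enum TransientError {
    NETWORK(5_000L),
    SERVER_OVERLOADED(60_000L);

    final long initialWaitMillis;

    TransientError(long initialWaitMillis) {
      this.initialWaitMillis = initialWaitMillis;
    }
  }

  /** Assumed upper bound on any single wait: one hour. */
  static final long MAX_WAIT_MILLIS = 3_600_000L;

  /**
   * Wait before the given retry attempt (1 = first re-run).
   * The first re-run waits the error's initial wait; every later re-run
   * waits twice as long as the previous one, up to MAX_WAIT_MILLIS.
   */
  static long waitMillis(TransientError error, int attempt) {
    if (attempt < 1) {
      throw new IllegalArgumentException("attempt must be >= 1");
    }
    // Limit the shift so the multiplication cannot overflow a long.
    int doublings = Math.min(attempt - 1, 30);
    long wait = error.initialWaitMillis << doublings;
    return Math.min(wait, MAX_WAIT_MILLIS);
  }
}
```

For example, with these assumed values a NETWORK error would be retried after 
5 s, then 10 s, then 20 s, while a SERVER_OVERLOADED error starts at 60 s and 
reaches the one-hour cap within a few attempts.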

 

> Query failure retries for transient errors
> ------------------------------------------
>
>                 Key: LENS-743
>                 URL: https://issues.apache.org/jira/browse/LENS-743
>             Project: Apache Lens
>          Issue Type: Improvement
>          Components: server
>            Reporter: Amareshwari Sriramadasu
>            Assignee: Rajat Khandelwal
>
> There have to be retries for query failures for transient errors like network 
> errors (Hive server not reachable/ Metastore not reachable/ DB not 
> reachable). Retries should be available for each phase - submission, 
> execution, updating status, fetching results and formatting.
> Right now, any such failure results in marking query as failed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
