[ 
https://issues.apache.org/jira/browse/LENS-743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15050634#comment-15050634
 ] 

Rajat Khandelwal commented on LENS-743:
---------------------------------------

After some offline discussions, I'm inclined towards this approach:

Define a concept of Attempt for queries. LensQuery is the user-facing class for 
a query. Right now it contains fields for one attempt:

{noformat}
  private QueryStatus status;

  private String driverOpHandle;

  private long driverStartTime;
 
  private long driverFinishTime;

{noformat}

These will be extracted into a class `Attempt`, and LensQuery will now contain 
a list of attempts. 
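
A minimal sketch of how that extraction could look. This is only an 
illustration: `LensQuerySketch`, the constructor, and the accessor names are 
hypothetical, and status is reduced to a String where the real field is a 
QueryStatus.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch, not the actual Lens classes.
// Holds the per-attempt fields that currently live directly on LensQuery.
class Attempt {
  private final String status;          // stand-in for QueryStatus
  private final String driverOpHandle;
  private final long driverStartTime;
  private final long driverFinishTime;

  Attempt(String status, String driverOpHandle,
          long driverStartTime, long driverFinishTime) {
    this.status = status;
    this.driverOpHandle = driverOpHandle;
    this.driverStartTime = driverStartTime;
    this.driverFinishTime = driverFinishTime;
  }

  String getStatus() { return status; }
  String getDriverOpHandle() { return driverOpHandle; }
  long getDriverStartTime() { return driverStartTime; }
  long getDriverFinishTime() { return driverFinishTime; }
}

// Hypothetical stand-in for the user-facing query class.
class LensQuerySketch {
  private final List<Attempt> attempts = new ArrayList<>();

  void addAttempt(Attempt attempt) {
    attempts.add(attempt);
  }

  // Read-only view, so callers cannot modify the attempt history.
  List<Attempt> getAttempts() {
    return Collections.unmodifiableList(attempts);
  }

  int getNumberOfAttempts() {
    return attempts.size();
  }

  // The last attempt is the one whose details describe the finished query.
  Attempt getLastAttempt() {
    if (attempts.isEmpty()) {
      throw new IllegalStateException("query has no attempts yet");
    }
    return attempts.get(attempts.size() - 1);
  }
}
```

Keeping the attempts in an ordered list means the attempt count and the last 
attempt's details can both be read straight off the query object.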

From the Lens database side:
There will be a separate table for attempts, where all attempts of all queries 
will be stored. A query ultimately finishes when its last attempt finishes, and 
the finished_queries table already has the fields belonging to an attempt, so 
I'm thinking the last attempt's details can go into that table. We'll also need 
to add a "number of attempts" column to finished_queries. 

Similar changes will be done in QueryContext. 


On the query execution side, updateFinishedQuery will check for FAILURE and 
decide whether to retry. A retry consists of launching the query again on the 
driver and creating another attempt. In this process, the previous attempt will 
be saved to the db.  
If the decision is not to retry the query, it'll follow the normal code 
path. 
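
A rough sketch of that retry decision. It is written as a synchronous loop 
purely for illustration; the real flow would be driven by status updates. 
`RetrySketch`, `Driver`, `RetryPolicy`, and `savedAttempts` are all 
hypothetical names, and the in-memory list stands in for the attempts table.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the retry flow, not the actual Lens code.
class RetrySketch {
  enum Status { SUCCESSFUL, FAILED }

  // Stand-in for launching the query on a driver and waiting for it to finish.
  interface Driver {
    Status execute(String query);
  }

  // Stand-in for whatever decides whether a failure is worth retrying.
  interface RetryPolicy {
    boolean shouldRetry(int attemptsSoFar);
  }

  // Stand-in for the separate attempts table.
  final List<String> savedAttempts = new ArrayList<>();

  // Number of attempts made by the most recent run() call.
  int attemptCount;

  Status run(String query, Driver driver, RetryPolicy policy) {
    attemptCount = 1;
    Status status = driver.execute(query);
    while (status == Status.FAILED && policy.shouldRetry(attemptCount)) {
      // Save the previous attempt before launching a new one.
      savedAttempts.add("attempt " + attemptCount + ": " + status);
      attemptCount++;
      status = driver.execute(query);
    }
    // No retry: the query follows the normal path, and the last attempt's
    // details would be stored with the finished query.
    return status;
  }
}
```

With a policy such as `n -> n < 3`, a query that fails twice and then succeeds 
ends with three attempts, two of which were saved as previous attempts.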

[~amareshwari] [~Puneetkgupta] Please add if I missed anything. I'll do the 
same. 

> Query failure retries for transient errors
> ------------------------------------------
>
>                 Key: LENS-743
>                 URL: https://issues.apache.org/jira/browse/LENS-743
>             Project: Apache Lens
>          Issue Type: Improvement
>          Components: server
>            Reporter: Amareshwari Sriramadasu
>            Assignee: Rajat Khandelwal
>
> There have to be retries for query failures for transient errors like network 
> errors (Hive server not reachable/ Metastore not reachable/ DB not 
> reachable). Retries should be available for each phase - submission, 
> execution, updating status, fetching results and formatting.
> Right now, any such failure results in marking query as failed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
