[ https://issues.apache.org/jira/browse/IMPALA-9124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17102628#comment-17102628 ]
Sahil Takiar commented on IMPALA-9124:
--------------------------------------
Disclaimer: while this feature is referred to as *transparent* query retries,
clients may see some unexpected behavior when a query is retried. The retry
will not be 100% transparent to the end client; there will be some differences
that require client awareness of query retries:
* When a query is retried, the retry is modeled as a brand new query with a new
query id, which is distinct from the query id of the originally submitted
query that ultimately failed
* Since a query retry is a brand new query, it has its own runtime profile as
well; the runtime profiles of the failed and retried queries will be linked
together
* When requesting a runtime profile from the ImpalaService, the
GetRuntimeProfile() method will always return the profile of the latest query
attempt. There are plans to add new options to the ImpalaService interface so
that users can fetch the profiles of failed attempts as well. This limitation
does not apply to the web ui: users can still get all query profiles (for both
failed and retried queries) from the web ui
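
To make the client-visible differences concrete, here is a toy Python model of the behavior described in the list above. It is only a sketch and is not Impala code: QueryAttempt, QueryRegistry, retry(), all_profiles(), and the "query-N" id format are all hypothetical names invented for illustration. It also assumes that asking for the profile of the original query id resolves to the latest attempt.

```python
from dataclasses import dataclass
from itertools import count
from typing import Dict, List, Optional


@dataclass
class QueryAttempt:
    """One attempt at running a query (hypothetical model, not an Impala type)."""
    query_id: str
    sql: str
    profile: str
    failed: bool = False
    retry_of: Optional[str] = None    # id of the attempt this one retries
    retried_as: Optional[str] = None  # id of the attempt that retried this one


class QueryRegistry:
    """Toy stand-in for the server-side bookkeeping of query attempts."""

    def __init__(self) -> None:
        self._attempts: Dict[str, QueryAttempt] = {}
        self._ids = count(1)

    def _new_attempt(self, sql: str, retry_of: Optional[str] = None) -> QueryAttempt:
        # Every attempt, including a retry, gets its own id and its own profile.
        query_id = f"query-{next(self._ids)}"
        attempt = QueryAttempt(query_id, sql, f"profile of {query_id}", retry_of=retry_of)
        self._attempts[query_id] = attempt
        return attempt

    def submit(self, sql: str) -> str:
        return self._new_attempt(sql).query_id

    def retry(self, failed_id: str) -> str:
        # Mark the old attempt as failed and link the two attempts together.
        failed = self._attempts[failed_id]
        failed.failed = True
        new = self._new_attempt(failed.sql, retry_of=failed_id)
        failed.retried_as = new.query_id
        return new.query_id

    def get_runtime_profile(self, query_id: str) -> str:
        # Assumption: any id in the chain resolves to the latest attempt's profile.
        attempt = self._attempts[query_id]
        while attempt.retried_as is not None:
            attempt = self._attempts[attempt.retried_as]
        return attempt.profile

    def all_profiles(self, query_id: str) -> List[str]:
        # Web-ui-like view: profiles of every attempt, oldest first.
        attempt = self._attempts[query_id]
        while attempt.retry_of is not None:
            attempt = self._attempts[attempt.retry_of]
        profiles = [attempt.profile]
        while attempt.retried_as is not None:
            attempt = self._attempts[attempt.retried_as]
            profiles.append(attempt.profile)
        return profiles
```

In this model, a client that submitted a query and kept its original id would see a different id on the retried attempt, would receive only the retried attempt's profile from get_runtime_profile(), and would need the all_profiles() view (analogous to the web ui) to see the failed attempt.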
> Transparently retry queries that fail due to cluster membership changes
> -----------------------------------------------------------------------
>
> Key: IMPALA-9124
> URL: https://issues.apache.org/jira/browse/IMPALA-9124
> Project: IMPALA
> Issue Type: New Feature
> Components: Backend, Clients
> Reporter: Sahil Takiar
> Assignee: Sahil Takiar
> Priority: Critical
> Attachments: Impala Transparent Query Retries.pdf
>
>
> Currently, if the Impala Coordinator or any Executors run into errors during
> query execution, Impala will fail the entire query. It would improve user
> experience to transparently retry the query for some transient, recoverable
> errors.
> This JIRA focuses on retrying queries that would otherwise fail due to
> cluster membership changes. Specifically, it covers node failures that cause
> changes in the cluster membership (currently the Coordinator cancels all
> queries running on a node if it detects that the node is no longer part of
> the cluster) and node blacklisting (the Coordinator blacklists a node because
> it detects a problem with that node: it can’t execute RPCs against the node).
> It is not focused on retrying general errors (e.g. any frontend errors,
> MemLimitExceeded exceptions, etc.).