[jira] [Commented] (IMPALA-9199) Add support for single query retries on cluster membership changes

ASF subversion and git services (Jira) Thu, 09 Jul 2020 09:10:20 -0700


    [ https://issues.apache.org/jira/browse/IMPALA-9199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17154698#comment-17154698 ]


ASF subversion and git services commented on IMPALA-9199:
---------------------------------------------------------

Commit 65722d3e9051d6a08cb1e69fd36a06684745c226 in impala's branch 
refs/heads/master from Sahil Takiar
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=65722d3 ]

IMPALA-9855, IMPALA-9854: Fix query retry TSAN errors

Fixed two warnings reported by TSAN. One of them is a data race on
exec_request_. The other is a lock-inversion warning.

The data race on exec_request_ exists because the same exec_request_ is
shared by the original and the retried query. Retrying a query sets a
new query id in the exec_request_, so a retry mutates the shared
exec_request_. However, the exec_request_ can still be accessed on
behalf of the original query after the retry has started. Specifically,
an exec status report from a fragment can still be propagated even after
the query has been cancelled (e.g. ControlService::ReportExecStatus can
still access the original exec_request_). IMPALA-9199 attempted to be
smart about re-using the TExecRequest for retried queries, but given the
race conditions it seems like a premature optimization. We can revisit
IMPALA-9502 if we think it actually makes a perf difference.
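
As a rough illustration of the approach described above (give each retry
its own request instead of mutating a shared one), here is a minimal
sketch. It is not the code from the commit: ExecRequest, QueryAttempt and
MakeRetry are hypothetical stand-ins, and the real TExecRequest carries
far more than a query id and a SQL string.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for TExecRequest; only the fields needed here.
struct ExecRequest {
  std::string query_id;
  std::string sql;
};

// Hypothetical stand-in for the state of one query attempt. Each attempt
// owns its request, and the request is never modified after construction.
class QueryAttempt {
 public:
  explicit QueryAttempt(ExecRequest request)
      : exec_request_(std::move(request)) {}

  const ExecRequest& exec_request() const { return exec_request_; }

 private:
  const ExecRequest exec_request_;
};

// Builds the retried attempt from a copy of the original request. Only
// the copy receives the new query id, so a late reader of the original
// attempt (e.g. a status report handler) never observes a write.
std::unique_ptr<QueryAttempt> MakeRetry(const QueryAttempt& original,
                                        const std::string& new_query_id) {
  ExecRequest copy = original.exec_request();
  copy.query_id = new_query_id;
  return std::make_unique<QueryAttempt>(std::move(copy));
}
```

The cost of this design is one extra copy of the request per retry, which
is the trade-off the commit message accepts in exchange for removing the
race.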

The lock-inversion warning is between the sharded locks in
ShardedQueryMap (specifically the QueryDriverMap query_driver_map_ in
ImpalaServer) and QueryDriver::client_request_state_lock_. The correct
lock ordering is to acquire the sharded lock in query_driver_map_ and then
to acquire the client_request_state_lock_.
QueryDriver::RetryQueryFromThread reversed the ordering. I wasn't able
to actually produce a deadlock, but fixing the ordering issue was
trivial.
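
The ordering rule above can be sketched as follows. This is only an
illustration under assumptions: Driver, DriverMap and RegisterRetry are
hypothetical names, a single std::mutex stands in for one shard of the
sharded lock, and the body of the critical section is invented for the
example.

```cpp
#include <cassert>
#include <mutex>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for QueryDriver.
struct Driver {
  std::mutex state_lock;  // plays the role of client_request_state_lock_
  int retries = 0;
};

// Hypothetical stand-in for one shard of the query driver map.
struct DriverMap {
  std::mutex shard_lock;  // plays the role of a sharded lock in query_driver_map_
  std::unordered_map<std::string, Driver*> drivers;
};

// Registers the driver under a retry id. Every code path that needs both
// locks takes the map lock first and the driver's state lock second, so
// no two threads can each hold one lock while waiting for the other.
bool RegisterRetry(DriverMap& map, Driver& driver,
                   const std::string& retry_id) {
  std::lock_guard<std::mutex> map_guard(map.shard_lock);
  std::lock_guard<std::mutex> state_guard(driver.state_lock);
  const bool inserted = map.drivers.emplace(retry_id, &driver).second;
  if (inserted) ++driver.retries;
  return inserted;
}
```

A path that took state_lock before shard_lock would be the inversion
TSAN warns about, even if a deadlock is hard to reproduce in practice.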

Testing:
* Ran core tests
* Manually checked that the query retry tests are TSAN clean

Change-Id: Ife2c7492524647bfd8f053dacbda4c553a64eb61
Reviewed-on: http://gerrit.cloudera.org:8080/16148
Reviewed-by: Impala Public Jenkins <[email protected]>
Tested-by: Impala Public Jenkins <[email protected]>


> Add support for single query retries on cluster membership changes
> ------------------------------------------------------------------
>
>                 Key: IMPALA-9199
>                 URL: https://issues.apache.org/jira/browse/IMPALA-9199
>             Project: IMPALA
>          Issue Type: Sub-task
>            Reporter: Sahil Takiar
>            Assignee: Sahil Takiar
>            Priority: Major
>             Fix For: Impala 4.0
>
>
> If the cluster membership changes (either because the statestore detects that 
> a node has left the cluster, or a node is added to the blacklist), then 
> rather than cancelling / failing queries running on the target node, retry 
> them.
> This JIRA focuses on just retrying queries once.
> There should be a query level option to control whether queries are retried 
> or not.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
