[
https://issues.apache.org/jira/browse/IMPALA-10065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Work on IMPALA-10065 started by Quanlong Huang.
-----------------------------------------------
> Hit DCHECK when retrying a query in FINISHED state
> --------------------------------------------------
>
> Key: IMPALA-10065
> URL: https://issues.apache.org/jira/browse/IMPALA-10065
> Project: IMPALA
> Issue Type: Sub-task
> Reporter: Quanlong Huang
> Assignee: Quanlong Huang
> Priority: Critical
>
> Queries will go into FINISHED state when rows are available, no matter
> whether the client has fetched any results. If the client hasn't called fetch
> on the query, the query should still be retryable. However, retrying such a
> query hits a DCHECK at
> https://github.com/apache/impala/blob/a0057788c5c2300f58b6615a27116b8331171e06/be/src/runtime/query-driver.cc#L131-L135
> This can be reproduced by modifying test_retries_from_cancellation_pool in
> tests/custom_cluster/test_query_retries.py:
> {code}
> diff --git a/tests/custom_cluster/test_query_retries.py b/tests/custom_cluster/test_query_retries.py
> index 54f2334..ae57068 100644
> --- a/tests/custom_cluster/test_query_retries.py
> +++ b/tests/custom_cluster/test_query_retries.py
> @@ -69,21 +69,23 @@ class TestQueryRetries(CustomClusterTestSuite):
> # The following query executes slowly, and does minimal TransmitData RPCs, so it is
> # likely that the statestore detects that the impalad has been killed before a
> # TransmitData RPC has occurred.
> - query = "select count(*) from functional.alltypes where bool_col = sleep(50)"
> + query = "select count(*) from functional.alltypestiny union all select count(*) from functional.alltypes where bool_col = sleep(50)"
>
> # Launch the query, wait for it to start running, and then kill an impalad.
> handle = self.execute_query_async(query,
> query_options={'retry_failed_queries': 'true'})
> - self.wait_for_state(handle, self.client.QUERY_STATES['RUNNING'], 60)
> + self.wait_for_state(handle, self.client.QUERY_STATES['FINISHED'], 60)
>
> # Kill a random impalad (but not the one executing the actual query).
> self.__kill_random_impalad()
> + time.sleep(10)
>
> # Validate the query results.
> results = self.client.fetch(query, handle)
> assert results.success
> - assert len(results.data) == 1
> - assert int(results.data[0]) == 3650
> + assert len(results.data) == 2
> + assert int(results.data[0]) == 8
> + assert int(results.data[1]) == 3650
>
> # Validate the live exec summary.
> retried_query_id = self.__get_retried_query_id_from_summary(handle)
> {code}
> The change chooses another query that has two UNION operands. The query will
> be in FINISHED state after the first operand finishes. When we kill an
> impalad, the coordinator hits the DCHECK.
> We should support retrying a FINISHED (but actually running) query that
> hasn't returned any results. This is required by IMPALA-9225.
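
The retry condition requested above can be sketched as a small predicate. This is only an illustrative model in Python, not Impala's implementation (the real check lives in the C++ backend in query-driver.cc). The names QueryState, QueryHandle, rows_fetched and is_retryable are hypothetical and exist only for this example. The sketch assumes that a retry is safe exactly when no rows have been returned to the client, regardless of whether the query is reported as RUNNING or FINISHED.

```python
from dataclasses import dataclass
from enum import Enum, auto


class QueryState(Enum):
    # Hypothetical, simplified states; not Impala's actual enum.
    RUNNING = auto()
    FINISHED = auto()   # rows are available, but fragments may still be running
    EXCEPTION = auto()


@dataclass
class QueryHandle:
    # Hypothetical stand-in for the coordinator's view of a query.
    state: QueryState
    rows_fetched: int = 0               # rows already returned to the client
    retry_failed_queries: bool = True   # mirrors the query option used in the test


def is_retryable(query: QueryHandle) -> bool:
    """Return True if a transparent retry could not be observed by the client."""
    if not query.retry_failed_queries:
        return False
    # Once the client has seen any row, a retry could return duplicate or
    # inconsistent results, so it is no longer safe.
    if query.rows_fetched > 0:
        return False
    # FINISHED only means rows are available; if none were fetched, the query
    # is treated like a RUNNING one for retry purposes.
    return query.state in (QueryState.RUNNING, QueryState.FINISHED)
```

Under these assumptions, the modified test corresponds to a handle in FINISHED state with rows_fetched == 0, for which is_retryable returns True, whereas the DCHECK described above effectively treats that case as not retryable.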