[ https://issues.apache.org/jira/browse/IMPALA-9225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17175466#comment-17175466 ]
Quanlong Huang commented on IMPALA-9225:
----------------------------------------
There is another solution that doesn't depend on IMPALA-10065: deferring the
point at which the query marks "rows available". The query then stays in the
RUNNING state until we have spooled all results or the result queue is full.
Here are more details.
*Design2*
* If retry_failed_queries and safely_retry_queries are true, FE sets the
ResourceProfile for the plan root sink (same as when SPOOL_QUERY_RESULTS is true).
* When retry_failed_queries and safely_retry_queries are true, don't update
opened_promise_ of the coordinator fragment instance in
FragmentInstanceState::Exec() until we are able to provide results, i.e. all
results are spooled or the queue is full.
* To do this, we need to extend the DataSink::Send() interface to take a
reference to opened_promise_. It is set when the sender is blocked.
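The deferral described in the bullets above could look roughly like the following sketch. It is not Impala code: SpoolingSink, its Send()/FlushFinal() signatures, the int "batches" and RowsAvailable() are all hypothetical stand-ins, and std::promise is used in place of Impala's own Promise class. It only illustrates the idea that the promise is fulfilled when the sender would block on a full queue or when all results have been spooled, and not before.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <future>
#include <vector>

// Hypothetical stand-in for the plan root sink; not Impala's actual class.
class SpoolingSink {
 public:
  explicit SpoolingSink(std::size_t capacity) : capacity_(capacity) {
    assert(capacity > 0);
  }

  // Analogue of the extended Send(): takes a reference to the "opened" promise.
  // Returns false if the queue is full, i.e. the sender would have to block.
  bool Send(int batch, std::promise<void>& opened) {
    if (queue_.size() >= capacity_) {
      Signal(opened);  // Sender is blocked: results can be served from now on.
      return false;
    }
    queue_.push_back(batch);
    return true;
  }

  // Called after the last batch has been sent: all results are spooled.
  void FlushFinal(std::promise<void>& opened) { Signal(opened); }

 private:
  // std::promise::set_value() may only be called once, so guard it.
  void Signal(std::promise<void>& opened) {
    if (signalled_) return;
    opened.set_value();
    signalled_ = true;
  }

  std::size_t capacity_;
  std::vector<int> queue_;
  bool signalled_ = false;
};

// True once the promise behind `f` has been fulfilled ("rows available").
bool RowsAvailable(const std::future<void>& f) {
  return f.wait_for(std::chrono::seconds(0)) == std::future_status::ready;
}
```

In this sketch a caller waiting on the future keeps seeing the query as not yet having rows available while batches are being spooled, which is the window in which a transparent retry would still be safe.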
> Retryable queries should spool all results before returning any to the client
> -----------------------------------------------------------------------------
>
> Key: IMPALA-9225
> URL: https://issues.apache.org/jira/browse/IMPALA-9225
> Project: IMPALA
> Issue Type: Sub-task
> Reporter: Sahil Takiar
> Assignee: Quanlong Huang
> Priority: Critical
>
> If query retries are enabled, a query should not return any results to the
> client until all results are spooled. The issue is that once a query starts
> returning results, retrying the query becomes increasingly complex and is not
> supported in the initial version of IMPALA-9124. Retrying a query while
> returning results could cause incorrect results, especially for
> non-deterministic queries (e.g. when the results are not ordered).
> Since a query can fail anytime while results are being produced, transparent
> retries are most effective if they can be done during any point of query
> execution.
> The one edge case is what happens if all query results cannot be contained in
> the allocated result spooling memory (including unpinned memory). In this
> case, retries for the query should be transparently disabled.
> We should consider making this configurable, in case it leads to performance
> degradation. Although, I'm inclined to turn the flag on by default (e.g.
> always spool all results before returning them), otherwise (depending on the
> query) query retries won't always be helpful.