[ 
https://issues.apache.org/jira/browse/IMPALA-9225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17175466#comment-17175466
 ] 

Quanlong Huang commented on IMPALA-9225:
----------------------------------------

There is another solution that doesn't depend on IMPALA-10065: deferring the 
point at which the query marks "rows available". The query then stays in the 
RUNNING state until we have spooled all results or the result queue is full. 
Here are more details.

*Design2*
 * If retry_failed_queries and safely_retry_queries are true, FE sets the 
ResourceProfile for the plan root sink (the same as when SPOOL_QUERY_RESULTS 
is true).
 * When retry_failed_queries and safely_retry_queries are true, don't update 
opened_promise_ of the coordinator fragment instance in 
FragmentInstanceState::Exec() until we are able to provide results, e.g. all 
results spooled or queue full.
 * To do this, we need to extend the DataSink::Send() interface to pass in a 
reference to opened_promise_. The promise is set when the sender is blocked.
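
To make the intended control flow concrete, here is a minimal, single-threaded 
sketch of the idea. It is not Impala code: OpenedPromise and SpoolingSink are 
hypothetical stand-ins, the row-count capacity is an assumption made for 
illustration, and the real DataSink::Send() signature differs. It only shows 
when the promise would be set: either the sender would block because the queue 
is full, or all results have been spooled.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the coordinator instance's opened_promise_.
// Deliberately single-threaded; the real promise is a cross-thread primitive.
class OpenedPromise {
 public:
  void Set() { is_set_ = true; }
  bool IsSet() const { return is_set_; }

 private:
  bool is_set_ = false;
};

// Hypothetical spooling sink that buffers up to capacity_rows rows.
class SpoolingSink {
 public:
  explicit SpoolingSink(std::size_t capacity_rows)
      : capacity_rows_(capacity_rows) {}

  // Sketch of the extended Send(): accepts as many rows as fit and returns
  // that count. If the batch does not fit, the sender would block, so this is
  // the point where "rows available" is signalled via the promise.
  std::size_t Send(std::size_t num_rows, OpenedPromise* opened) {
    assert(opened != nullptr);
    const std::size_t free_rows = capacity_rows_ - spooled_rows_;
    const std::size_t accepted = num_rows < free_rows ? num_rows : free_rows;
    spooled_rows_ += accepted;
    if (accepted < num_rows && !opened->IsSet()) opened->Set();
    return accepted;
  }

  // Called once the last batch has been sent: all results are spooled, so
  // results can be provided even if the queue never filled up.
  void FlushFinal(OpenedPromise* opened) {
    assert(opened != nullptr);
    if (!opened->IsSet()) opened->Set();
  }

  std::size_t spooled_rows() const { return spooled_rows_; }

 private:
  const std::size_t capacity_rows_;
  std::size_t spooled_rows_ = 0;
};
```

In this sketch a query whose results fit in the queue only becomes "opened" in 
FlushFinal(), so it can still be retried at any point before that. A query 
that overflows the queue becomes "opened" at the first Send() that cannot be 
fully accepted.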

> Retryable queries should spool all results before returning any to the client
> -----------------------------------------------------------------------------
>
>                 Key: IMPALA-9225
>                 URL: https://issues.apache.org/jira/browse/IMPALA-9225
>             Project: IMPALA
>          Issue Type: Sub-task
>            Reporter: Sahil Takiar
>            Assignee: Quanlong Huang
>            Priority: Critical
>
> If query retries are enabled, a query should not return any results to the 
> client until all results are spooled. The issue is that once a query starts 
> returning results, retrying the query becomes increasingly complex and is not 
> supported in the initial version of IMPALA-9124. Retrying a query while 
> returning results could cause incorrect results, especially for 
> non-deterministic queries (e.g. when the results are not ordered).
> Since a query can fail anytime while results are being produced, transparent 
> retries are most effective if they can be done during any point of query 
> execution.
> The one edge case is what happens if all query results cannot be contained in 
> the allocated result spooling memory (including unpinned memory). In this 
> case, retries for the query should be transparently disabled.
> We should consider making this configurable, in case it leads to performance 
> degradation. However, I'm inclined to turn the flag on by default (i.e. 
> always spool all results before returning them); otherwise (depending on the 
> query) query retries won't always be helpful.


