[ https://issues.apache.org/jira/browse/IMPALA-9225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17185773#comment-17185773 ]

ASF subversion and git services commented on IMPALA-9225:
---------------------------------------------------------

Commit 61dcc805e536af0f160225cc928aa188aa861225 in impala's branch 
refs/heads/master from stiga-huang
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=61dcc80 ]

IMPALA-9225: Query option for retryable queries to spool all results before 
returning any to the client

If the original query has already returned any results to the client,
query retry is skipped to avoid incorrect results. This patch adds
a query option, spool_all_results_for_retries, that makes retryable
queries spool all results before returning any to the client. It
defaults to true. If all query results cannot be contained in the
allocated result spooling space, we return results anyway and thus
disable query retry for the query.
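As a rough illustration of this trade-off, the following Python toy model shows how overflowing the spooling space forfeits the ability to retry. It is not Impala's actual C++ code. The class SpoolingQuery, its byte-based capacity, and the assumption that the client fetches rows as soon as they become available are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SpoolingQuery:
    """Toy model of a retryable query with spool_all_results_for_retries=true.

    Hypothetical: real result spooling is managed by BufferedPlanRootSink
    and is not accounted for in this simplified, byte-counting way.
    """
    spool_capacity_bytes: int          # space allocated for result spooling
    spooled_bytes: int = 0             # results held back so far
    rows_sent_to_client: bool = False  # assumes the client fetches eagerly

    def add_batch(self, batch_bytes: int) -> None:
        if self.rows_sent_to_client:
            return  # already returning rows; retry stays disabled
        if self.spooled_bytes + batch_bytes > self.spool_capacity_bytes:
            # Not all results fit: return them to the client,
            # which disables query retry for this query.
            self.rows_sent_to_client = True
        else:
            self.spooled_bytes += batch_bytes

    def can_retry(self) -> bool:
        # Retrying after rows reached the client could produce incorrect
        # results, so retry is allowed only while nothing has been returned.
        return not self.rows_sent_to_client


q = SpoolingQuery(spool_capacity_bytes=100)
q.add_batch(40)
q.add_batch(50)
print(q.can_retry())  # True: 90 bytes still fit in the spool
q.add_batch(20)
print(q.can_retry())  # False: the spool overflowed, results were returned
```

The model deliberately makes the decision one-way. Once any rows have been handed out, nothing later in the query can make it retryable again.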

Setting spool_all_results_for_retries to false falls back to the
original behavior: the client can fetch results as soon as any of them
are ready. We explicitly set it to false in the retried query, since the
retried query won't be retried again. For non-retryable queries, or
queries that don't enable results spooling, the
spool_all_results_for_retries option has no effect.

To implement this, this patch defers the point at which results are
ready to be fetched. By default, the “rows available” event happens as
soon as any results are ready. For a retryable query where
spool_query_results and spool_all_results_for_retries are both true, the
“rows available” event happens only after all results are spooled, or
after something prevents us from doing so, e.g. the batch queue is full,
the query is canceled, or it fails. After waiting for the root fragment
instance’s Open() to finish, the coordinator waits until the results of
BufferedPlanRootSink are ready. BufferedPlanRootSink sets the results
ready signal in its Send(), Close(), Cancel(), and FlushFinal() methods.
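The deferred "rows available" handshake is essentially a one-shot flag guarded by a condition variable. The Python sketch below models that pattern. ToyBufferedSink and ResultsReadySignal are hypothetical names, and the method names only echo Send(), Close(), Cancel(), and FlushFinal(). The real BufferedPlanRootSink is C++, and its internals are not shown in this message.

```python
import threading


class ResultsReadySignal:
    """One-shot flag guarded by a condition variable; once set, stays set."""

    def __init__(self) -> None:
        self._cv = threading.Condition()
        self._ready = False

    def set(self) -> None:
        with self._cv:
            self._ready = True
            self._cv.notify_all()  # wake every waiting coordinator thread

    def wait(self, timeout=None) -> bool:
        """Block until set (or until timeout); return whether it is set."""
        with self._cv:
            return self._cv.wait_for(lambda: self._ready, timeout)


class ToyBufferedSink:
    """Hypothetical stand-in for BufferedPlanRootSink."""

    def __init__(self, max_queued_batches: int, wait_for_all: bool) -> None:
        self.max_queued_batches = max_queued_batches
        self.wait_for_all = wait_for_all  # retryable + both options enabled
        self.queue = []
        self.results_ready = ResultsReadySignal()

    def send(self, batch) -> None:
        self.queue.append(batch)
        # Default: any spooled batch makes rows available. Deferred mode:
        # only a full batch queue (we can't spool everything) does.
        if not self.wait_for_all or len(self.queue) >= self.max_queued_batches:
            self.results_ready.set()

    def flush_final(self) -> None:
        self.results_ready.set()  # every result has been spooled

    def cancel(self) -> None:
        self.results_ready.set()  # don't leave the coordinator blocked

    def close(self) -> None:
        self.results_ready.set()  # assumed to cover the failure path too


sink = ToyBufferedSink(max_queued_batches=4, wait_for_all=True)
sink.send("batch-1")
print(sink.results_ready.wait(timeout=0))  # False: still spooling
# A fragment thread finishes spooling while the coordinator blocks.
threading.Timer(0.05, sink.flush_final).start()
print(sink.results_ready.wait(timeout=5))  # True: "rows available"
```

Every exit path (complete, full, canceled, closed) sets the same flag. This reflects the point of the design: the coordinator must never wait forever on a sink that can no longer make progress.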

Tests:
- Add a test to verify that a retryable query will spool all its results
  when results spooling and spool_all_results_for_retries are enabled.
- Add a test to verify that query retry succeeds when a retryable query
  is still spooling its results (spool_all_results_for_retries=true).
- Add a test to verify that the retried query won't spool all results
  even when results spooling and spool_all_results_for_retries are
  enabled in the original query.
- Add a test to verify that the original query can be canceled
  correctly. We need this because the added logic for
  spool_all_results_for_retries is tied to the cancellation code
  path.
- Add a test to verify that results are returned when they can't all
  fit into the result spooling space, and that query retry is skipped.

Change-Id: I462dbfef9ddab9060b30a6937fca9122484a24a5
Reviewed-on: http://gerrit.cloudera.org:8080/16323
Reviewed-by: Impala Public Jenkins <[email protected]>
Tested-by: Impala Public Jenkins <[email protected]>


> Retryable queries should spool all results before returning any to the client
> -----------------------------------------------------------------------------
>
>                 Key: IMPALA-9225
>                 URL: https://issues.apache.org/jira/browse/IMPALA-9225
>             Project: IMPALA
>          Issue Type: Sub-task
>            Reporter: Sahil Takiar
>            Assignee: Quanlong Huang
>            Priority: Critical
>
> If query retries are enabled, a query should not return any results to the 
> client until all results are spooled. The issue is that once a query starts 
> returning results, retrying the query becomes increasingly complex and is not 
> supported in the initial version of IMPALA-9124. Retrying a query while 
> returning results could cause incorrect results, especially for 
> non-deterministic queries (e.g. when the results are not ordered).
> Since a query can fail at any time while results are being produced,
> transparent retries are most effective if they can be done at any point
> of query execution.
> The one edge case is what happens if all query results cannot be contained in 
> the allocated result spooling memory (including unpinned memory). In this 
> case, retries for the query should be transparently disabled.
> We should consider making this configurable, in case it leads to performance
> degradation. That said, I'm inclined to turn the flag on by default (i.e.
> always spool all results before returning them); otherwise (depending on the
> query) query retries won't always be helpful.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)