[ 
https://issues.apache.org/jira/browse/IMPALA-9181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17044668#comment-17044668
 ] 

ASF subversion and git services commented on IMPALA-9181:
---------------------------------------------------------

Commit 1e616774d4d3a00e002d1e383ccd89c46f6d9010 in impala's branch 
refs/heads/master from Thomas Tauber-Marshall
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=1e61677 ]

IMPALA-8712: Make ExecQueryFInstances async

This patch refactors the ExecQueryFInstances rpc to be asychronous.
Previously, Impala would issue all the Exec()s, wait for all of them
to complete, and then check if any of them resulted in an error. We
now stop issuing Exec()s and cancel any that are still in flight as
soon as an error occurs.

It also performs some cleanup around the thread safety of
Coordinator::BackendState, including adding comments and DCHECKS.

=== Exec RPC Thread Pool ===
This patch also removes the 'exec_rpc_thread_pool_' from ExecEnv. This
thread pool was used to partially simulate async Exec() prior to the
switch to KRPC, which provides built-in async rpc capabilities.

Removing this thread pool has potential performance implications, as
it means that the Exec() parameters are serialized in serialize rather
than in parallel (with the level of parallelism determined by the size
of the thread pool, which was configurable by an Advanced flag and
defaulted to 12).

To ensure we don't regress query startup times, I did some performance
testing. All tests were done on a 10 node cluster. The baseline used
for the tests did not include IMPALA-9181, a perf optimization for
query startup done to facilitate this work.

I ran TPCH 100 at concurrency levels of 1, 4, and 8 and extracted the
query startup times from the profiles. For each concurrency level, the
average regression in query startup time was < 2ms. Because query e2e
running time was much longer than this, there was no noticable change
in total query time.

I also ran a 'worst case scenario' with a table with 10,000 pertitions
to create a very large Exec() payload to serialize (~1.21MB vs.
~10KB-30KB for TPCH 100). Again, change in query startup time was
neglible.
============================

Testing:
- Added a e2e test that verifies that a query where an Exec() fails
  doesn't wait for all Exec()s to complete before cancelling and
  returning the error to the client.

Change-Id: I33ec96e5885af094c294cd3a76c242995263ba32
Reviewed-on: http://gerrit.cloudera.org:8080/15154
Reviewed-by: Thomas Tauber-Marshall <[email protected]>
Tested-by: Impala Public Jenkins <[email protected]>


> Serialize TQueryCtx once per query
> ----------------------------------
>
>                 Key: IMPALA-9181
>                 URL: https://issues.apache.org/jira/browse/IMPALA-9181
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Backend
>    Affects Versions: Impala 3.4.0
>            Reporter: Thomas Tauber-Marshall
>            Assignee: Thomas Tauber-Marshall
>            Priority: Major
>             Fix For: Impala 3.4.0
>
>
> When issuing Exec() rpcs to backends, we currently serialize the TQueryCtx 
> once per backend. This is inefficient as the TQueryCtx is the same for all 
> backends and really only needs to be serialized once.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to