[ https://issues.apache.org/jira/browse/IMPALA-9792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17211507#comment-17211507 ]
ASF subversion and git services commented on IMPALA-9792:
---------------------------------------------------------
Commit 2fd6f5bc5aa6b50e36547e52657c1117637384b6 in impala's branch
refs/heads/master from Bikramjeet Vig
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=2fd6f5b ]
IMPALA-9792: Add ability to split kudu scan ranges
This patch adds the ability to split Kudu scan tokens via the Kudu
Java API. A new query option, "TARGETED_KUDU_SCAN_RANGE_LENGTH", sets
the scan range length used when splitting.
Potential benefit:
This helps increase parallelism during scanning which can
result in more efficient use of CPU with higher mt_dop.
Limitations:
- The scan range length sent to Kudu is only a hint; it does not
  guarantee that the token will be split at that limit.
- Splitting costs an extra RPC to a tablet server per token. A slow
  tablet server, which can already slow down scanning during
  execution, can now also slow down planning.
- Each split token also costs an RPC to open a new scanner for it on
  the Kudu side. Scanning many small split tokens can therefore slow
  down scanning and lose the benefit of scanning a single large token
  sequentially with a single scanner.
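The trade-off described in these limitations can be made concrete with
a little arithmetic. The following is a rough Python sketch, not Impala
or Kudu code: the function name estimate_split_cost and its parameters
are hypothetical, and it assumes each token is cut exactly at the target
length, which Kudu does not guarantee because the length is only a hint.
Treating a non-positive length as "splitting disabled" is also an
assumption made for illustration.

```python
import math


def estimate_split_cost(tablet_sizes_bytes, target_range_length_bytes):
    """Estimate token and RPC counts when splitting Kudu scan tokens.

    Hypothetical model: one token per tablet before splitting, and each
    token is cut into chunks of exactly the target length. Kudu treats
    the length as a hint, so real counts may differ.
    """
    if target_range_length_bytes <= 0:
        # Assumption: a non-positive length means no splitting at all.
        split_tokens = len(tablet_sizes_bytes)
        planning_rpcs = 0
    else:
        split_tokens = sum(
            max(1, math.ceil(size / target_range_length_bytes))
            for size in tablet_sizes_bytes
        )
        # One extra tablet-server RPC per original token to compute splits.
        planning_rpcs = len(tablet_sizes_bytes)
    return {
        "split_tokens": split_tokens,
        "planning_rpcs": planning_rpcs,
        # One RPC per resulting token to open its scanner on the Kudu side.
        "scanner_open_rpcs": split_tokens,
    }


if __name__ == "__main__":
    mib = 1024 * 1024
    # Three tablets of 512 MiB, 100 MiB and 10 MiB with a 128 MiB target.
    print(estimate_split_cost([512 * mib, 100 * mib, 10 * mib], 128 * mib))
```

In this model only the 512 MiB tablet is actually split, yet all three
tablets pay the planning RPC, which is the cost the second limitation
describes.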
Testing:
- Added an e2e test
Change-Id: Ia02fd94cc1d13c61bc6cb0765dd2cbe90e9a5ce8
Reviewed-on: http://gerrit.cloudera.org:8080/16385
Reviewed-by: Impala Public Jenkins <[email protected]>
Tested-by: Impala Public Jenkins <[email protected]>
> Split Kudu scan ranges into smaller chunks for greater parallelism
> ------------------------------------------------------------------
>
> Key: IMPALA-9792
> URL: https://issues.apache.org/jira/browse/IMPALA-9792
> Project: IMPALA
> Issue Type: Improvement
> Components: Backend
> Reporter: Tim Armstrong
> Assignee: Bikramjeet Vig
> Priority: Major
> Labels: kudu, multithreading
>
> We currently use one thread to scan each tablet, which may underparallelise
> queries in many cases. Kudu added an API in KUDU-2437 and KUDU-2670 to split
> tokens at a finer granularity.
> See
> https://github.com/apache/kudu/commit/22a6faa44364dec3a171ec79c15b814ad9277d8f#diff-a4afa9dba99c7612b2cb9176134ff2b0
> The major downside is that the planner has to do an extra RPC to a tserver
> for each tablet being scanned in order to figure out key range splits. Maybe
> we can tie this to mt_dop >= 2, or use some heuristics to avoid these RPCs
> for smaller tables.
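
The gating idea in the issue description could look something like the
Python sketch below. It is only an illustration: the function name
should_split_scan_tokens, the min_table_size_bytes parameter and its
1 GiB default are hypothetical, and the description offers the mt_dop
check and a size heuristic as alternatives, whereas this sketch simply
combines them.

```python
def should_split_scan_tokens(mt_dop, table_size_bytes,
                             min_table_size_bytes=1 << 30):
    """Decide whether the planner should pay the extra split RPCs.

    Hypothetical heuristic: split only when the query runs with
    mt_dop >= 2 and the table is large enough that extra parallelism
    is likely to outweigh the per-tablet planning RPC.
    """
    return mt_dop >= 2 and table_size_bytes >= min_table_size_bytes


if __name__ == "__main__":
    gib = 1 << 30
    print(should_split_scan_tokens(4, 8 * gib))   # large table, parallel
    print(should_split_scan_tokens(1, 8 * gib))   # single-threaded scan
    print(should_split_scan_tokens(4, 1024))      # tiny table
```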