[jira] [Commented] (IMPALA-9792) Split Kudu scan ranges into smaller chunks for greater parallelism

ASF subversion and git services (Jira) Fri, 09 Oct 2020 19:02:16 -0700


    [ 
https://issues.apache.org/jira/browse/IMPALA-9792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17211507#comment-17211507
 ]


ASF subversion and git services commented on IMPALA-9792:
---------------------------------------------------------

Commit 2fd6f5bc5aa6b50e36547e52657c1117637384b6 in impala's branch 
refs/heads/master from Bikramjeet Vig
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=2fd6f5b ]

IMPALA-9792: Add ability to split kudu scan ranges

This patch adds the ability to split Kudu scan tokens via the Kudu
Java API. A new query option, "TARGETED_KUDU_SCAN_RANGE_LENGTH", sets
the target scan range length used for the split.

Potential benefit:
Splitting increases parallelism during scanning, which can
result in more efficient use of CPU at higher mt_dop values.

Limitation:
- The scan range length sent to Kudu is only a hint; it does not
  guarantee that a token will be split at that limit.
- Splitting costs one RPC to a tablet server per token. A slow
  tablet server, which could already slow down scanning during
  execution, can now also slow down planning.
- Each split token also costs one RPC to open a new scanner for it
  on the Kudu side. Scanning many small split tokens can therefore
  slow down scanning, and the benefit of scanning a single large
  token sequentially with a single scanner can be lost.
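The trade-off in the limitations above can be put into rough numbers.
The following is only a back-of-the-envelope sketch, not Impala or Kudu
code: the function names are hypothetical, the tablet sizes are made
up, and it assumes that a target length of 0 leaves splitting disabled
and that every tablet would be split exactly at the target length
(which Kudu does not guarantee, since the length is only a hint).

```python
import math


def estimate_split_tokens(tablet_bytes, target_length_bytes):
    """Estimate the token count per tablet for a given target length.

    Assumption: a target of 0 (or less) means no splitting, so each
    tablet keeps its single token. Real splits may differ because the
    length sent to Kudu is only a hint.
    """
    if target_length_bytes <= 0:
        return [1] * len(tablet_bytes)
    return [max(1, math.ceil(size / target_length_bytes))
            for size in tablet_bytes]


def estimate_rpc_overhead(tablet_bytes, target_length_bytes):
    """Count the RPCs named in the commit message for one table scan."""
    tokens = estimate_split_tokens(tablet_bytes, target_length_bytes)
    splitting = target_length_bytes > 0
    return {
        # One RPC per original token (one per tablet) during planning.
        "planning_split_rpcs": len(tablet_bytes) if splitting else 0,
        # One RPC per resulting token to open its scanner.
        "scanner_open_rpcs": sum(tokens),
        "total_tokens": sum(tokens),
    }


if __name__ == "__main__":
    # Hypothetical tablet sizes in bytes.
    tablets = [1000, 250, 0]
    print(estimate_rpc_overhead(tablets, 256))
    print(estimate_rpc_overhead(tablets, 0))
```

A smaller target length raises total_tokens, and with it both the
available parallelism and the number of scanner-open RPCs, while the
planning cost stays at one RPC per tablet whenever splitting is on.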

Testing:
- Added an e2e test

Change-Id: Ia02fd94cc1d13c61bc6cb0765dd2cbe90e9a5ce8
Reviewed-on: http://gerrit.cloudera.org:8080/16385
Reviewed-by: Impala Public Jenkins <[email protected]>
Tested-by: Impala Public Jenkins <[email protected]>


> Split Kudu scan ranges into smaller chunks for greater parallelism
> ------------------------------------------------------------------
>
>                 Key: IMPALA-9792
>                 URL: https://issues.apache.org/jira/browse/IMPALA-9792
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Backend
>            Reporter: Tim Armstrong
>            Assignee: Bikramjeet Vig
>            Priority: Major
>              Labels: kudu, multithreading
>
> We currently use one thread to scan each tablet, which may underparallelise 
> queries in many cases. Kudu added an API in KUDU-2437 and KUDU-2670 to split 
> tokens at a finer granularity.
> See 
> https://github.com/apache/kudu/commit/22a6faa44364dec3a171ec79c15b814ad9277d8f#diff-a4afa9dba99c7612b2cb9176134ff2b0
> The major downside is that the planner has to do an extra RPC to a tserver 
> for each tablet being scanned in order to figure out key range splits. Maybe 
> we can tie this to mt_dop >= 2, or use some heuristics to avoid these RPCs 
> for smaller tables.
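
The heuristic floated in the issue description (only pay for the extra
planner RPCs when mt_dop >= 2 and the table is not small) could look
roughly like the sketch below. This is an illustration of the idea
only, not what the patch implements: the function name, its
parameters, and the 64 MiB threshold are all hypothetical.

```python
# Hypothetical cutoff below which a table counts as "small".
SMALL_TABLE_BYTES = 64 * 1024 * 1024


def should_split_scan_tokens(mt_dop, estimated_table_bytes,
                             small_table_bytes=SMALL_TABLE_BYTES):
    """Decide whether the per-tablet split RPCs are worth issuing.

    Splitting is skipped when there is a single scan thread per node
    (mt_dop < 2) or when the table is too small to benefit.
    """
    if mt_dop < 2:
        return False
    return estimated_table_bytes >= small_table_bytes


if __name__ == "__main__":
    print(should_split_scan_tokens(4, 10 * 1024 ** 3))  # large table
    print(should_split_scan_tokens(1, 10 * 1024 ** 3))  # mt_dop too low
    print(should_split_scan_tokens(4, 1024))            # small table
```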



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

