[ https://issues.apache.org/jira/browse/IMPALA-9792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17119137#comment-17119137 ]

Tim Armstrong commented on IMPALA-9792:
---------------------------------------

This one-line change to KuduScanNode seems to be enough to get the planner to generate more scan parallelism:
{noformat}
diff --git a/fe/src/main/java/org/apache/impala/planner/KuduScanNode.java b/fe/src/main/java/org/apache/impala/planner/KuduScanNode.java
index 9e17a32..0b2c271 100644
--- a/fe/src/main/java/org/apache/impala/planner/KuduScanNode.java
+++ b/fe/src/main/java/org/apache/impala/planner/KuduScanNode.java
@@ -261,6 +261,7 @@ public class KuduScanNode extends ScanNode {
     }
     KuduScanTokenBuilder tokenBuilder = client.newScanTokenBuilder(rpcTable);
     tokenBuilder.setProjectedColumnNames(projectedCols);
+    tokenBuilder.setSplitSizeBytes(1024L * 1024L * 8L);
     for (KuduPredicate predicate: kuduPredicates_) tokenBuilder.addPredicate(predicate);
     return tokenBuilder.build();
   }
{noformat}
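
For a rough sense of how the split size in the patch above relates to the number of scan tokens, here is a back-of-envelope sketch. It assumes each tablet is simply cut into chunks of at most the split size; Kudu's actual splitting works on key ranges and size estimates, so real token counts may differ. The class name, the helper, and the 20 MiB tablet size are hypothetical and only for illustration.

```java
// Back-of-envelope model only: assumes a tablet of tabletBytes is cut into
// chunks of at most splitSizeBytes. Kudu's real splitting works on key ranges
// and size estimates, so actual token counts may differ.
final class SplitEstimate {
    private SplitEstimate() {}

    /** Same 8 MiB value that the patch passes to setSplitSizeBytes. */
    static final long SPLIT_SIZE_BYTES = 1024L * 1024L * 8L;

    /** Hypothetical helper: estimated scan tokens for one tablet (at least 1). */
    static long estimateTokens(long tabletBytes, long splitSizeBytes) {
        if (splitSizeBytes <= 0) {
            throw new IllegalArgumentException("splitSizeBytes must be positive");
        }
        if (tabletBytes <= 0) {
            return 1; // an empty tablet still needs one token to be scanned
        }
        // Ceiling division without floating point.
        return (tabletBytes + splitSizeBytes - 1) / splitSizeBytes;
    }

    public static void main(String[] args) {
        long tabletBytes = 20L * 1024L * 1024L; // made-up tablet size, not measured
        System.out.println(estimateTokens(tabletBytes, SPLIT_SIZE_BYTES));
    }
}
```

Under this model a smaller split size yields more tokens per tablet, which is what gives the scheduler more units of work to spread across scan instances.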

{noformat}
[localhost:21000] default> set mt_dop=8; select count(*) from tpch_kudu.lineitem; summary;
MT_DOP set to 8
Query: select count(*) from tpch_kudu.lineitem
Query submitted at: 2020-05-28 15:21:01 (Coordinator: http://tarmstrong-box:25000)
Query progress can be monitored at: http://tarmstrong-box:25000/query_plan?query_id=5147b6d4ef07380c:db99791500000000
+----------+
| count(*) |
+----------+
| 6001215  |
+----------+
Fetched 1 row(s) in 6.56s
+---------------------+--------+-------+----------+----------+-------+------------+-----------+---------------+--------------------+
| Operator            | #Hosts | #Inst | Avg Time | Max Time | #Rows | Est. #Rows | Peak Mem  | Est. Peak Mem | Detail             |
+---------------------+--------+-------+----------+----------+-------+------------+-----------+---------------+--------------------+
| F01:ROOT            | 1      | 1     | 23.15us  | 23.15us  |       |            | 0 B       | 0 B           |                    |
| 03:AGGREGATE        | 1      | 1     | 330.88us | 330.88us | 1     | 1          | 16.00 KB  | 10.00 MB      | FINALIZE           |
| 02:EXCHANGE         | 1      | 1     | 253.10us | 253.10us | 18    | 1          | 152.00 KB | 16.00 KB      | UNPARTITIONED      |
| F00:EXCHANGE SENDER | 3      | 18    | 135.04us | 431.50us |       |            | 16.00 KB  | 0 B           |                    |
| 01:AGGREGATE        | 3      | 18    | 0ns      | 0ns      | 18    | 1          | 20.00 KB  | 10.00 MB      |                    |
| 00:SCAN KUDU        | 3      | 18    | 82.23ms  | 86.85ms  | 18    | 6.00M      | 0 B       | 384.00 KB     | tpch_kudu.lineitem |
+---------------------+--------+-------+----------+----------+-------+------------+-----------+---------------+--------------------+
[localhost:21000] default> show partitions tpch_kudu.lineitem;
Query: show partitions tpch_kudu.lineitem
+-----------+----------+-----------------+-----------+
| Start Key | Stop Key | Leader Replica  | #Replicas |
+-----------+----------+-----------------+-----------+
|           | 00000001 | 127.0.0.1:31202 | 3         |
| 00000001  | 00000002 | 127.0.0.1:31201 | 3         |
| 00000002  | 00000003 | 127.0.0.1:31200 | 3         |
| 00000003  | 00000004 | 127.0.0.1:31200 | 3         |
| 00000004  | 00000005 | 127.0.0.1:31201 | 3         |
| 00000005  | 00000006 | 127.0.0.1:31202 | 3         |
| 00000006  | 00000007 | 127.0.0.1:31202 | 3         |
| 00000007  | 00000008 | 127.0.0.1:31202 | 3         |
| 00000008  |          | 127.0.0.1:31200 | 3         |
+-----------+----------+-----------------+-----------+
Fetched 9 row(s) in 0.03s
{noformat}

> Split Kudu scan ranges into smaller chunks for greater parallelism
> ------------------------------------------------------------------
>
>                 Key: IMPALA-9792
>                 URL: https://issues.apache.org/jira/browse/IMPALA-9792
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Backend
>            Reporter: Tim Armstrong
>            Priority: Major
>              Labels: kudu, multithreading
>
> We currently use one thread to scan each tablet, which may underparallelise 
> queries in many cases. Kudu added an API in KUDU-2437 and KUDU-2670 to split 
> tokens at a finer granularity.
> See 
> https://github.com/apache/kudu/commit/22a6faa44364dec3a171ec79c15b814ad9277d8f#diff-a4afa9dba99c7612b2cb9176134ff2b0


