[
https://issues.apache.org/jira/browse/IGNITE-16315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17480829#comment-17480829
]
Aleksey Plekhanov commented on IGNITE-16315:
--------------------------------------------
Benchmarking results on my laptop (1 client node, 3 server nodes, 1 thread).
Before optimizations:
{noformat}
Benchmark                                 (engine)   Mode  Cnt     Score      Error  Units
JmhSqlBenchmark.queryGroupBy                    H2  thrpt    3    46,495 ±   25,221  ops/s
JmhSqlBenchmark.queryGroupBy               CALCITE  thrpt    3    50,354 ±    6,799  ops/s
JmhSqlBenchmark.queryGroupByIndexed             H2  thrpt    3    49,043 ±   13,774  ops/s
JmhSqlBenchmark.queryGroupByIndexed        CALCITE  thrpt    3    65,335 ±   21,285  ops/s
JmhSqlBenchmark.queryOrderByBatch               H2  thrpt    3  1238,830 ±  521,193  ops/s
JmhSqlBenchmark.queryOrderByBatch          CALCITE  thrpt    3   706,558 ±  317,412  ops/s
JmhSqlBenchmark.queryOrderByFull                H2  thrpt    3    19,009 ±    2,478  ops/s
JmhSqlBenchmark.queryOrderByFull           CALCITE  thrpt    3    15,380 ±    5,543  ops/s
JmhSqlBenchmark.querySimpleBatch                H2  thrpt    3    79,860 ±   16,487  ops/s
JmhSqlBenchmark.querySimpleBatch           CALCITE  thrpt    3    55,177 ±   15,951  ops/s
JmhSqlBenchmark.querySimpleBatchIndexed         H2  thrpt    3  1973,364 ±  325,749  ops/s
JmhSqlBenchmark.querySimpleBatchIndexed    CALCITE  thrpt    3   788,484 ±  274,067  ops/s
JmhSqlBenchmark.querySimpleUnique               H2  thrpt    3    77,317 ±   27,542  ops/s
JmhSqlBenchmark.querySimpleUnique          CALCITE  thrpt    3    58,770 ±   11,196  ops/s
JmhSqlBenchmark.querySimpleUniqueIndexed        H2  thrpt    3  8462,273 ± 3255,088  ops/s
JmhSqlBenchmark.querySimpleUniqueIndexed   CALCITE  thrpt    3  1270,796 ± 1179,145  ops/s
{noformat}
After optimizations:
{noformat}
Benchmark                                 (engine)   Mode  Cnt     Score      Error  Units
JmhSqlBenchmark.queryGroupBy                    H2  thrpt    3    50,589 ±    7,986  ops/s
JmhSqlBenchmark.queryGroupBy               CALCITE  thrpt    3    52,923 ±    8,995  ops/s
JmhSqlBenchmark.queryGroupByIndexed             H2  thrpt    3    51,276 ±   19,417  ops/s
JmhSqlBenchmark.queryGroupByIndexed        CALCITE  thrpt    3    71,974 ±   90,986  ops/s
JmhSqlBenchmark.queryOrderByBatch               H2  thrpt    3  1334,399 ±  477,006  ops/s
JmhSqlBenchmark.queryOrderByBatch          CALCITE  thrpt    3  1276,147 ±  436,042  ops/s
JmhSqlBenchmark.queryOrderByFull                H2  thrpt    3    17,768 ±    3,904  ops/s
JmhSqlBenchmark.queryOrderByFull           CALCITE  thrpt    3    15,680 ±    3,744  ops/s
JmhSqlBenchmark.querySimpleBatch                H2  thrpt    3    78,919 ±   24,512  ops/s
JmhSqlBenchmark.querySimpleBatch           CALCITE  thrpt    3    57,916 ±   15,917  ops/s
JmhSqlBenchmark.querySimpleBatchIndexed         H2  thrpt    3  2078,101 ±  657,757  ops/s
JmhSqlBenchmark.querySimpleBatchIndexed    CALCITE  thrpt    3  1431,797 ±  865,767  ops/s
JmhSqlBenchmark.querySimpleUnique               H2  thrpt    3    78,912 ±   23,052  ops/s
JmhSqlBenchmark.querySimpleUnique          CALCITE  thrpt    3    60,679 ±   27,464  ops/s
JmhSqlBenchmark.querySimpleUniqueIndexed        H2  thrpt    3  8505,926 ± 3431,167  ops/s
JmhSqlBenchmark.querySimpleUniqueIndexed   CALCITE  thrpt    3  4988,372 ±  684,324  ops/s
{noformat}
For simple queries ({{querySimpleUniqueIndexed}}), the average latency is
reduced from about 800 microseconds to about 200 microseconds.
Changes in messages workflow:
||From||To||Message||What's changed||
|Initiator|Data node|QueryStartRequest| |
|Data node|Initiator|QueryStartResponse|Not sent if a batch has already been
sent to the initiator node.|
|Data node|Initiator|QueryBatchMessage| |
|Initiator|Data node|QueryBatchAcknowledgeMessage|Not sent for the last batch.|
|Initiator|Data node|QueryCloseMessage|Not sent if the last batch for every
fragment has been received.|
|Data node|Initiator|ErrorMessage|Sent as a reply to QueryCloseMessage; if no
QueryCloseMessage was sent, this message is not sent either.|
So the minimum number of messages between nodes required per query was reduced
from 6 to 2.
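The skip rules from the table above can be sketched as a few plain predicates. This is only an illustrative sketch; the names below are invented for the example and are not Ignite's actual API:

```java
// Sketch of the optimized message workflow decisions (illustrative names only).
public class MessageWorkflow {
    /** Ack every batch except the last one: the last batch already implies completion. */
    public static boolean needAck(boolean lastBatch) {
        return !lastBatch;
    }

    /** QueryStartResponse is redundant once a data batch has reached the initiator. */
    public static boolean needStartResponse(boolean batchAlreadySent) {
        return !batchAlreadySent;
    }

    /** A close message is only needed while some fragment hasn't delivered its last batch. */
    public static boolean needClose(int fragments, int fragmentsFullyReceived) {
        return fragmentsFullyReceived < fragments;
    }

    public static void main(String[] args) {
        // Happy path: one fragment, last batch received. Only QueryStartRequest
        // and QueryBatchMessage remain, i.e. 2 messages instead of 6.
        System.out.println(needAck(true));            // false
        System.out.println(needStartResponse(true));  // false
        System.out.println(needClose(1, 1));          // false
    }
}
```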
> Calcite engine. Query start request contains a lot of data
> ----------------------------------------------------------
>
> Key: IGNITE-16315
> URL: https://issues.apache.org/jira/browse/IGNITE-16315
> Project: Ignite
> Issue Type: Improvement
> Reporter: Aleksey Plekhanov
> Assignee: Aleksey Plekhanov
> Priority: Major
> Labels: calcite2-required, calcite3-required
> Time Spent: 10m
> Remaining Estimate: 0h
>
> For simple queries, the SQL engine spends most of its time writing/reading
> query start requests, which contain a lot of data. Nested instances of the
> {{ColocationGroup}} class contain assignments for each partition
> ({{{}List<List<UUID>>{}}}). The transferred size can be reduced if we compact
> the assignments. The target colocation group from the fragment description
> contains redundant synthetic partitions; this can also be optimized.
> The message workflow is not optimal either. First, we send
> {{QueryStartRequest}} to the remote nodes, and the remotes reply with
> {{QueryStartResponse}} messages. After that, the remotes send batches with
> data to the target nodes and receive an ack for each batch (acks are required
> to limit inbox workload). When query execution is finished, the initiator
> node sends {{QueryCloseMessage}} to the remote nodes; the remotes close their
> queries and send back {{ErrorMessage}} to the initiator with the
> {{ExecutionCancelledException}} error (which is ignored on the initiator
> node).
> Also, some other optimizations are possible. Proposed changes:
> * Implement compaction of assignments of {{ColocationGroup}}
> * Reduce target colocation group partitions count
> * Fix caching of query plans (store original SQL as key, not parsed SQL, to
> avoid redundant parsing)
> * Change the message workflow (don't send an ack message for the last batch
> since it is redundant; self-close remote queries and don't send close query
> messages to remote nodes if we know for sure that they are already
> self-closed; don't send a query start response if a batch for the same
> fragment has already been sent)
> * Reduce the number of {{RexBuilder}} instantiations in the execution phase
> ({{RexBuilder}} is stateless, so a single static instance can be used)
> * Reduce the number of Calcite type creations in the execution phase
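The assignment compaction proposed above can be illustrated by deduplicating the per-partition node lists: serialize each distinct node list once and send one index per partition instead of repeating the full list. This is a hypothetical sketch under invented names, not Ignite's actual wire format:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;

// Illustrative compaction of List<List<UUID>> partition assignments.
public class AssignmentsCompactor {
    public static final class Compact {
        public final List<List<UUID>> distinct; // each unique node list, stored once
        public final int[] idx;                 // partition -> index into 'distinct'

        Compact(List<List<UUID>> distinct, int[] idx) {
            this.distinct = distinct;
            this.idx = idx;
        }
    }

    /** Replace repeated node lists with indexes into a small dictionary. */
    public static Compact compact(List<List<UUID>> assignments) {
        Map<List<UUID>, Integer> seen = new LinkedHashMap<>();
        int[] idx = new int[assignments.size()];

        for (int p = 0; p < assignments.size(); p++)
            idx[p] = seen.computeIfAbsent(assignments.get(p), k -> seen.size());

        return new Compact(new ArrayList<>(seen.keySet()), idx);
    }
}
```

With a typical replication factor, most partitions share a handful of node lists, so the dictionary stays small while the per-partition payload shrinks to a single integer.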
--
This message was sent by Atlassian Jira
(v8.20.1#820001)