[
https://issues.apache.org/jira/browse/IGNITE-16315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17480829#comment-17480829
]
Aleksey Plekhanov commented on IGNITE-16315:
--------------------------------------------
Benchmarking results on my laptop (1 client node, 3 server nodes, 1 thread).
Before optimizations:
{noformat}
Benchmark                                 (engine)   Mode  Cnt     Score      Error  Units
JmhSqlBenchmark.queryGroupBy                    H2  thrpt    3    46,495 ±   25,221  ops/s
JmhSqlBenchmark.queryGroupBy               CALCITE  thrpt    3    50,354 ±    6,799  ops/s
JmhSqlBenchmark.queryGroupByIndexed             H2  thrpt    3    49,043 ±   13,774  ops/s
JmhSqlBenchmark.queryGroupByIndexed        CALCITE  thrpt    3    65,335 ±   21,285  ops/s
JmhSqlBenchmark.queryOrderByBatch               H2  thrpt    3  1238,830 ±  521,193  ops/s
JmhSqlBenchmark.queryOrderByBatch          CALCITE  thrpt    3   706,558 ±  317,412  ops/s
JmhSqlBenchmark.queryOrderByFull                H2  thrpt    3    19,009 ±    2,478  ops/s
JmhSqlBenchmark.queryOrderByFull           CALCITE  thrpt    3    15,380 ±    5,543  ops/s
JmhSqlBenchmark.querySimpleBatch                H2  thrpt    3    79,860 ±   16,487  ops/s
JmhSqlBenchmark.querySimpleBatch           CALCITE  thrpt    3    55,177 ±   15,951  ops/s
JmhSqlBenchmark.querySimpleBatchIndexed         H2  thrpt    3  1973,364 ±  325,749  ops/s
JmhSqlBenchmark.querySimpleBatchIndexed    CALCITE  thrpt    3   788,484 ±  274,067  ops/s
JmhSqlBenchmark.querySimpleUnique               H2  thrpt    3    77,317 ±   27,542  ops/s
JmhSqlBenchmark.querySimpleUnique          CALCITE  thrpt    3    58,770 ±   11,196  ops/s
JmhSqlBenchmark.querySimpleUniqueIndexed        H2  thrpt    3  8462,273 ± 3255,088  ops/s
JmhSqlBenchmark.querySimpleUniqueIndexed   CALCITE  thrpt    3  1270,796 ± 1179,145  ops/s
{noformat}
After optimizations:
{noformat}
Benchmark                                 (engine)   Mode  Cnt     Score      Error  Units
JmhSqlBenchmark.queryGroupBy                    H2  thrpt    3    50,589 ±    7,986  ops/s
JmhSqlBenchmark.queryGroupBy               CALCITE  thrpt    3    52,923 ±    8,995  ops/s
JmhSqlBenchmark.queryGroupByIndexed             H2  thrpt    3    51,276 ±   19,417  ops/s
JmhSqlBenchmark.queryGroupByIndexed        CALCITE  thrpt    3    71,974 ±   90,986  ops/s
JmhSqlBenchmark.queryOrderByBatch               H2  thrpt    3  1334,399 ±  477,006  ops/s
JmhSqlBenchmark.queryOrderByBatch          CALCITE  thrpt    3  1276,147 ±  436,042  ops/s
JmhSqlBenchmark.queryOrderByFull                H2  thrpt    3    17,768 ±    3,904  ops/s
JmhSqlBenchmark.queryOrderByFull           CALCITE  thrpt    3    15,680 ±    3,744  ops/s
JmhSqlBenchmark.querySimpleBatch                H2  thrpt    3    78,919 ±   24,512  ops/s
JmhSqlBenchmark.querySimpleBatch           CALCITE  thrpt    3    57,916 ±   15,917  ops/s
JmhSqlBenchmark.querySimpleBatchIndexed         H2  thrpt    3  2078,101 ±  657,757  ops/s
JmhSqlBenchmark.querySimpleBatchIndexed    CALCITE  thrpt    3  1431,797 ±  865,767  ops/s
JmhSqlBenchmark.querySimpleUnique               H2  thrpt    3    78,912 ±   23,052  ops/s
JmhSqlBenchmark.querySimpleUnique          CALCITE  thrpt    3    60,679 ±   27,464  ops/s
JmhSqlBenchmark.querySimpleUniqueIndexed        H2  thrpt    3  8505,926 ± 3431,167  ops/s
JmhSqlBenchmark.querySimpleUniqueIndexed   CALCITE  thrpt    3  4988,372 ±  684,324  ops/s
{noformat}
For simple queries ({{querySimpleUniqueIndexed}}), the average latency is
reduced from about 800 microseconds to about 200 microseconds.
Changes in messages workflow:
||From||To||Message||What's changed||
|Initiator|Data node|QueryStartRequest| |
|Data node|Initiator|QueryStartResponse|Not sent if a batch has already been
sent to the initiator node.|
|Data node|Initiator|QueryBatchMessage| |
|Initiator|Data node|QueryBatchAcknowledgeMessage|Not sent for the last batch.|
|Initiator|Data node|QueryCloseMessage|Not sent if the last batch for every
fragment has been received.|
|Data node|Initiator|ErrorMessage|Sent as a reply to QueryCloseMessage; if no
QueryCloseMessage was sent, this message is not sent either.|
So the minimum number of messages between nodes required per query was reduced
from 6 to 2.
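The skip rules from the table above can be sketched as a few plain predicates. This is only an illustrative sketch; the names below are invented for the example and are not Ignite's actual API:

```java
// Sketch of the optimized message workflow decisions (illustrative names only).
public class MessageWorkflow {
    /** Ack every batch except the last one: the last batch already implies completion. */
    public static boolean needAck(boolean lastBatch) {
        return !lastBatch;
    }

    /** QueryStartResponse is redundant once a data batch has reached the initiator. */
    public static boolean needStartResponse(boolean batchAlreadySent) {
        return !batchAlreadySent;
    }

    /** A close message is only needed while some fragment hasn't delivered its last batch. */
    public static boolean needClose(int fragments, int fragmentsFullyReceived) {
        return fragmentsFullyReceived < fragments;
    }

    public static void main(String[] args) {
        // Happy path: one fragment, last batch received. Only QueryStartRequest
        // and QueryBatchMessage remain, i.e. 2 messages instead of 6.
        System.out.println(needAck(true));            // false
        System.out.println(needStartResponse(true));  // false
        System.out.println(needClose(1, 1));          // false
    }
}
```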
> Calcite engine. Query start request contains a lot of data
> ----------------------------------------------------------
>
> Key: IGNITE-16315
> URL: https://issues.apache.org/jira/browse/IGNITE-16315
> Project: Ignite
> Issue Type: Improvement
> Reporter: Aleksey Plekhanov
> Assignee: Aleksey Plekhanov
> Priority: Major
> Labels: calcite2-required, calcite3-required
> Time Spent: 10m
> Remaining Estimate: 0h
>
> For simple queries, the SQL engine spends most of its time writing/reading
> query start requests, which contain a lot of data. Nested instances of the
> {{ColocationGroup}} class contain assignments for each partition
> ({{{}List<List<UUID>>{}}}). The transferred size can be reduced if we compact
> the assignments. The target colocation group from the fragment description
> contains redundant synthetic partitions; this can also be optimized.
> The message workflow is not optimal either. First, we send
> {{QueryStartRequest}} to the remote nodes, and the remotes reply with
> {{QueryStartResponse}} messages. After that, the remotes send batches with
> data to the target nodes and receive an ack for each batch (acks are required
> to limit inbox workload). When query execution is finished, the initiator
> node sends {{QueryCloseMessage}} to the remote nodes; the remotes close their
> queries and send back {{ErrorMessage}} to the initiator with the
> {{ExecutionCancelledException}} error (which is ignored on the initiator
> node).
> Also, some other optimizations are possible. Proposed changes:
> * Implement compaction of assignments of {{ColocationGroup}}
> * Reduce target colocation group partitions count
> * Fix caching of query plans (store original SQL as key, not parsed SQL, to
> avoid redundant parsing)
> * Change the message workflow (don't send an ack message for the last batch
> since it is redundant; self-close remote queries and don't send close query
> messages to remote nodes if we know for sure that they are already
> self-closed; don't send a query start response if a batch for the same
> fragment has already been sent)
> * Reduce the number of {{RexBuilder}} instantiations in the execution phase
> ({{RexBuilder}} is stateless, so a single static instance can be used)
> * Reduce the number of Calcite type creations in the execution phase
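The assignment compaction proposed above can be illustrated by deduplicating the per-partition node lists: serialize each distinct node list once and send one index per partition instead of repeating the full list. This is a hypothetical sketch under invented names, not Ignite's actual wire format:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;

// Illustrative compaction of List<List<UUID>> partition assignments.
public class AssignmentsCompactor {
    public static final class Compact {
        public final List<List<UUID>> distinct; // each unique node list, stored once
        public final int[] idx;                 // partition -> index into 'distinct'

        Compact(List<List<UUID>> distinct, int[] idx) {
            this.distinct = distinct;
            this.idx = idx;
        }
    }

    /** Replace repeated node lists with indexes into a small dictionary. */
    public static Compact compact(List<List<UUID>> assignments) {
        Map<List<UUID>, Integer> seen = new LinkedHashMap<>();
        int[] idx = new int[assignments.size()];

        for (int p = 0; p < assignments.size(); p++)
            idx[p] = seen.computeIfAbsent(assignments.get(p), k -> seen.size());

        return new Compact(new ArrayList<>(seen.keySet()), idx);
    }
}
```

With a typical replication factor, most partitions share a handful of node lists, so the dictionary stays small while the per-partition payload shrinks to a single integer.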
--
This message was sent by Atlassian Jira
(v8.20.1#820001)