cherrylzhao opened a new issue #10150:
URL: https://github.com/apache/shardingsphere/issues/10150
### current logic
Suppose we have a `t_order` table whose 3 records are distributed across 3
data sources as follows:
```
+--------------------+---------+--------+
| order_id           | user_id | status |
+--------------------+---------+--------+
| 591613421079212032 | jerry   | init   | -> ds0
| 591652652161937408 | jerry   | init   | -> ds1
| 591652696403456001 | jerry   | init   | -> ds2
+--------------------+---------+--------+
```
The query SQL is:
```
select user_id, count(user_id) from t_order group by user_id;
```
Inside ShardingSphere, the rewritten SQL is sent to the backend data sources
for execution as follows:
```
+-----------------+----------------------------------------------------------------------------------+
| datasource_name | sql                                                                              |
+-----------------+----------------------------------------------------------------------------------+
| ds_0            | select user_id,count(user_id) from t_order group by user_id ORDER BY user_id ASC |
| ds_1            | select user_id,count(user_id) from t_order group by user_id ORDER BY user_id ASC |
| ds_2            | select user_id,count(user_id) from t_order group by user_id ORDER BY user_id ASC |
+-----------------+----------------------------------------------------------------------------------+
```
The 3 query results are then loaded into `GroupByStreamMergeResult`.
After `MergeResult.next()` is invoked, the core data flow looks like this:
```
+-------------------------+
| currentRow |
+-------------------------+
[jerry, 1] -> count:1
[jerry, 1, jerry, 1] -> count:2
[jerry, 1, jerry, 1, jerry, 1] -> count:3
[jerry, 3, jerry, 1, jerry, 1] -> write count to currentRow
```
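The growth of `currentRow` shown above can be modeled with a small, self-contained sketch. This is a simplified illustration of the described flow, not the actual `GroupByStreamMergeResult` code; the class name `CurrentRowGrowthSketch`, the method `mergeGroup`, and the `[user_id, count]` row shape are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Simplified model of the flow described above; not the actual ShardingSphere code.
class CurrentRowGrowthSketch {

    // Merges the rows of one group, each shaped [user_id, count]. Assumes a non-empty group.
    // Every row is appended to currentRow while the count is summed separately.
    static List<Object> mergeGroup(List<List<Object>> groupRows) {
        List<Object> currentRow = new ArrayList<>();
        long count = 0;
        for (List<Object> row : groupRows) {
            currentRow.addAll(row);                      // currentRow grows with every row of the group
            count += ((Number) row.get(1)).longValue();  // partial count from one data source
        }
        currentRow.set(1, count);                        // write the total at the count projection index
        return currentRow;
    }

    public static void main(String[] args) {
        List<List<Object>> rows = Arrays.asList(
                Arrays.<Object>asList("jerry", 1),
                Arrays.<Object>asList("jerry", 1),
                Arrays.<Object>asList("jerry", 1));
        System.out.println(mergeGroup(rows)); // prints [jerry, 3, jerry, 1, jerry, 1]
    }
}
```

In this model the size of `currentRow` is proportional to the number of rows in the group, even though only the first two cells are ever read.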
### optimization point
The key point is that the count value is computed by `AggregationUnit` during
iteration, and `currentRow` is only updated at the parsed aggregate projection
index once the `group by` value changes.
So we only need to cache the first row of each group in `currentRow` to **reduce
memory usage**, like this:
```
+-------------------------+
| currentRow |
+-------------------------+
[jerry, 1] -> count:1
[jerry, 1] -> count:2
[jerry, 1] -> count:3
[jerry, 3] -> write count to currentRow
```
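A minimal sketch of the proposed behavior, assuming rows shaped `[user_id, count]`. The names `FirstRowOnlySketch`, `CountUnit`, and `mergeGroup` are hypothetical and only stand in for the real merge result and aggregation unit; they are not ShardingSphere's API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch of the proposed optimization; all names here are hypothetical.
class FirstRowOnlySketch {

    // Hypothetical stand-in for an aggregation unit that accumulates partial counts.
    static final class CountUnit {
        private long total;

        void merge(Number partialCount) {
            total += partialCount.longValue();
        }

        long getResult() {
            return total;
        }
    }

    // Caches only the first row of the group and writes the aggregate back at countIndex.
    static List<Object> mergeGroup(List<List<Object>> groupRows, int countIndex) {
        List<Object> currentRow = new ArrayList<>();
        CountUnit unit = new CountUnit();
        for (List<Object> row : groupRows) {
            if (currentRow.isEmpty()) {
                currentRow.addAll(row);               // keep the first row only
            }
            unit.merge((Number) row.get(countIndex)); // later rows feed the unit, not currentRow
        }
        if (!currentRow.isEmpty()) {
            currentRow.set(countIndex, unit.getResult());
        }
        return currentRow;
    }

    public static void main(String[] args) {
        List<List<Object>> rows = Arrays.asList(
                Arrays.<Object>asList("jerry", 1),
                Arrays.<Object>asList("jerry", 1),
                Arrays.<Object>asList("jerry", 1));
        System.out.println(mergeGroup(rows, 1)); // prints [jerry, 3]
    }
}
```

With this shape, `currentRow` stays at the width of a single result row no matter how many data sources contribute rows to the group.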