morningman commented on pull request #7515:
URL: https://github.com/apache/incubator-doris/pull/7515#issuecomment-1013833341


   Hi @GoGoWen, sorry for the late reply. I've rethought the question.
   
   Essentially, the purpose of the `_aggregation` parameter is to force the 
storage layer to return data according to the **aggregation semantics of the 
table itself** when the **aggregation semantics in the original SQL** do not 
match the **aggregation semantics of the table itself**.
   
   In the case of a single rowset, we can use `_direct_agg_key_next_row` to 
return the data directly because we can confirm that the data has been 
aggregated according to the table's own aggregation semantics.
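
   To make the decision above concrete, here is a minimal sketch of it. The 
names `NextRowPath`, `ReadContext`, `must_apply_table_agg` and 
`choose_next_row_path` are hypothetical and only for illustration; this is not 
the actual reader code, just the rule as I understand it.

```cpp
#include <cstddef>

// Hypothetical names for illustration only; these are not the real Doris types.
enum class NextRowPath {
    Direct,  // return rows as stored, no extra merge (cf. `_direct_agg_key_next_row`)
    Merge    // merge rows with equal keys across rowsets before returning
};

struct ReadContext {
    // Assumption: true when the SQL's aggregation semantics differ from the
    // table's, so the storage layer must return table-aggregated rows.
    bool must_apply_table_agg;
    std::size_t rowset_count;  // number of rowsets being read
};

inline NextRowPath choose_next_row_path(const ReadContext& ctx) {
    // A single rowset is already aggregated by the table's own semantics,
    // so rows can be returned directly whatever the query asks for.
    if (ctx.rowset_count == 1) {
        return NextRowPath::Direct;
    }
    // Several rowsets may hold the same key more than once; merge only when
    // the storage layer is required to enforce the table's semantics.
    return ctx.must_apply_table_agg ? NextRowPath::Merge : NextRowPath::Direct;
}
```

   In this sketch the single-rowset check comes first, so the direct path is 
taken there regardless of the flag.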
   
   The optimization you propose is to apply the aggregation function at the 
storage layer, to reduce the amount of data it returns in the single-rowset 
case. But in essence, the amount of computation is the same whether the 
aggregation is done at the storage layer or at the query layer (there is no 
network overhead between the two; it all happens locally within the same 
process).
   
   Also, the `_aggregation` parameter is used here to decide whether to apply 
the optimization. However, that is not what `_aggregation` is really meant 
for, so using it this way may make the code less readable.
   
   So, have you come across a specific case where optimizing this way 
significantly improves query efficiency?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


