[GitHub] [incubator-doris] Userwhite opened a new issue #6066: [Feature] Query plan optimization for aggregation operations

GitBox Sun, 20 Jun 2021 04:03:12 -0700


Userwhite opened a new issue #6066:
URL: https://github.com/apache/incubator-doris/issues/6066



   Related link: https://github.com/apache/incubator-doris/issues/4481
   
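   ## Existing optimization
   For an Aggregate query plan, the default is that each node first aggregates its local data, and the partial results are then shuffled for a merge finalize.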
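
The default two-phase plan can be illustrated on toy data. This is a minimal Python sketch, assuming a SUM aggregate and rows already placed on two nodes; the node layout and the function names `local_aggregate` and `merge_finalize` are hypothetical and only show why the merge step is needed when one key appears on more than one node.

```python
from collections import Counter

# Hypothetical toy layout: each inner list holds the (key, value) rows of one node.
nodes = [
    [("a", 1), ("b", 2), ("a", 3)],
    [("b", 4), ("c", 5)],
]


def local_aggregate(rows):
    """Phase 1: each node computes a partial SUM per group-by key."""
    partial = Counter()
    for key, value in rows:
        partial[key] += value
    return partial


def merge_finalize(partials):
    """Phase 2: after the shuffle, partial results for the same key are summed."""
    final = Counter()
    for partial in partials:
        final.update(partial)  # Counter.update adds counts instead of replacing them
    return dict(final)


partials = [local_aggregate(rows) for rows in nodes]
print(merge_finalize(partials))  # {'a': 4, 'b': 6, 'c': 5}
```

Key `b` exists on both nodes, so neither node's partial result is final on its own; that is the work the merge finalize does.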
   
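   The community has already optimized single-table aggregation: if the Group by columns include the bucketing columns (always required) and the bucketing columns include the partitioning columns (required only when there are multiple partitions), then each node's local aggregation already produces the final result set, and no merge finalize is needed.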
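
The existing single-table rule can be written as a small predicate. This is a hedged Python sketch, not Doris planner code: `ScanInfo` and `can_skip_merge_existing` are hypothetical names, columns are modeled as plain sets of names, and the table is assumed to be hash-distributed with at least one bucketing column.

```python
from dataclasses import dataclass
from typing import Set


@dataclass
class ScanInfo:
    """Hypothetical summary of one hash-distributed OLAP table (not a Doris class)."""
    distribution_cols: Set[str]  # bucketing columns
    partition_cols: Set[str]     # partitioning columns (empty if unpartitioned)
    num_partitions: int = 1


def can_skip_merge_existing(group_by: Set[str], scan: ScanInfo) -> bool:
    """Existing rule: Group by covers the bucketing columns (always required),
    and the bucketing columns cover the partitioning columns (multi-partition only)."""
    if not scan.distribution_cols or not scan.distribution_cols <= group_by:
        return False
    if scan.num_partitions > 1:
        return scan.partition_cols <= scan.distribution_cols
    return True


# Partitioned by dt, bucketed by user_id, three partitions:
print(can_skip_merge_existing({"user_id", "dt"}, ScanInfo({"user_id"}, {"dt"}, 3)))  # False
```

The last call shows the gap the proposal targets: the Group by already contains the partitioning column `dt`, yet the existing rule still requires a merge because `dt` is not a bucketing column.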
   
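   ## Proposed optimization
   1. Aggregation over a single-table scan:
   The merge finalize can be skipped when:
   * No partitions: the Group by columns include the bucketing columns.
   * With partitions: the Group by columns include the bucketing columns, and **the Group by columns include the partitioning columns** (this widens the existing condition; when it holds, rows of the same group clearly cannot fall into different partitions).
   * **With partitions, but the query hits only one partition: the same condition as the unpartitioned case applies.**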
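
The proposed single-table rule can be sketched the same way. Again this is a hypothetical Python helper, not Doris code; it assumes an unpartitioned table counts as one partition and that `hit_partitions` is the number of partitions left after partition pruning.

```python
from typing import Set


def can_skip_merge_proposed(group_by: Set[str], distribution_cols: Set[str],
                            partition_cols: Set[str], hit_partitions: int) -> bool:
    """Proposed single-table rule (hypothetical helper, not a Doris API)."""
    # Always required: Group by covers the bucketing columns.
    if not distribution_cols or not distribution_cols <= group_by:
        return False
    # Unpartitioned, or pruning leaves a single partition: same as unpartitioned.
    if hit_partitions <= 1:
        return True
    # Several partitions: if Group by covers the partitioning columns, a group
    # cannot span partitions. This subsumes the old "bucketing columns cover
    # partitioning columns" test, since the bucketing columns are already in Group by.
    return partition_cols <= group_by
```

For a table partitioned by `dt` and bucketed by `user_id`, `GROUP BY user_id, dt` now qualifies even when several partitions are scanned, and `GROUP BY user_id` qualifies once pruning leaves one partition.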
   
   2. Aggregation over a multi-table join:
   Whether the aggregation can be optimized under a multi-table join reduces to checking each olap_scan_node.
   
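   * colocate join
   This is the only case in which one fragment can contain multiple olap_scan_nodes.
   If any one of the tables satisfies the single-table conditions (after assigning the Group by columns to the tables they belong to), the merge finalize can be removed.
   Also, because colocated tables share a bucket sequence, multiple partitions need not be considered (they can be treated as a single partition).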
   
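   * Other joins
   At most one olap_scan_node appears in the fragment.
   Group by columns of tables that are shuffled in from other fragments are not considered.
   Only whether this olap_scan_node satisfies the conditions needs to be checked.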
   
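   * Nested joins
   However the joins are nested, colocate-joined tables and tables from other joins never appear in the same fragment.
   So each fragment involves either the multiple tables of a colocate join or the single table of another join, which reduces to one of the two cases above.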
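
The per-fragment reduction in case 2 can be sketched as one dispatcher. This is a minimal Python sketch under stated assumptions, not Doris planner code: `OlapScan`, `single_scan_ok` and `can_skip_merge` are hypothetical names, the Group by columns are assumed to be already assigned to their tables, and the olap_scan_nodes of one fragment are assumed to be passed in as a list.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass
class OlapScan:
    """Hypothetical view of one olap_scan_node in a fragment (not a Doris class)."""
    table: str
    distribution_cols: Set[str]  # bucketing columns
    partition_cols: Set[str]     # partitioning columns
    hit_partitions: int          # partitions left after pruning


def single_scan_ok(scan: OlapScan, group_by: Set[str]) -> bool:
    """Proposed single-table rule: Group by covers the bucketing columns and,
    if more than one partition is scanned, also the partitioning columns."""
    if not scan.distribution_cols or not scan.distribution_cols <= group_by:
        return False
    return scan.hit_partitions <= 1 or scan.partition_cols <= group_by


def can_skip_merge(scans: List[OlapScan], group_by: Dict[str, Set[str]]) -> bool:
    """Decide per fragment. group_by maps a table name to the Group by columns
    assigned to that table; tables shuffled in from other fragments have no
    local scan, so their columns are never looked up."""
    if not scans:
        return False  # no local olap_scan_node: locality is not guaranteed
    if len(scans) == 1:
        # Other joins: at most one olap_scan_node, so check just that one.
        scan = scans[0]
        return single_scan_ok(scan, group_by.get(scan.table, set()))
    # Colocate join: the tables share one bucket sequence, so partitions are
    # treated as a single partition and one qualifying table is enough.
    return any(
        bool(s.distribution_cols) and s.distribution_cols <= group_by.get(s.table, set())
        for s in scans
    )


orders = OlapScan("orders", {"user_id"}, {"dt"}, hit_partitions=3)
print(can_skip_merge([orders], {"orders": {"user_id", "dt"}, "dim": {"name"}}))  # True
```

In the usage line, the `dim` columns stand for a table joined in from another fragment; they are ignored, and only the local `orders` scan decides the outcome.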
   
   

