kangkaisen opened a new issue #831: Powerful Improve: parallel scan exec instance
URL: https://github.com/apache/incubator-doris/issues/831

Currently, we always allocate one scan exec instance for each fragment on each BE, even when there are many tablets and cluster resources are abundant.

So I think we could run scan exec instances in parallel, to take full advantage of cluster resources and speed up queries.

Specifically, we could introduce a new config `parallel_scan_exec_instance_num`. If `parallel_scan_exec_instance_num` is 10 and one fragment has 100 tablets on one BE, we could allocate 100 / 10 = 10 scan exec instances for that fragment on that BE.

The following are the test results for this improvement. All queries are from our prod env, and the config `parallel_scan_exec_instance_num` is 5.

**I tested the following SQL on the same BE cluster with different FEs (with and without this improvement).**

SQL 1:
```
select count(*), datekey
from table
group by datekey
order by datekey;
```

SQL 2:
```
SELECT sum(account_dr) AS account_dr, SUM(account_cr) AS account_cr
FROM table
WHERE deliver_date >= '2019-03-01' AND deliver_date <= '2019-03-30';
```

SQL 3:
```
select count(*) from (
    select crawl_time_hour0, room_status_result,
           row_number() over(partition by crawl_time_hour0, mt_room_id, mt_breakfast
                             order by mt_room_status asc, comp_room_status asc) rn
    from table
    where datekey <= 20190327 and datekey >= 20190325
      and crawl_time_hour0 >= 11230 and crawl_time_hour0 <= 12000) t;
```

SQL 4:
```
select count(*)
FROM t1
INNER JOIN [shuffle] t5 ON ((t1.dt = t5.dt) AND (t1.wm_poi_id = t5.wm_poi_id))
INNER JOIN [shuffle] t6 ON ((t1.dt = t6.dt) AND (t1.wm_poi_id = t6.wm_poi_id))
where t1.dt <= 20190327 and t1.dt >= 20190201;
```

SQL 5:
```
select count(*) from (
    select crawl_time_hour0, room_status_result,
           row_number() over(partition by crawl_time_hour0, mt_room_id, mt_breakfast
                             order by mt_room_status asc, comp_room_status asc) rn
    from table
    where datekey <= 20190328 and datekey >= 20190320) t;
```

The following picture illustrates the test result:

[picture not included in this message]
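
To make the instance-count arithmetic from the proposal concrete (100 tablets with `parallel_scan_exec_instance_num` = 10 giving 10 instances), here is a minimal sketch in Python. It is not the Doris implementation: the function names `scan_instance_count` and `assign_tablets` are hypothetical, and rounding up for counts that do not divide evenly and the round-robin tablet assignment are both assumptions that the issue does not specify.

```python
def scan_instance_count(tablet_count: int, parallel_scan_exec_instance_num: int) -> int:
    """Instance count per the formula in the issue: tablets / config value.

    Assumption: round up when the tablet count is not an exact multiple,
    so that no tablet is left without an instance.
    """
    if tablet_count < 1 or parallel_scan_exec_instance_num < 1:
        raise ValueError("both arguments must be positive")
    n = parallel_scan_exec_instance_num
    return (tablet_count + n - 1) // n  # integer ceiling division


def assign_tablets(tablet_ids: list, parallel_scan_exec_instance_num: int) -> list:
    """Split one fragment's tablets on one BE across scan instances.

    Assumption: tablets are dealt out round-robin; the issue does not
    say how tablets would be grouped.
    """
    count = scan_instance_count(len(tablet_ids), parallel_scan_exec_instance_num)
    return [tablet_ids[i::count] for i in range(count)]


# The example from the issue: 100 tablets, config value 10.
groups = assign_tablets(list(range(100)), 10)
print(len(groups), len(groups[0]))  # prints "10 10"
```

With these inputs each of the 10 instances scans 10 tablets, so the sketch gives the same count whether the config value is read as a divisor of the tablet count or as a target number of instances. The two readings differ for other inputs, and the sketch follows the formula as written in the issue.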
