kangkaisen opened a new issue #831: Powerful Improve: parallel scan exec 
instance
URL: https://github.com/apache/incubator-doris/issues/831
 
 
   Currently, we always allocate one scan instance exec node per fragment on 
each BE, even when there are many tablets to scan and cluster resources are 
abundant.
   
   So I think we could run scan exec instances in parallel to take full 
advantage of cluster resources and speed up queries.
   
   Specifically, we could introduce a new config, 
`parallel_scan_exec_instance_num`.
   If `parallel_scan_exec_instance_num` is 10 and one fragment has 100 tablets 
on one BE, we could allocate 100 / 10 = 10 scan instance exec nodes for 
that fragment on that BE.
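
As a rough illustration of the arithmetic above, here is a minimal Python sketch. It takes the `100 / 10 = 10` example literally (tablet count divided by the config value). Rounding up when the tablets do not divide evenly, the round-robin assignment of tablets to instances, and the helper names `scan_instance_count` and `split_tablets` are all assumptions made for illustration; none of them come from the actual patch.

```python
def scan_instance_count(tablet_count: int, parallel_scan_exec_instance_num: int) -> int:
    """Number of scan instances for one fragment on one BE.

    Literal reading of the issue's example: tablets / config value.
    Rounding up for uneven counts is an assumption, not from the issue.
    """
    if parallel_scan_exec_instance_num <= 0:
        raise ValueError("parallel_scan_exec_instance_num must be positive")
    if tablet_count <= 0:
        return 0
    # Integer ceiling division, avoids floating point.
    return (tablet_count + parallel_scan_exec_instance_num - 1) // parallel_scan_exec_instance_num


def split_tablets(tablets: list, instance_count: int) -> list:
    """Assign tablets to instances round-robin (hypothetical policy)."""
    return [tablets[i::instance_count] for i in range(instance_count)]


tablets = list(range(100))               # 100 tablets for one fragment on one BE
n = scan_instance_count(len(tablets), 10)
groups = split_tablets(tablets, n)
print(n, [len(g) for g in groups])       # 10 instances, 10 tablets each
```

With the values from the example, this yields 10 instances that each scan 10 tablets.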
   
   
   The following are the test results for this improvement.
   All queries are from our production environment, and the config 
`parallel_scan_exec_instance_num` is set to 5.
   **I ran the following SQL on the same BE cluster with two different FEs 
(with and without this improvement).**
   
   SQL 1:
   ```
   select count(*), datekey from table group by datekey order by datekey;
   ```
   SQL 2:
   ```
   SELECT  sum(account_dr) AS account_dr, SUM(account_cr) AS account_cr
   FROM table
   WHERE  deliver_date >= '2019-03-01' AND deliver_date <= '2019-03-30';
   ```
   
   SQL 3:
   ```
   select count(*) from (
   select crawl_time_hour0, room_status_result, row_number()
      over(partition by crawl_time_hour0, mt_room_id, mt_breakfast order by 
mt_room_status asc, comp_room_status asc) rn
   from table
   where  datekey <= 20190327 and  datekey >= 20190325 and crawl_time_hour0 >= 
11230 and crawl_time_hour0 <= 12000) t;
   ```
   SQL 4:
   ```
   select count(*)  
   FROM  t1
   INNER JOIN [shuffle] t5
      ON ((t1.dt = t5.dt) AND (t1.wm_poi_id = t5.wm_poi_id))
   INNER JOIN [shuffle]  t6
      ON ((t1.dt = t6.dt) AND (t1.wm_poi_id = t6.wm_poi_id))
   where t1.dt <= 20190327 and t1.dt >= 20190201;
   ```
   
   SQL 5:
   ```
   select count(*) from (
   select crawl_time_hour0, room_status_result, row_number()
      over(partition by crawl_time_hour0, mt_room_id, mt_breakfast order by 
mt_room_status asc, comp_room_status asc) rn
   from table
   where  datekey <= 20190328 and  datekey >= 20190320) t;
   ```
   
   The following picture illustrates the test results:
   
   
![image](https://user-images.githubusercontent.com/9894906/55157763-2fc79a80-5198-11e9-9e56-73d31fdb98af.png)
   
   
   
   
