[GitHub] [kylin] zzcclp edited a comment on pull request #1495: KYLIN-4829 Support to use thread-level SparkSession to execute query


zzcclp edited a comment on pull request #1495:
URL: https://github.com/apache/kylin/pull/1495#issuecomment-738628576



   ## Manual Test Results
   
   ### Test Env
   
   - Hadoop 2.7.0 on Docker
   - Commit: [3b3786c5c](https://github.com/apache/kylin/commit/3b3786c5c9602838cd4abd0a6d40574550ec8622)
   - Sparder env:
     - `spark.executor.cores=1`
     - `spark.executor.instances=4`
     - `spark.executor.memory=2G`
     - `spark.executor.memoryOverhead=1G`
     - `spark.sql.shuffle.partitions=4`
   
   
   ### Before this patch
   Every query uses 4 shuffle partitions, which equals the total number of executor cores (1 core × 4 executor instances).
   
   
![image](https://user-images.githubusercontent.com/9430290/101136306-19016880-3648-11eb-8ae0-2e02d42a41ac.png)
   
   
![image](https://user-images.githubusercontent.com/9430290/101136373-32a2b000-3648-11eb-83dd-83b52e2d9980.png)
   
   
![image](https://user-images.githubusercontent.com/9430290/101136443-4e0dbb00-3648-11eb-8d31-ac721797ee94.png)
   
   
![image](https://user-images.githubusercontent.com/9430290/101136476-5a921380-3648-11eb-90f7-ec20faeca57b.png)
   
   
   ### After this patch
   The shuffle partition number of each query is now calculated from the number of bytes that query scans:
   
   
![image](https://user-images.githubusercontent.com/9430290/101136174-e3f51600-3647-11eb-99c9-290831bb30af.png)
   The corresponding log message:
   `2020-12-04 05:28:11,991 INFO  [Query a5e841ba-c430-383b-dfa8-5694cd6d282b-122] datasource.FilePruner:51 : Set partition to 2, total bytes 92610534`
   
   
![image](https://user-images.githubusercontent.com/9430290/101136210-f40cf580-3647-11eb-8eec-0b7bdef93c30.png)
   The corresponding log message:
   `2020-12-04 05:28:12,112 INFO  [Query 534a7afb-4857-6e0c-67b8-bd6a8da155a8-130] datasource.FilePruner:51 : Set partition to 1, total bytes 42133710`
   
   
![image](https://user-images.githubusercontent.com/9430290/101136227-fa02d680-3647-11eb-9001-128c0e3bd490.png)
   The corresponding log message:
   `2020-12-04 05:28:12,141 INFO  [Query 744d34d1-0d06-11c8-fdee-1f260388117f-131] datasource.FilePruner:51 : Set partition to 3, total bytes 158775868`
   
   
![image](https://user-images.githubusercontent.com/9430290/101136249-05ee9880-3648-11eb-9267-c1cb44319697.png)
   The corresponding log message:
   `2020-12-04 08:16:43,746 INFO  [Query e117ceb8-53c1-959e-9cb0-75ee3901e271-126] pushdown.SparkSqlClient:68 : Auto set spark.sql.shuffle.partitions to 8, the total sources size is 415631445 b`
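The three `FilePruner` values (2, 1 and 3 partitions) are consistent with a simple bytes-based rule. The sketch below is hypothetical: the 64 MiB-per-partition target, the "+1" floor and the cap at total executor cores are assumptions chosen to match those three log lines, not the formula read from the patch. `ShufflePartitionSketch` and `partitionsFor` are illustrative names only.

```java
// HYPOTHETICAL sketch, not the patch's code: derive a per-query shuffle
// partition count from the bytes the query scans.
// Assumptions: 64 MiB target per partition, at least 1 partition,
// capped at the total executor cores.
final class ShufflePartitionSketch {

    // Assumed number of scanned bytes each shuffle partition should cover (64 MiB).
    static final long BYTES_PER_PARTITION = 64L * 1024 * 1024;

    private ShufflePartitionSketch() {}

    /** Returns scannedBytes / BYTES_PER_PARTITION + 1, capped at totalCores. */
    static int partitionsFor(long scannedBytes, int totalCores) {
        if (scannedBytes < 0 || totalCores < 1) {
            throw new IllegalArgumentException("scannedBytes must be >= 0 and totalCores >= 1");
        }
        // Integer division rounds down; "+ 1" guarantees at least one partition.
        long byBytes = scannedBytes / BYTES_PER_PARTITION + 1;
        return (int) Math.min(byBytes, totalCores);
    }

    public static void main(String[] args) {
        // spark.executor.cores (1) * spark.executor.instances (4) in the test env.
        int totalCores = 1 * 4;
        long[] scanned = {92_610_534L, 42_133_710L, 158_775_868L};
        for (long bytes : scanned) {
            System.out.println("total bytes " + bytes + " -> "
                    + partitionsFor(bytes, totalCores) + " partition(s)");
        }
    }
}
```

With the test environment's 4 cores, this prints 2, 1 and 3 partitions for the three scanned sizes, matching the `FilePruner` log lines. The pushdown query is logged by a different class (`pushdown.SparkSqlClient`) and received 8 partitions. That is above the 4-core cap assumed here, so the sketch does not model that path.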
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

