[ 
https://issues.apache.org/jira/browse/KYLIN-4829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17243817#comment-17243817
 ] 

ASF GitHub Bot commented on KYLIN-4829:
---------------------------------------

zzcclp edited a comment on pull request #1495:
URL: https://github.com/apache/kylin/pull/1495#issuecomment-738628576


   ## Manual Test Results
   
   ### Test Env
   
   - Hadoop 2.7.0 on docker.
   - Commit: [3b3786c5c](https://github.com/apache/kylin/commit/3b3786c5c9602838cd4abd0a6d40574550ec8622)
   - Sparder Env:
      - `spark.executor.cores=1`
      - `spark.executor.instances=4`
      - `spark.executor.memory=2G`
      - `spark.executor.memoryOverhead=1G`
      - `spark.sql.shuffle.partitions=4`
   
   
   ### Before this patch
   Every query uses 4 shuffle partitions, which equals the total number of 
cores.
   
   
![image](https://user-images.githubusercontent.com/9430290/101136306-19016880-3648-11eb-8ae0-2e02d42a41ac.png)
   
   
![image](https://user-images.githubusercontent.com/9430290/101136373-32a2b000-3648-11eb-83dd-83b52e2d9980.png)
   
   
![image](https://user-images.githubusercontent.com/9430290/101136443-4e0dbb00-3648-11eb-8d31-ac721797ee94.png)
   
   
![image](https://user-images.githubusercontent.com/9430290/101136476-5a921380-3648-11eb-90f7-ec20faeca57b.png)
   
   
   ### After this patch
   The number of shuffle partitions is calculated per query from the number of 
bytes that query scans:
   
   
![image](https://user-images.githubusercontent.com/9430290/101136174-e3f51600-3647-11eb-99c9-290831bb30af.png)
   The log message is shown below:
   `2020-12-04 05:28:11,991 INFO  [Query a5e841ba-c430-383b-dfa8-5694cd6d282b-122] datasource.FilePruner:51 : Set partition to 2, total bytes 92610534`
   
   
![image](https://user-images.githubusercontent.com/9430290/101136210-f40cf580-3647-11eb-8eec-0b7bdef93c30.png)
   The log message is shown below:
   `2020-12-04 05:28:12,112 INFO  [Query 534a7afb-4857-6e0c-67b8-bd6a8da155a8-130] datasource.FilePruner:51 : Set partition to 1, total bytes 42133710`
   
   
![image](https://user-images.githubusercontent.com/9430290/101136227-fa02d680-3647-11eb-9001-128c0e3bd490.png)
   The log message is shown below:
   `2020-12-04 05:28:12,141 INFO  [Query 744d34d1-0d06-11c8-fdee-1f260388117f-131] datasource.FilePruner:51 : Set partition to 3, total bytes 158775868`
   
   
![image](https://user-images.githubusercontent.com/9430290/101136249-05ee9880-3648-11eb-9267-c1cb44319697.png)
   The log message is shown below:
   `2020-12-04 08:16:43,746 INFO  [Query e117ceb8-53c1-959e-9cb0-75ee3901e271-126] pushdown.SparkSqlClient:68 : Auto set spark.sql.shuffle.partitions to 8, the total sources size is 415631445 b`
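
To make the idea of sizing shuffle partitions from scanned bytes concrete, here is a minimal sketch. It is not the code from this pull request: the function name `shuffle_partitions` and the 64 MiB target per partition are assumptions chosen for illustration. That target happens to reproduce the three `FilePruner` values logged above, but it does not reproduce the pushdown value (8 partitions for 415631445 bytes), so the real rules may well differ.

```python
# Hypothetical target size per shuffle partition (64 MiB). This is an
# assumption for illustration, not a value taken from the patch.
TARGET_BYTES_PER_PARTITION = 64 * 1024 * 1024


def shuffle_partitions(scanned_bytes, target_bytes=TARGET_BYTES_PER_PARTITION):
    """Return ceil(scanned_bytes / target_bytes), but never less than 1."""
    if scanned_bytes < 0:
        raise ValueError("scanned_bytes must be non-negative")
    if target_bytes <= 0:
        raise ValueError("target_bytes must be positive")
    # Integer ceiling division avoids floating-point rounding on large sizes.
    partitions = (scanned_bytes + target_bytes - 1) // target_bytes
    return max(1, partitions)


if __name__ == "__main__":
    # Scanned-byte totals copied from the FilePruner log lines above.
    for total_bytes in (92610534, 42133710, 158775868):
        print(total_bytes, "->", shuffle_partitions(total_bytes))
    # 92610534 -> 2
    # 42133710 -> 1
    # 158775868 -> 3
```

Small scans get few partitions and larger scans get more, instead of every query sharing one fixed `spark.sql.shuffle.partitions` value.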
   
   




> Support to use thread-level SparkSession to execute query 
> ----------------------------------------------------------
>
>                 Key: KYLIN-4829
>                 URL: https://issues.apache.org/jira/browse/KYLIN-4829
>             Project: Kylin
>          Issue Type: Improvement
>          Components: Query Engine, Spark Engine
>            Reporter: Zhichao  Zhang
>            Assignee: Zhichao  Zhang
>            Priority: Minor
>             Fix For: v4.0.0-beta
>
>
> Currently, when executing a query, it is impossible to configure parameters 
> such as spark.sql.shuffle.partitions for each query according to the data 
> that will be scanned. This hurts query performance.


