[jira] [Commented] (KYLIN-4829) Support to use thread-level SparkSession to execute query

ASF GitHub Bot (Jira) Mon, 07 Dec 2020 22:59:11 -0800


    [ https://issues.apache.org/jira/browse/KYLIN-4829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17245693#comment-17245693 ]


ASF GitHub Bot commented on KYLIN-4829:
---------------------------------------

zzcclp commented on pull request #1495:
URL: https://github.com/apache/kylin/pull/1495#issuecomment-740415145


   ## Performance Testing
   
   ### Test Env
   
   - Hadoop 2.7.0 on Docker.
   - Commit: [ed0649b1](https://github.com/apache/kylin/pull/1495/commits/ed0649b140529bdfafea8cce846962b6ca9c3f73)
   - Sparder Env:
     - `spark.executor.cores=1`
     - `spark.executor.instances=6`
     - `spark.executor.memory=2G`
     - `spark.executor.memoryOverhead=1G`
     - `spark.sql.shuffle.partitions=6`
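The "6 partitions equals the total number of cores" observation in the results follows directly from the Sparder settings above. A minimal sketch that derives the task-slot count from those settings; the dict and the function name are illustrative only and are not Kylin code.

```python
# Sparder settings from the test environment above, kept as strings the way
# they appear in a Spark properties file.
SPARDER_CONF = {
    "spark.executor.cores": "1",
    "spark.executor.instances": "6",
    "spark.executor.memory": "2G",
    "spark.executor.memoryOverhead": "1G",
    "spark.sql.shuffle.partitions": "6",
}


def total_executor_cores(conf: dict) -> int:
    """Task slots available to Sparder: executor instances x cores per executor."""
    return int(conf["spark.executor.instances"]) * int(conf["spark.executor.cores"])


if __name__ == "__main__":
    cores = total_executor_cores(SPARDER_CONF)
    shuffle = int(SPARDER_CONF["spark.sql.shuffle.partitions"])
    print(f"total cores = {cores}, shuffle partitions = {shuffle}")
```

With these values every shuffle stage gets exactly one task per core, so a single query can occupy the whole cluster even when it reads very little data.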
   
   ### Test
   Submit 5 SQL queries concurrently.
   
   ### Before this patch
   The shuffle partition number of every query is 6, which equals the total number of executor cores (6 instances × 1 core). The 5 SQLs took 3.09s in total to finish.
   
   The 5 SQLs are submitted at the same time.
   
![image](https://user-images.githubusercontent.com/9430290/101448513-1ceffc00-3962-11eb-8219-17e3f020e8ab.png)
   
   
![image](https://user-images.githubusercontent.com/9430290/101448485-12356700-3962-11eb-8867-6a92bba33994.png)
   
   SQL1: the second stage has 6 partitions, even though there is only 216 KB of data to read.
   
![image](https://user-images.githubusercontent.com/9430290/101448990-ef578280-3962-11eb-8bc1-c034406e338d.png)
   
   
   SQL2: the second stage still has 6 partitions, even though there is only 262 KB of data to read.
   
![image](https://user-images.githubusercontent.com/9430290/101449002-f41c3680-3962-11eb-8c07-e45e46ec2429.png)
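To put the before-patch numbers in perspective, the sketch below divides the second-stage input of SQL1 and SQL2 (216 KB and 262 KB, as stated above) by the fixed 6 shuffle partitions. Each task ends up handling only a few tens of KB, so per-task scheduling overhead, plus the five concurrent queries competing for the same 6 cores, likely outweighs the actual work. The names are illustrative.

```python
FIXED_SHUFFLE_PARTITIONS = 6  # spark.sql.shuffle.partitions before the patch

# Data read by the second stage of each query, as reported above (KB).
SECOND_STAGE_INPUT_KB = {"SQL1": 216, "SQL2": 262}


def kb_per_partition(input_kb: float, partitions: int) -> float:
    """Average KB processed by each shuffle partition (one task each)."""
    return input_kb / partitions


if __name__ == "__main__":
    for name, kb in SECOND_STAGE_INPUT_KB.items():
        per_task = kb_per_partition(kb, FIXED_SHUFFLE_PARTITIONS)
        print(f"{name}: {per_task:.1f} KB per task")
```

This prints 36.0 KB per task for SQL1 and 43.7 KB per task for SQL2.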
   
   
   ### After this patch
   The shuffle partition number of each query is calculated from the number of bytes that query scans. The 5 SQLs took 2.55s in total to finish.
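The comment does not spell out the exact formula, so the following is only a sketch of one common way to size shuffle partitions from scanned bytes: divide by a target size per partition, round up, and clamp between 1 and the number of cores. `TARGET_BYTES_PER_PARTITION`, `MAX_PARTITIONS` and `shuffle_partitions_for` are hypothetical names with assumed values, not the patch's actual parameters.

```python
import math

# Hypothetical knobs: the real formula and defaults in the patch may differ.
TARGET_BYTES_PER_PARTITION = 64 * 1024 * 1024  # assumed 64 MB per shuffle partition
MAX_PARTITIONS = 6                             # total executor cores in this test env


def shuffle_partitions_for(scanned_bytes: int,
                           target: int = TARGET_BYTES_PER_PARTITION,
                           max_partitions: int = MAX_PARTITIONS) -> int:
    """Choose a per-query shuffle partition count proportional to scanned bytes."""
    wanted = math.ceil(scanned_bytes / target)
    # Clamp to [1, max_partitions]: always at least one partition,
    # never more partitions than there are cores to run them.
    return max(1, min(wanted, max_partitions))


if __name__ == "__main__":
    print("300 KB scan:", shuffle_partitions_for(300 * 1024))  # small scan -> 1 partition
    print("1 GB scan:", shuffle_partitions_for(1024 ** 3))     # 16 wanted, capped at 6
```

Clamping to the core count keeps one large query from spawning more tasks than can run at once, while the lower bound of 1 handles queries that scan almost nothing.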
   
   The 5 SQLs are submitted at the same time.
   
![image](https://user-images.githubusercontent.com/9430290/101449120-2b8ae300-3963-11eb-89a3-582196929c32.png)
   
   
![image](https://user-images.githubusercontent.com/9430290/101449131-2e85d380-3963-11eb-85e7-40a4dafe77c7.png)
   
   
   SQL1: the second stage has only 1 partition.
   
![image](https://user-images.githubusercontent.com/9430290/101449188-4b220b80-3963-11eb-8231-c64e9936b2c8.png)
   
   
   SQL2: the second stage has only 1 partition.
   
![image](https://user-images.githubusercontent.com/9430290/101449270-6a209d80-3963-11eb-92df-2b2715c819ae.png)
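The two totals reported above (3.09s before, 2.55s after) can be turned into a relative improvement with a small calculation. The numbers come from a single run of 5 concurrent queries on a Docker test cluster, so treat the result as indicative rather than a benchmark.

```python
BEFORE_S = 3.09  # total time for the 5 concurrent SQLs before the patch
AFTER_S = 2.55   # total time after the patch


def summarize(before_s: float, after_s: float) -> tuple:
    """Return (seconds saved, fraction of time saved, speedup factor)."""
    saved = before_s - after_s
    return saved, saved / before_s, before_s / after_s


if __name__ == "__main__":
    saved, fraction, speedup = summarize(BEFORE_S, AFTER_S)
    print(f"saved {saved:.2f}s, {fraction:.1%} less time, {speedup:.2f}x speedup")
```

This prints `saved 0.54s, 17.5% less time, 1.21x speedup`.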
   
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]


> Support to use thread-level SparkSession to execute query 
> ----------------------------------------------------------
>
>                 Key: KYLIN-4829
>                 URL: https://issues.apache.org/jira/browse/KYLIN-4829
>             Project: Kylin
>          Issue Type: Improvement
>          Components: Query Engine, Spark Engine
>            Reporter: Zhichao  Zhang
>            Assignee: Zhichao  Zhang
>            Priority: Minor
>             Fix For: v4.0.0-beta
>
>
> Currently, when executing a query, it is impossible to configure proper
> parameters for each query according to the data that will be scanned, such as
> spark.sql.shuffle.partitions. This impacts query performance.
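In Spark, `SparkSession.newSession()` returns a session that shares the underlying SparkContext but keeps its own SQL configuration, so setting `spark.sql.shuffle.partitions` on one session does not affect the others. The pure-Python sketch below only mimics that per-thread isolation with `threading.local`; it is an analogy under that assumption, not Kylin's implementation, and `GLOBAL_CONF`, `thread_conf` and `run_query` are hypothetical names.

```python
import threading

# Defaults shared by every query, analogous to the SQL conf of the single
# shared SparkSession used before the patch.
GLOBAL_CONF = {"spark.sql.shuffle.partitions": "6"}

_local = threading.local()


def thread_conf() -> dict:
    """Return this thread's private conf, created lazily from the global defaults."""
    if not hasattr(_local, "conf"):
        _local.conf = dict(GLOBAL_CONF)  # copy, so edits never leak to other threads
    return _local.conf


def run_query(name: str, partitions: int, results: dict) -> None:
    """Simulate a query thread that tunes its own shuffle partition count."""
    conf = thread_conf()
    conf["spark.sql.shuffle.partitions"] = str(partitions)
    results[name] = conf["spark.sql.shuffle.partitions"]


if __name__ == "__main__":
    results: dict = {}
    threads = [
        threading.Thread(target=run_query, args=("small", 1, results)),
        threading.Thread(target=run_query, args=("large", 6, results)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(results)      # each thread recorded only its own setting
    print(GLOBAL_CONF)  # the shared defaults are left untouched
```

The key design point is that each query thread copies the defaults before changing them, which is what lets a small query use 1 partition while a large one keeps 6 at the same time.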



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
