[ 
https://issues.apache.org/jira/browse/KYLIN-5128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17445576#comment-17445576
 ] 

ASF GitHub Bot commented on KYLIN-5128:
---------------------------------------

zhengshengjun commented on pull request #1772:
URL: https://github.com/apache/kylin/pull/1772#issuecomment-972431908


   LGTM


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


> The job of resizing global dict bucket sometimes runs for a long time
> ---------------------------------------------------------------------
>
>                 Key: KYLIN-5128
>                 URL: https://issues.apache.org/jira/browse/KYLIN-5128
>             Project: Kylin
>          Issue Type: Improvement
>    Affects Versions: v4.0.0
>            Reporter: hujiahua
>            Priority: Major
>         Attachments: image-2021-11-17-10-03-26-943.png, 
> image-2021-11-17-10-12-46-187.png
>
>
> I often see cube building jobs run for a long time in the global dict 
> resizing stage. After analyzing the Spark stage, I found that the cause was 
> too few concurrent tasks.
>  !image-2021-11-17-10-03-26-943.png! 
> I also found that Kylin uses sparkSession.createDataset to build the dict 
> bucket dataset, which means its parallelism is 
> `sparkContext.defaultParallelism`. When Spark executor dynamic allocation is 
> enabled (spark.dynamicAllocation.enabled = true), 
> sparkContext.defaultParallelism changes during runtime and can end up as a 
> small value.
>  !image-2021-11-17-10-12-46-187.png! 
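
To illustrate why a small `sparkContext.defaultParallelism` stretches the resizing stage, here is a minimal plain-Python model (not Spark or Kylin code). It assumes a local collection is split evenly into at most `defaultParallelism` slices, one slice per task. The function names and the numbers (1000 buckets, parallelism of 2 versus 200) are hypothetical and chosen only for illustration.

```python
def slice_bounds(num_items, num_slices):
    """Split num_items into num_slices contiguous (start, end) index ranges.

    Assumption: even integer slicing, used here as a simplified model of how
    a local collection might be divided among tasks.
    """
    if num_slices < 1:
        raise ValueError("num_slices must be at least 1")
    return [
        ((i * num_items) // num_slices, ((i + 1) * num_items) // num_slices)
        for i in range(num_slices)
    ]


def buckets_per_task(num_buckets, default_parallelism):
    """Return how many buckets each task handles under this model.

    Assumption: the number of tasks is capped by both the number of buckets
    and the (hypothetical) default parallelism value.
    """
    num_slices = min(max(num_buckets, 1), default_parallelism)
    return [end - start for start, end in slice_bounds(num_buckets, num_slices)]


if __name__ == "__main__":
    # Hypothetical values: 1000 dict buckets, low vs. high parallelism.
    for parallelism in (2, 200):
        sizes = buckets_per_task(1000, parallelism)
        print(f"parallelism={parallelism}: {len(sizes)} tasks, "
              f"largest task handles {max(sizes)} buckets")
```

Under these hypothetical numbers, a parallelism of 2 yields two tasks of 500 buckets each, while a parallelism of 200 yields 200 tasks of 5 buckets each. Under this model, the dataset's parallelism would need to be set explicitly for the stage not to depend on how many executors happen to be allocated when the dataset is created.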



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
