Unsubscribe
付德彬 (Fu Debin), Big Data Development Engineer
[email protected]

On 11/17/2021 10:27, hujiahua (Jira) <[email protected]> wrote:

hujiahua created KYLIN-5128:
-------------------------------

             Summary: The job of resizing global dict bucket sometimes run for a long time
                 Key: KYLIN-5128
                 URL: https://issues.apache.org/jira/browse/KYLIN-5128
             Project: Kylin
          Issue Type: Improvement
    Affects Versions: v4.0.0
            Reporter: hujiahua
         Attachments: image-2021-11-17-10-03-26-943.png, image-2021-11-17-10-12-46-187.png

I often see cube build jobs spend a long time in the global dict resizing stage. After analyzing the Spark stages, I found that the cause is too few concurrent tasks.

[Attachment: image-2021-11-17-10-03-26-943.png]

I also found that Kylin uses sparkSession.createDataset to build the dict bucket dataset, which means the parallelism is determined by `sparkContext.defaultParallelism`. When Spark executor dynamic allocation is enabled (spark.dynamicAllocation.enabled), sparkContext.defaultParallelism changes at runtime and can end up with a small value.

[Attachment: image-2021-11-17-10-12-46-187.png]

--
This message was sent by Atlassian Jira
(v8.20.1#820001)
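The mechanism in KYLIN-5128 can be illustrated with a small Python model. It assumes Spark 2.x/3.x internals. First, `createDataset` on a local collection is executed by `LocalTableScanExec`, which uses `min(max(rows, 1), defaultParallelism)` partitions. Second, with a coarse-grained scheduler backend, `defaultParallelism` is `spark.default.parallelism` if that is set, and otherwise `max(total registered executor cores, 2)`. The sketch also assumes Kylin passes the bucket ids as a local collection. The function names and the bucket and core counts are hypothetical; this is a model of the behavior, not Kylin or Spark code.

```python
from typing import Optional


def default_parallelism(registered_cores: int, configured: Optional[int] = None) -> int:
    """Hypothetical model of the coarse-grained backend's defaultParallelism:
    spark.default.parallelism if configured, else max(registered cores, 2)."""
    if configured is not None:
        return configured
    return max(registered_cores, 2)


def local_scan_partitions(num_rows: int, default_par: int) -> int:
    """Hypothetical model of LocalTableScanExec's partition count:
    min(max(rows, 1), defaultParallelism)."""
    return min(max(num_rows, 1), default_par)


buckets = 5000  # hypothetical number of dict buckets to resize

# Dynamic allocation has scaled down to one 4-core executor when the stage starts.
print(local_scan_partitions(buckets, default_parallelism(registered_cores=4)))  # → 4

# The same stage after scale-up to 50 executors x 4 cores.
print(local_scan_partitions(buckets, default_parallelism(registered_cores=200)))  # → 200

# An explicit spark.default.parallelism decouples the task count from live executors.
print(local_scan_partitions(buckets, default_parallelism(registered_cores=4, configured=1000)))  # → 1000
```

The task count for the same amount of work therefore depends on how many executors happen to be registered when the stage is planned. One way to remove that dependence is to set `spark.default.parallelism` explicitly. Another is to pass an explicit partition count, for example through `sparkContext.parallelize(seq, numSlices)`, instead of relying on the default.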
