Unsubscribe

Big Data Development Engineer - 付德彬
[email protected]


On 11/17/2021 10:27,hujiahua (Jira)<[email protected]> wrote:
hujiahua created KYLIN-5128:
-------------------------------

Summary: The job of resizing global dict bucket sometimes run for a long time
Key: KYLIN-5128
URL: https://issues.apache.org/jira/browse/KYLIN-5128
Project: Kylin
Issue Type: Improvement
Affects Versions: v4.0.0
Reporter: hujiahua
Attachments: image-2021-11-17-10-03-26-943.png, 
image-2021-11-17-10-12-46-187.png

I often see cube build jobs run for a long time in the global dict resizing 
stage. After analyzing the Spark stages, I found that the cause was too little 
task concurrency.
!image-2021-11-17-10-03-26-943.png!

I also found that Kylin uses sparkSession.createDataset to build the dict bucket 
dataset, which means the number of partitions is `sparkContext.defaultParallelism`. 
When Spark executor dynamic allocation (spark.dynamicAllocation.enabled) is 
enabled, sparkContext.defaultParallelism changes at runtime, so the job can end 
up with a small parallelism value.
!image-2021-11-17-10-12-46-187.png!
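
To make the effect concrete, here is a rough back-of-the-envelope model in plain Python (no Spark needed). It assumes, as far as I understand Spark's behavior, that the cluster-mode default parallelism falls back to max(total registered cores, 2) when spark.default.parallelism is not set, and that a dataset created from a local collection gets at most that many partitions. The function names, the bucket count, and the 4 cores per executor are made up for illustration and are not taken from Kylin.

```python
import math


def default_parallelism(total_cores, configured=None):
    """Simplified model: the configured value if set, else max(total cores, 2)."""
    return configured if configured is not None else max(total_cores, 2)


def local_dataset_partitions(num_rows, parallelism):
    """Simplified model: a locally built dataset gets at most `parallelism` partitions."""
    return min(max(num_rows, 1), parallelism)


def buckets_per_task(num_buckets, partitions):
    """Number of buckets each task has to process one after another."""
    return math.ceil(num_buckets / partitions)


# Hypothetical numbers: 1000 buckets, 4 cores per executor.
buckets = 1000
for executors in (1, 50):
    cores = executors * 4
    parts = local_dataset_partitions(buckets, default_parallelism(cores))
    print(f"executors={executors} partitions={parts} "
          f"buckets/task={buckets_per_task(buckets, parts)}")
```

Under this model, if the dataset is created while dynamic allocation has scaled the job down to a single executor, each task handles 250 buckets instead of 5. That would be consistent with the long-running stage described above.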





--
This message was sent by Atlassian Jira
(v8.20.1#820001)
