[
https://issues.apache.org/jira/browse/KYLIN-5128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
hujiahua updated KYLIN-5128:
----------------------------
Description:
I often see cube building jobs run for a long time in the global dict
resizing stage. After analyzing the Spark stage, I found that the cause was
too few concurrent tasks.
!image-2021-11-17-10-03-26-943.png!
I also found that Kylin uses sparkSession.createDataset to build the dict
bucket dataset, which means the parallelism is taken from
`sparkContext.defaultParallelism`. When Spark executor dynamic allocation is
enabled (spark.dynamicAllocation.enabled = true),
sparkContext.defaultParallelism changes at runtime and can end up as a small
value.
!image-2021-11-17-10-12-46-187.png!
was:
I often encounter cube building job running for a long time in the global dict
resizing process stage. After spark stage analysis, I found that it was caused
by too little concurrency of the task.
!image-2021-11-17-10-03-26-943.png!
And I also found kylin using sparkSession.createDataset to build dict bucket
dataset, where mean the parallelize size was `sparkContext.defaultParallelism`.
When enable spark executor dynamic allocation (spark.dynamicAllocation.enabled)
,sparkContext.defaultParallelism will change during runtime, and have a chance
to get a small parallelism value.
!image-2021-11-17-10-12-46-187.png!
> The job of resizing global dict bucket sometimes runs for a long time
> ---------------------------------------------------------------------
>
> Key: KYLIN-5128
> URL: https://issues.apache.org/jira/browse/KYLIN-5128
> Project: Kylin
> Issue Type: Improvement
> Affects Versions: v4.0.0
> Reporter: hujiahua
> Priority: Major
> Attachments: image-2021-11-17-10-03-26-943.png,
> image-2021-11-17-10-12-46-187.png
>
>
> I often see cube building jobs run for a long time in the global dict
> resizing stage. After analyzing the Spark stage, I found that the cause was
> too few concurrent tasks.
> !image-2021-11-17-10-03-26-943.png!
> I also found that Kylin uses sparkSession.createDataset to build the dict
> bucket dataset, which means the parallelism is taken from
> `sparkContext.defaultParallelism`. When Spark executor dynamic allocation is
> enabled (spark.dynamicAllocation.enabled = true),
> sparkContext.defaultParallelism changes at runtime and can end up as a small
> value.
> !image-2021-11-17-10-12-46-187.png!
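
The behaviour described above can be sketched outside Kylin as follows. This is only an illustration, not Kylin's actual code or the fix for this issue: the object name, the `buildBuckets` helper, the bucket count of 1000, and the partition count of 8 are all assumptions made for the example. It compares a dataset created directly with createDataset against one whose partition count is pinned with an explicit repartition.

```scala
import org.apache.spark.sql.{Dataset, SparkSession}

object BucketParallelismSketch {

  // Hypothetical helper (not part of Kylin): builds a dataset of bucket ids
  // and pins its partition count instead of inheriting defaultParallelism.
  def buildBuckets(spark: SparkSession, bucketCount: Int, partitions: Int): Dataset[Int] = {
    import spark.implicits._
    spark.createDataset((0 until bucketCount).toSeq).repartition(partitions)
  }

  def main(args: Array[String]): Unit = {
    // Local master is used only so the sketch runs without a cluster.
    val spark = SparkSession.builder()
      .appName("bucket-parallelism-sketch")
      .master("local[2]")
      .getOrCreate()
    import spark.implicits._

    // Partition count here follows sparkContext.defaultParallelism,
    // which can be small while few executors are allocated.
    val inherited = spark.createDataset((0 until 1000).toSeq)
    println(s"defaultParallelism   = ${spark.sparkContext.defaultParallelism}")
    println(s"inherited partitions = ${inherited.rdd.getNumPartitions}")

    // Partition count here is chosen explicitly by the caller.
    val pinned = buildBuckets(spark, bucketCount = 1000, partitions = 8)
    println(s"pinned partitions    = ${pinned.rdd.getNumPartitions}")

    spark.stop()
  }
}
```

Another option that may work is to set spark.default.parallelism explicitly, so that the value no longer follows the number of executor cores currently allocated. Which approach fits Kylin better is left open here.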