[
https://issues.apache.org/jira/browse/KYLIN-5163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
hujiahua updated KYLIN-5163:
----------------------------
Summary: Global dictionary build job may produce incomplete dictionary file
(was: Global dictionary build job may produced incomplete dictionary file)
> Global dictionary build job may produce incomplete dictionary file
> ------------------------------------------------------------------
>
> Key: KYLIN-5163
> URL: https://issues.apache.org/jira/browse/KYLIN-5163
> Project: Kylin
> Issue Type: Bug
> Components: Job Engine
> Affects Versions: v4.0.1
> Reporter: hujiahua
> Priority: Major
>
> The current dictionary Spark build job uses the function
> `NBucketDictionary.saveBucketDict` to write the dictionary files (including
> the CURR file and the PREV file) for each partition. However, it does not
> account for multiple tasks running concurrently for the same partition, which
> can happen with task retries or speculative tasks. Concurrent tasks for one
> partition may leave an incomplete dictionary file, and we have encountered
> this issue in production.
> The issue unfolds along the following timeline:
> 1. During the dictionary building phase, an executor called E1 was
> preparing to build the dictionary file for partition 0.
> 2. The driver sent E1 a shutdown message because of YARN resource preemption.
> The driver then marked the task for partition 0 as failed and assigned a retry
> task to another executor called E2.
> 3. E2 began to process the task and finished it in a short time.
> 4. After E2 finished the task, E1 began to process its task, so E1 deleted the
> complete dictionary file that E2 had created and created a new dictionary
> file to write.
> 5. E1 then received the driver's shutdown message and killed itself, leaving
> behind an incomplete dictionary file.
> 6. After the other partitions finished, the stage was marked successful.
> 7. When the next phase, table encoding, used the incomplete dictionary file,
> the stage failed.
>
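One common way to guard against the race in the timeline above is to have each task attempt write to its own temporary file and publish it with an atomic rename, so that a killed attempt can never leave a half-written file at the final path. The following is only a minimal sketch under stated assumptions, not Kylin's code and not necessarily the fix adopted for this issue: it uses the local filesystem through `java.nio.file` instead of HDFS, and the class `AtomicDictWriter`, the methods `writeTemp` and `commit`, and the `CURR_<partition>` file naming are hypothetical names chosen for illustration.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical illustration only; none of these names exist in Kylin.
public class AtomicDictWriter {

    // Write the dictionary bytes to a file private to this task attempt.
    // Concurrent attempts for the same partition never touch each other's file.
    public static Path writeTemp(Path dir, int partition, String attemptId,
                                 byte[] content) throws IOException {
        Files.createDirectories(dir);
        Path tmp = dir.resolve("CURR_" + partition + "." + attemptId + ".tmp");
        Files.write(tmp, content);
        return tmp;
    }

    // Publish a fully written temp file under the final name in one atomic step.
    // Readers see either the previous complete file or the new complete file.
    public static Path commit(Path tmp, Path dir, int partition) throws IOException {
        Path dest = dir.resolve("CURR_" + partition);
        return Files.move(tmp, dest, StandardCopyOption.ATOMIC_MOVE);
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("dict-demo");

        // The retry attempt on E2 writes its file completely and commits it.
        byte[] full = "complete dictionary".getBytes(StandardCharsets.UTF_8);
        Path published = commit(writeTemp(dir, 0, "E2", full), dir, 0);

        // The stale attempt on E1 starts later and is killed mid-write:
        // it leaves a partial temp file behind and never reaches commit().
        byte[] partial = "compl".getBytes(StandardCharsets.UTF_8);
        writeTemp(dir, 0, "E1", partial);

        // The published file is still the complete one written by E2.
        System.out.println(
            new String(Files.readAllBytes(published), StandardCharsets.UTF_8));
    }
}
```

In a Spark job the attempt identifier could plausibly come from `TaskContext.get().taskAttemptId()`, and leftover `.tmp` files from killed attempts would need a separate cleanup step. Whether a rename is atomic depends on the underlying filesystem, so the same idea would have to be checked against the storage layer actually used for the dictionary files.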
--
This message was sent by Atlassian Jira
(v8.20.1#820001)