[
https://issues.apache.org/jira/browse/KYLIN-5163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17498556#comment-17498556
]
ASF GitHub Bot commented on KYLIN-5163:
---------------------------------------
sleep1661 opened a new pull request #1822:
URL: https://github.com/apache/kylin/pull/1822
## Proposed changes
The current global dictionary Spark build job may produce an incomplete
dictionary file. So I created a Spark data source for writing the global
dictionary to fix this bug.
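
The data source itself is not shown in this description. As a rough illustration of the general pattern that avoids this class of bug (each task attempt writes to its own temporary file and publishes it with an atomic rename), here is a minimal Java sketch. It is not the PR's implementation: `AttemptSafeDictWriter`, `commit`, `tempPathFor`, the `attemptId` suffix, and the `CURR_0` file name are all hypothetical, and it uses the local filesystem through `java.nio.file` instead of the Hadoop `FileSystem` API that a Spark job would normally write through.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper, for illustration only; not part of Kylin.
final class AttemptSafeDictWriter {

    // Each task attempt gets a private temp file next to the final file.
    static Path tempPathFor(Path finalFile, String attemptId) {
        return finalFile.resolveSibling(finalFile.getFileName() + "." + attemptId + ".tmp");
    }

    // Write everything to the attempt's temp file, then publish it with a rename.
    // An attempt that dies before the rename never touches the final file.
    static Path commit(Path finalFile, String attemptId, byte[] content) throws IOException {
        Path tmp = tempPathFor(finalFile, attemptId);
        try (OutputStream out = Files.newOutputStream(tmp)) {
            out.write(content);
        }
        // Whether ATOMIC_MOVE replaces an existing target is implementation specific;
        // on POSIX filesystems it maps to rename(2), which does replace it.
        Files.move(tmp, finalFile, StandardCopyOption.ATOMIC_MOVE);
        return finalFile;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("dict-demo");
        Path curr = dir.resolve("CURR_0");

        // The retry attempt (E2 in the issue's timeline) finishes first
        // and publishes a complete file.
        commit(curr, "attempt_1", "k1=1\nk2=2\n".getBytes(StandardCharsets.UTF_8));

        // The original attempt (E1) starts late and is killed mid-write:
        // only its private temp file is written, and it is never renamed.
        Files.write(tempPathFor(curr, "attempt_0"), "k1=".getBytes(StandardCharsets.UTF_8));

        // The published file still holds the complete content from the retry attempt.
        System.out.print(new String(Files.readAllBytes(curr), StandardCharsets.UTF_8));
    }
}
```

In this sketch a killed attempt can leave a stray `.tmp` file behind, so a real job would also need a cleanup step; the point is only that a partially written file is never visible under the final name.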
## GitHub Branch
As most of the development work is on Kylin 4, we need to make it the
main branch. The Apache Kylin community changed the branch settings on GitHub on
2021-08-04:
1. The original branch _kylin-on-parquet-v2_ for **Kylin 4.X** (Parquet
Storage) has been renamed to branch **main**, and configured as the **default**
branch;
2. The original branch _master_ for **Kylin 3.X** (HBase Storage) has been
renamed to branch **kylin3** ;
Please check [Intro to Kylin 4
architecture](https://kylin.apache.org/blog/2021/07/02/Apache-Kylin4-A-new-storage-and-compute-architecture/)
and [INFRA-22166](https://issues.apache.org/jira/browse/INFRA-22166) if you
are interested.
## Types of changes
What types of changes does your code introduce to Kylin?
_Put an `x` in the boxes that apply_
- [x] Bugfix (non-breaking change which fixes an issue)
- [ ] New feature (non-breaking change which adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to not work as expected)
- [ ] Documentation Update (if none of the other choices apply)
## Checklist
_Put an `x` in the boxes that apply. You can also fill these out after
creating the PR. If you're unsure about any of them, don't hesitate to ask.
We're here to help! This is simply a reminder of what we are going to look for
before merging your code._
- [x] I have created an issue on [Kylin's
jira](https://issues.apache.org/jira/browse/KYLIN), and have described the
bug/feature there in detail
- [x] Commit messages in my PR start with the related jira ID, like
"KYLIN-0000 Make Kylin project open-source"
- [x] Compiling and unit tests pass locally with my changes
- [ ] I have added tests that prove my fix is effective or that my feature
works
- [ ] I have added necessary documentation (if appropriate)
- [ ] Any dependent changes have been merged
## Further comments
If this is a relatively large or complex change, kick off the discussion at
user@kylin or dev@kylin by explaining why you chose the solution you did and
what alternatives you considered, etc...
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
> Global dictionary build job may produce incomplete dictionary file
> ------------------------------------------------------------------
>
> Key: KYLIN-5163
> URL: https://issues.apache.org/jira/browse/KYLIN-5163
> Project: Kylin
> Issue Type: Bug
> Components: Job Engine
> Affects Versions: v4.0.1
> Reporter: hujiahua
> Priority: Major
>
> The current dictionary Spark build job uses the function
> `NBucketDictionary.saveBucketDict` to write dictionary files (including the
> CURR file and the PREV file) for each partition. But it does not account for
> multiple tasks running concurrently for the same partition, as can happen
> with task retries or speculative tasks. Concurrent tasks for one
> partition may leave an incomplete dictionary file, and we've encountered this
> issue in production.
> The issue, described as a timeline:
> 1. During the dictionary building phase, an executor, E1, was
> preparing to build the dictionary file for partition 0.
> 2. The driver sent E1 a shutdown message because of YARN resource preemption.
> The driver then marked the task for partition 0 as failed and scheduled a
> retry task on another executor, E2.
> 3. E2 began to process the task and finished it in a short time.
> 4. After E2 finished, E1 began to process its task: it deleted the complete
> dictionary file created by E2 and created a new dictionary file to
> write.
> 5. E1 then received the driver's shutdown message and killed itself, leaving
> behind an incomplete dictionary file.
> 6. After the other partitions finished, the stage was marked successful.
> 7. When the next phase, table encoding, used the incomplete dictionary file,
> the stage failed.
>
--
This message was sent by Atlassian Jira
(v8.20.1#820001)