[
https://issues.apache.org/jira/browse/KYLIN-6057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17930194#comment-17930194
]
Guoliang Sun commented on KYLIN-6057:
-------------------------------------
h3. Dev Design
- Adjust the dictionary `build version` to be determined only after acquiring
the distributed lock.
- For a single build task involving multiple dictionaries, replace the unified
`build version` with an independent `build version` per dictionary, maintained
in `buildParam` and passed to the subsequent `encodeColumn` operations.
- After acquiring the distributed lock, the default `build version` is still
`System.currentTimeMillis()`.
- Retrieve and compare the latest `version_xxx` directory suffix in the current
dictionary directory with the current `build version`. If the current `build
version` is less than or equal to the existing directory suffix, adjust the
current `build version` to `existing suffix + 1`.
  - Example: the current `buildVersion` is `1725604515885`, and the dictionary
    directory contains `version_1725604550000`. Since
    `1725604515885 < 1725604550000`, the current `buildVersion` is adjusted to
    `1725604550001`.
- If the build task is paused and restarted after dictionary construction but
before flat table encoding, the driver side directly retrieves the actual latest
dictionary `buildVersion` instead of resetting it to `-1`, ensuring that Spark
executors always use a fixed `buildVersion` during execution.
- If the actual dictionary `buildVersion` cannot be retrieved on the driver
side, throw a `NoRetryException` to terminate the current build.
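The version rules above can be sketched in Java as follows. This is a minimal, hypothetical sketch, not Kylin's implementation: `DictBuildVersionSketch`, `resolveBuildVersion`, `resolveAll`, `recoverBuildVersion`, the `listVersions` callback and the `NoRetryException` stand-in are illustrative names. A local `java.util.concurrent.locks.Lock` per dictionary stands in for the distributed lock, and a dictionary directory is modeled as a list of entry names rather than read from the real dictionary storage.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.OptionalLong;
import java.util.concurrent.locks.Lock;
import java.util.function.Function;
import java.util.function.LongSupplier;

/**
 * Hypothetical sketch of the per-dictionary build-version rules.
 * All names are illustrative; this is not Kylin's actual code.
 */
public class DictBuildVersionSketch {

    static final String VERSION_PREFIX = "version_";

    /** Stand-in for Kylin's NoRetryException: fail the build without retrying. */
    public static class NoRetryException extends RuntimeException {
        public NoRetryException(String message) {
            super(message);
        }
    }

    /** Largest numeric suffix among "version_xxx" entries of a dictionary directory. */
    static OptionalLong latestVersionSuffix(List<String> entryNames) {
        OptionalLong latest = OptionalLong.empty();
        for (String name : entryNames) {
            if (!name.startsWith(VERSION_PREFIX)) {
                continue;
            }
            try {
                long v = Long.parseLong(name.substring(VERSION_PREFIX.length()));
                if (!latest.isPresent() || v > latest.getAsLong()) {
                    latest = OptionalLong.of(v);
                }
            } catch (NumberFormatException e) {
                // Ignore entries whose suffix is not a number.
            }
        }
        return latest;
    }

    /** Call only while holding the dictionary's lock; result exceeds every existing suffix. */
    static long resolveBuildVersion(long now, List<String> entryNames) {
        OptionalLong latest = latestVersionSuffix(entryNames);
        if (latest.isPresent() && now <= latest.getAsLong()) {
            return latest.getAsLong() + 1; // e.g. existing suffix + 1
        }
        return now; // default: the clock value read after locking
    }

    /** One independent version per dictionary, collected into a hypothetical buildParam map. */
    static Map<String, Long> resolveAll(List<String> dictNames,
                                        Function<String, List<String>> listVersions,
                                        Map<String, Lock> locks,
                                        LongSupplier clock) {
        Map<String, Long> buildParam = new LinkedHashMap<>();
        for (String dict : dictNames) {
            Lock lock = locks.get(dict); // assumed: one lock per dictionary
            lock.lock();
            try {
                // Both the clock and the directory listing are read only under the lock.
                long version = resolveBuildVersion(clock.getAsLong(), listVersions.apply(dict));
                buildParam.put(dict, version);
                // ... build and commit version_<version> here, still holding the lock ...
            } finally {
                lock.unlock();
            }
        }
        return buildParam;
    }

    /** Driver side after a restart: reuse the committed version instead of -1. */
    static long recoverBuildVersion(List<String> entryNames) {
        OptionalLong latest = latestVersionSuffix(entryNames);
        if (!latest.isPresent()) {
            throw new NoRetryException("no committed dictionary version found");
        }
        return latest.getAsLong();
    }
}
```

With the example values above, `resolveBuildVersion(1725604515885L, List.of("version_1725604550000"))` returns `1725604550001`. Assuming each build commits its `version_<n>` directory before releasing the lock, reading both the clock and the directory listing inside the locked section means a later build of the same dictionary always picks a strictly larger version, whatever the clock skew between machines.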
> Incorrect Data in Global Dictionary Construction
> ------------------------------------------------
>
> Key: KYLIN-6057
> URL: https://issues.apache.org/jira/browse/KYLIN-6057
> Project: Kylin
> Issue Type: Bug
> Affects Versions: 5.0.0
> Reporter: Guoliang Sun
> Assignee: Guoliang Sun
> Priority: Major
> Fix For: 5.0.2
>
>
> Kylin query COUNT DISTINCT results are incorrect.
> h3. Root Cause
> During high-concurrency builds, slight clock differences between machines may
> cause different build tasks to construct the dictionary from the same old
> version of a field, leading to overlapping values in the global dictionary.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)