[ https://issues.apache.org/jira/browse/KYLIN-5788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17835405#comment-17835405 ]

pengfei.zhan commented on KYLIN-5788:
-------------------------------------

h1. Root Cause


When multiple tasks build and use the same global dictionary concurrently, the 
flat table encoding step does not ensure that a single dictionary version is 
used throughout. If another task expands the dictionary in the meantime, some 
flat table partitions mistakenly read the bucket files of the new dictionary 
version. Because the new version distributes data across buckets differently, 
those partitions cannot find the correct dictionary entries, so the affected 
values in the flat table's encoded column become 0, which ultimately produces 
incorrect COUNT DISTINCT results.

 

Examples:


1. Normal flat table encoding process
Dictionary v1 is not used because v2 is the latest version of the dictionary. 
During encoding, the number of flat table partitions equals the number of 
buckets in dictionary v2, and each flat table partition ID corresponds to the 
dictionary bucket with the same ID.

!pic1.jpeg|width=694,height=320!
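A minimal sketch of this partition/bucket correspondence. It assumes, for illustration only, that both the dictionary and the flat table assign a value to bucket/partition `value % num_buckets`; the real assignment function is not described in this comment, and `bucket_id` and `split` are hypothetical helpers, not Kylin APIs.

```python
# Hypothetical sketch: the flat table is split into as many partitions as the
# dictionary has buckets, using the same (assumed) value -> bucket rule, so
# partition i only ever needs dictionary bucket i.

def bucket_id(value, num_buckets):
    # Assumption for illustration: bucket chosen by value modulo bucket count.
    return value % num_buckets


def split(values, num_buckets):
    # Group values by the bucket id they map to.
    groups = [set() for _ in range(num_buckets)]
    for v in values:
        groups[bucket_id(v, num_buckets)].add(v)
    return groups


dict_v2_buckets = split(range(1, 9), 2)      # keys stored in each v2 bucket
flat_partitions = split([1, 2, 3, 5, 8], 2)  # flat table rows after partitioning

for pid, rows in enumerate(flat_partitions):
    # Every value in partition pid is found in bucket pid of the same version.
    assert rows <= dict_v2_buckets[pid]
```

Because both sides use the same rule and the same bucket count, a lookup never has to leave its own bucket; this is the invariant the abnormal case below breaks.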


2. Abnormal flat table encoding process
2-1. Encoding starts normally
!pic2.jpeg|width=667,height=218!

2-2. Another build task then finishes building the dictionary and generates a 
new dictionary version (v3)

!pic3.jpeg|width=665,height=205!

Two problems arise here:
1) The flat table encoding step does not use a single, consistent dictionary 
version.
2) Dictionary v3 has been expanded and its data distribution has changed, so 
even though a flat table partition still reads the bucket with the same ID, the 
contents of that bucket no longer correspond to the data in the partition.
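The two problems can be reproduced with a small in-memory simulation. This is a hypothetical model, not Kylin code: `DictStore`, `publish`, `encode` and `run` are illustrative names, values are assumed to land in bucket `value % num_buckets`, and 0 stands for "not found". The `pin_version=True` path resolves the dictionary version once before encoding starts, which is one way to address problem 1; it is shown only for contrast, not as the actual fix.

```python
# Hypothetical simulation of the race; DictStore, publish and encode are
# illustrative names, not Kylin APIs.


class DictStore:
    """Versioned global dictionary: version number -> list of buckets."""

    def __init__(self):
        self.versions = {}

    def publish(self, version, values, num_buckets):
        # Assumption: value v lives in bucket v % num_buckets, and ids are
        # assigned 1, 2, 3, ... in sorted order (0 means "not found").
        buckets = [{} for _ in range(num_buckets)]
        for new_id, v in enumerate(sorted(values), start=1):
            buckets[v % num_buckets][v] = new_id
        self.versions[version] = buckets

    def latest(self):
        return max(self.versions)


def encode(store, partitions, pin_version, on_partition_done=None):
    """Encode partition i against bucket i of the dictionary."""
    pinned = store.latest()  # version resolved once, before encoding starts
    encoded = []
    for pid, rows in enumerate(partitions):
        # Unpinned path re-resolves "latest" for every partition (problem 1).
        version = pinned if pin_version else store.latest()
        bucket = store.versions[version][pid]
        encoded.append([bucket.get(v, 0) for v in rows])
        if on_partition_done:
            on_partition_done(pid)
    return encoded


def run(pin_version):
    store = DictStore()
    store.publish(2, range(1, 9), num_buckets=2)  # v2: values 1..8, 2 buckets
    partitions = [[2, 4, 6, 8], [1, 3, 5, 7]]     # partition i matches v2 bucket i

    def concurrent_build(pid):
        # Another job expands the dictionary right after partition 0 finishes.
        if pid == 0:
            store.publish(3, range(1, 13), num_buckets=4)

    return encode(store, partitions, pin_version, concurrent_build)


print(run(pin_version=False))  # [[2, 4, 6, 8], [1, 0, 5, 0]]
print(run(pin_version=True))   # [[2, 4, 6, 8], [1, 3, 5, 7]]
```

In the unpinned run, partition 1 reads bucket 1 of v3, which (with 4 buckets) holds 1, 5 and 9; values 3 and 7 have moved to bucket 3, so they are encoded as 0 (problem 2).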

 

As a result, the affected values in the flat table's encoded column become 0, 
and the COUNT DISTINCT result is wrong.
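A tiny worked example of why the zeros corrupt the measure, using made-up ids and assuming COUNT DISTINCT simply counts distinct encoded ids (including 0): different source values whose lookups failed all collapse into the single id 0.

```python
# Made-up ids for illustration; assumes COUNT DISTINCT counts distinct encoded ids.
correct_ids = [2, 4, 6, 8, 1, 3, 5, 7]  # every value found in a consistent dictionary version
broken_ids = [2, 4, 6, 8, 1, 0, 5, 0]   # values 3 and 7 were not found and became 0

print(len(set(correct_ids)))  # 8
print(len(set(broken_ids)))   # 7: the two failed lookups collapse into one id
```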


> Enhance global dict on flat table encoding stage logging & retry
> ----------------------------------------------------------------
>
>                 Key: KYLIN-5788
>                 URL: https://issues.apache.org/jira/browse/KYLIN-5788
>             Project: Kylin
>          Issue Type: Bug
>          Components: Job Engine
>    Affects Versions: 5.0-beta
>            Reporter: pengfei.zhan
>            Assignee: pengfei.zhan
>            Priority: Major
>             Fix For: 5.0-beta
>
>         Attachments: pic1.jpeg, pic2.jpeg, pic3.jpeg
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)
