Zhiting Guo created KYLIN-5650:
----------------------------------

             Summary: In the cloud environment, there is a probability that the 
dictionary metadata file will be read abnormally during building job, resulting 
in incorrect query results.
                 Key: KYLIN-5650
                 URL: https://issues.apache.org/jira/browse/KYLIN-5650
             Project: Kylin
          Issue Type: Bug
          Components: Tools, Build and Test
    Affects Versions: 5.0-alpha
            Reporter: Zhiting Guo
             Fix For: 5.0-alpha
         Attachments: In the cloud environment, there is a probability that the 
dictionary metadata file will be read abnormally during building job, resulting 
in incorrect query results..pdf

I checked the dictionary and found no duplicate values. I checked the execution 
plan of the dictionary build step and found no problem. I then checked the 
flat table build steps and found that the problem lies in the step that encodes 
the flat table with the dictionary.

The error occurs because the flat table is not repartitioned by the dictionary 
column before it is encoded. As the figure shows, the plan contains no 
repartition, yet the encoded column appears in it.

There are also the following logs:
{code:java}
2023-03-26T20:26:30,868 INFO  [logger-thread-0] dict.NGlobalDictHDFSStore : Commit from s3a://datalake-kc-s3-prd-bj/kylin/kcprodYcHG_kylin/datalake_kylin/dict/global_dict/GDT.GDT_CMPLYA_FCT_DIST_RESLT/IS_STAT/working to s3a://datalake-kc-s3-prd-bj/kylin/kcprodYcHG_kylin/datalake_kylin/dict/global_dict/GDT.GDT_CMPLYA_FCT_DIST_RESLT/IS_STAT/version_1679862387539

2023-03-26T20:31:14,501 INFO  [logger-thread-0] dict.NGlobalDictionaryV2 : getMetaInfo versions.length is 12
2023-03-26T20:31:14,547 INFO  [logger-thread-0] dict.NGlobalDictHDFSStore : because metaFiles.length is 0, metaInfo is null
2023-03-26T20:31:14,547 INFO  [logger-thread-0] dict.NGlobalDictionaryV2 : getMetaInfo metadata is null : [true]
{code}
This happened on S3: after the dictionary directory was renamed from working 
to its version directory, listing it returned no metadata files. The code 
reports no error when the metadata cannot be obtained, and it is not reasonable 
to then encode directly without repartitioning. In short, the result is that 
the encoding of the dictionary column on the flat table fails.
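
One possible direction for a fix, sketched below, is to refuse to continue when the metadata listing comes back empty instead of silently encoding without repartition. This is only an illustration under stated assumptions: DictMetaGuard, requireMetaFiles and MissingDictMetaException are hypothetical names that do not exist in Kylin, and the Supplier stands in for whatever call lists the metadata files of a dictionary version directory. The retry is there only in case the empty listing is transient, which the logs above suggest but do not confirm.

```java
import java.util.List;
import java.util.function.Supplier;

// Hypothetical guard, NOT part of Kylin: fail loudly when dictionary metadata is missing.
final class DictMetaGuard {

    /** Raised when no metadata file is visible after all attempts. */
    static final class MissingDictMetaException extends RuntimeException {
        MissingDictMetaException(String message) {
            super(message);
        }
    }

    /**
     * Calls listMetaFiles up to maxAttempts times and returns the first
     * non-empty result. Throws instead of returning null or an empty list,
     * so the caller can never go on to encode without valid metadata.
     */
    static List<String> requireMetaFiles(Supplier<List<String>> listMetaFiles,
                                         int maxAttempts, long backoffMillis) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            List<String> files = listMetaFiles.get();
            if (files != null && !files.isEmpty()) {
                return files;
            }
            // Wait before the next listing, but not after the last attempt.
            if (attempt < maxAttempts) {
                sleep(backoffMillis);
            }
        }
        throw new MissingDictMetaException(
                "no dictionary metadata file found after " + maxAttempts + " attempts");
    }

    private static void sleep(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            // Restore the interrupt flag and abort rather than continue without metadata.
            Thread.currentThread().interrupt();
            throw new MissingDictMetaException("interrupted while waiting for dictionary metadata");
        }
    }
}
```

With a guard of this kind, the flat table encoding step would either see the metadata and repartition by the dictionary column, or the build job would fail with an explicit error rather than producing a wrongly encoded flat table.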



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
