[ https://issues.apache.org/jira/browse/HUDI-7267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Y Ethan Guo updated HUDI-7267:
------------------------------
Fix Version/s: 1.0.2
> csi will cause data loss during sql query
> -----------------------------------------
>
> Key: HUDI-7267
> URL: https://issues.apache.org/jira/browse/HUDI-7267
> Project: Apache Hudi
> Issue Type: Bug
> Components: index
> Reporter: Knight Chess
> Priority: Major
> Labels: pull-request-available
> Fix For: 1.0.2
>
> Attachments: image-2023-12-28-13-29-15-943.png
>
>
> As the screenshot shows, the column stats index (CSI) uses the Parquet
> column chunk metadata to calculate min/max values and saves them to the
> metadata table (MDT) column stats. For complex columns, such as **info
> array<struct<name: string, age: int>>**, the Parquet metadata contains only
> the leaf columns `info.array.name` and `info.array.age`, but Hudi calculates
> stats only for the `info` column, so its stats in the MDT are null.
> If a SQL expression contains `IsNotNull(info)`, every such file is skipped.
> The same applies to ordinary columns that are added in the future: old files
> do not contain the new column, which may cause similar problems. So, to keep
> the code logic clean, check for null before evaluating the stats values
> (min/max/nullValue).
> !image-2023-12-28-13-29-15-943.png|width=1458,height=798!
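A minimal sketch of the null check described above, not Hudi's actual implementation. The names `ColStats`, `may_contain_non_null` and `prune_files` are hypothetical, and the stats layout (min, max, null count, value count per column per file) is an assumption made for illustration. The idea is that a missing stats entry means "unknown", so the file must be kept rather than skipped.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class ColStats:
    """Hypothetical per-file column stats; any field may be missing (None)."""
    min_value: Optional[object] = None
    max_value: Optional[object] = None
    null_count: Optional[int] = None
    value_count: Optional[int] = None


def may_contain_non_null(stats: Optional[ColStats]) -> bool:
    """Evaluate IsNotNull(col) against stats without wrongly skipping a file."""
    # No stats entry, or incomplete stats: we know nothing, so keep the file.
    if stats is None or stats.null_count is None or stats.value_count is None:
        return True
    # Stats are present: skip only when every value is known to be null.
    return stats.null_count < stats.value_count


def prune_files(file_stats: Dict[str, Dict[str, ColStats]], column: str) -> List[str]:
    """Return the files that may still match IsNotNull(column)."""
    return [
        path
        for path, cols in file_stats.items()
        if may_contain_non_null(cols.get(column))
    ]
```

With this check, a file whose stats have no entry for `info` (the complex-column case, or an old file written before the column existed) is read instead of being skipped.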
--
This message was sent by Atlassian Jira
(v8.20.10#820010)