[
https://issues.apache.org/jira/browse/HUDI-5557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Alexey Kudinkin updated HUDI-5557:
----------------------------------
Affects Version/s: 0.12.2
(was: 0.12.1)
> Wrong candidate files found in metadata table
> ----------------------------------------------
>
> Key: HUDI-5557
> URL: https://issues.apache.org/jira/browse/HUDI-5557
> Project: Apache Hudi
> Issue Type: Bug
> Components: metadata, spark-sql
> Affects Versions: 0.12.2
> Reporter: ruofan
> Priority: Blocker
> Labels: pull-request-available
> Fix For: 0.13.1, 0.12.3
>
>
> Suppose a Hudi table has five fields, of which only two are indexed. When a
> SQL filter combines a predicate on an indexed field with a predicate on a
> non-indexed field, the candidate files returned by the metadata table are
> wrong.
> For example, consider the following Hudi table schema:
> {code:java}
> name: varchar(128)
> age: int
> addr: varchar(128)
> city: varchar(32)
> job: varchar(32) {code}
> Table properties:
> {code:java}
> hoodie.table.type=MERGE_ON_READ
> hoodie.metadata.enable=true
> hoodie.metadata.index.column.stats.enable=true
> hoodie.metadata.index.column.stats.column.list='name,city'
> hoodie.enable.data.skipping=true {code}
> SQL query:
> {code:java}
> select * from hudi_table where name='tom' and age=18; {code}
> With hoodie.enable.data.skipping=false, the query returns the expected rows.
> With hoodie.enable.data.skipping=true, the expected rows are not returned.
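
The behaviour the report expects can be illustrated with a small, self-contained sketch of min/max file pruning. This is not Hudi's implementation: the file names, the stats values, and the helpers `may_match` and `candidate_files` are hypothetical and exist only for illustration. The assumption it encodes is that a conjunct on a column without column stats (here `age`) can never rule a file out, so `name='tom' and age=18` should prune on `name` alone.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ColumnRange:
    """Min/max values recorded for one column in one file."""
    min_value: object
    max_value: object


# Hypothetical per-file column stats. Only the indexed columns
# (name, city) have entries; age, addr and job have none.
FILE_STATS = {
    "file_a.parquet": {
        "name": ColumnRange("alice", "bob"),
        "city": ColumnRange("austin", "boston"),
    },
    "file_b.parquet": {
        "name": ColumnRange("sam", "zoe"),
        "city": ColumnRange("paris", "rome"),
    },
}


def may_match(stats, column, value):
    """Return False only if the stats prove `column = value` cannot match."""
    col_range = stats.get(column)
    if col_range is None:
        # No stats for this column, so the file cannot be ruled out.
        return True
    return col_range.min_value <= value <= col_range.max_value


def candidate_files(file_stats, equality_filters):
    """Keep every file that may satisfy all `column = value` conjuncts."""
    return [
        path
        for path, stats in file_stats.items()
        if all(may_match(stats, col, val) for col, val in equality_filters.items())
    ]


# Mirrors: select * from hudi_table where name='tom' and age=18
candidates = candidate_files(FILE_STATS, {"name": "tom", "age": 18})
print(candidates)
```

In this sketch the `age` conjunct leaves the candidate set unchanged, and a filter on `age` alone keeps every file. A pruning step that instead drops files because a referenced column has no stats would produce the kind of missing rows described above.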
--
This message was sent by Atlassian Jira
(v8.20.10#820010)