ruofan created HUDI-5557:
----------------------------
Summary: Wrong candidate files found in metadata table
Key: HUDI-5557
URL: https://issues.apache.org/jira/browse/HUDI-5557
Project: Apache Hudi
Issue Type: Bug
Components: metadata, spark-sql
Affects Versions: 0.12.1
Reporter: ruofan
Suppose a Hudi table has five fields, but only two of them are covered by the
column stats index. When a SQL filter combines a condition on an indexed field
with a condition on a non-indexed field, the candidate files returned from the
metadata table are wrong.
For example, consider the following Hudi table schema:
{code:java}
name: varchar(128)
age: int
addr: varchar(128)
city: varchar(32)
job: varchar(32) {code}
Table properties:
{code:java}
hoodie.table.type=MERGE_ON_READ
hoodie.metadata.enable=true
hoodie.metadata.index.column.stats.enable=true
hoodie.metadata.index.column.stats.column.list='name,city'
hoodie.enable.data.skipping=true {code}
SQL:
{code:java}
select * from hudi_table where name='tom' and age=18; {code}
If we set hoodie.enable.data.skipping=false, the query returns the expected
data. If we set hoodie.enable.data.skipping=true, the expected data is not found.
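To make the expected behavior concrete, here is a minimal sketch of how file pruning with min/max column stats should treat a mixed filter. It is a toy model, not Hudi's implementation: FileStats, may_match, candidate_files, the file names, and the min/max values are all hypothetical. The assumption it illustrates is that a predicate on a column without stats (age here) can never exclude a file, so only the indexed predicate (name) may prune.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


# Toy model only: NOT Hudi code. All names and values are hypothetical.
@dataclass
class FileStats:
    path: str
    # column name -> (min, max); present only for indexed columns
    ranges: Dict[str, Tuple[object, object]]


def may_match(stats: FileStats, column: str, value) -> bool:
    """Return False only when the stats prove no row can equal `value`."""
    if column not in stats.ranges:
        # No stats for this column: nothing can be proven, so keep the file.
        return True
    lo, hi = stats.ranges[column]
    return lo <= value <= hi


def candidate_files(files: List[FileStats], equality_filters: Dict[str, object]) -> List[str]:
    """Keep a file unless some indexed predicate rules it out (AND semantics)."""
    return [
        f.path
        for f in files
        if all(may_match(f, col, val) for col, val in equality_filters.items())
    ]


# Stats exist only for the indexed columns 'name' and 'city'.
files = [
    FileStats("f1.parquet", {"name": ("alice", "bob"), "city": ("a", "z")}),
    FileStats("f2.parquet", {"name": ("sam", "zoe"), "city": ("a", "z")}),
]

# where name='tom' and age=18: 'age' has no stats and must not prune anything.
result = candidate_files(files, {"name": "tom", "age": 18})
print(result)
```

In this model the candidate set is always a superset of the files that actually contain matching rows, so the query result with data skipping enabled should equal the result with it disabled.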
--
This message was sent by Atlassian Jira
(v8.20.10#820010)