Jonathan Vexler created HUDI-8037:
-------------------------------------

             Summary: Partition query for transformed value incorrectly prunes 
valid partitions
                 Key: HUDI-8037
                 URL: https://issues.apache.org/jira/browse/HUDI-8037
             Project: Apache Hudi
          Issue Type: Bug
          Components: spark, spark-sql
            Reporter: Jonathan Vexler
            Assignee: Jonathan Vexler
             Fix For: 0.15.1


With the timestamp key generator, the partition column can hold full 
timestamps while the key generator creates partitions at day granularity. All 
records with a timestamp on 7-31-2024 then go to the same partition, even 
though their values in the partition column differ by hours, minutes, etc.

This causes a problem with partition pruning. Say you query "select * from 
table where partition < 7-31-2024 at 7am and partition > 7-31-2024 at 6am". 
Since the file structure only has the partition 7-31-2024, that partition 
value is interpreted as 7-31-2024 at 12am. 12am does not fall between 6am and 
7am, so the partition is pruned from the search space even though it may 
contain matching records.
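
The behavior can be illustrated with a small standalone sketch. This is not 
Hudi code: the function names naive_keep and range_keep are hypothetical, and 
the "%Y-%m-%d" path format is an assumption made for the example. The second 
function shows one way to reason about a day partition as a time range; it is 
not necessarily the fix adopted for this issue.

```python
from datetime import datetime, timedelta

DAY_FORMAT = "%Y-%m-%d"  # assumed partition path format, for illustration only


def naive_keep(partition_day: str, lower: datetime, upper: datetime) -> bool:
    """Evaluate the filter against the partition path value directly.

    The day-level path is read back as midnight of that day, which is the
    interpretation described in this issue.
    """
    partition_value = datetime.strptime(partition_day, DAY_FORMAT)
    return lower < partition_value < upper


def range_keep(partition_day: str, lower: datetime, upper: datetime) -> bool:
    """Treat the partition as the half-open range [midnight, next midnight).

    Keep the partition when that range overlaps the open filter interval
    (lower, upper).
    """
    start = datetime.strptime(partition_day, DAY_FORMAT)
    end = start + timedelta(days=1)
    return start < upper and lower < end


# Filter from the issue: partition > 6am and partition < 7am on 7-31-2024.
lower = datetime(2024, 7, 31, 6, 0)
upper = datetime(2024, 7, 31, 7, 0)

naive_result = naive_keep("2024-07-31", lower, upper)
range_result = range_keep("2024-07-31", lower, upper)
print("naive keeps partition:", naive_result)
print("range-aware keeps partition:", range_result)
```

With the direct comparison, the partition for 7-31-2024 is dropped because 
midnight is not greater than 6am. The range-based check keeps it, while still 
pruning a day such as 2024-07-30 that cannot contain matching records.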



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
