jonvex opened a new pull request, #11717:
URL: https://github.com/apache/hudi/pull/11717

   ### Change Logs
   
   With the timestamp keygen, the partition column can hold full timestamps 
while the keygen creates partitions at day granularity. All records with a 
timestamp on 7-31-2024 then go to the same partition, even though their values 
in the partition column differ by hours, minutes, etc.
   
   This causes a problem with partition pruning. Let's say you query "select * 
from table where partition < 7-31-2024 at 7am and partition > 7-31-2024 at 6am". 
Since the file structure only has the partition 7-31-2024, that value is 
interpreted as 7-31-2024 at 12am, which falls outside the filter range. So the 
partition is pruned from the search space even though it may contain matching 
records.
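
   The failure can be reproduced with a small model of the pruning check. This 
is only an illustrative sketch, not Hudi's code: the function names are 
hypothetical, and the day-level partition path is assumed to be written as 
`2024-07-31`.

```python
from datetime import datetime

# Hypothetical helpers: a simplified model of pruning, not Hudi's implementation.
def partition_value(path: str) -> datetime:
    """Interpret a day-level partition path as midnight of that day."""
    return datetime.strptime(path, "%Y-%m-%d")

def naive_keep(path: str, lower: datetime, upper: datetime) -> bool:
    """Apply `partition > lower AND partition < upper` to the partition value as-is."""
    return lower < partition_value(path) < upper

lower = datetime(2024, 7, 31, 6)  # 7-31-2024 at 6am
upper = datetime(2024, 7, 31, 7)  # 7-31-2024 at 7am

# Midnight is not after 6am, so the only partition that could hold
# matching records is dropped.
kept = naive_keep("2024-07-31", lower, upper)
print(kept)
```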
   
   This PR fixes the issue by rounding the query values based on the output 
format. The output format here is year-month-day, so values are rounded to the 
nearest day. The query for partition pruning then becomes "select * from table 
where partition < 7-31-2024 and partition > 7-31-2024". On its own this still 
does not yield any results, because it requires the partition to be both less 
than and greater than the same day.
   
   To fix that, we also replace any < or > with <= and >=. So now the query is 
"select * from table where partition <= 7-31-2024 and partition >= 7-31-2024". 
The partition 7-31-2024 is no longer pruned, and the original filter is still 
applied by Spark to the records that are read.
   
   (We can replace all < and > because we are only looking at partition filters 
in a simple timestamp keygen scenario.)
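
   The two-step rewrite described above (round the filter value to the output 
format, then relax strict comparisons) can be sketched as follows. This is a 
minimal illustration under stated assumptions, not the code in this PR: all 
names are hypothetical, the output format is assumed to be `%Y-%m-%d`, and 
rounding is modelled as formatting the value with the output format, which 
truncates to the start of the day.

```python
import operator
from datetime import datetime

OUTPUT_FORMAT = "%Y-%m-%d"  # assumed keygen output format (year-month-day)

# Strict comparisons become inclusive ones; inclusive ones are unchanged.
RELAXED = {"<": "<=", ">": ">=", "<=": "<=", ">=": ">="}
OPS = {"<=": operator.le, ">=": operator.ge}

def round_to_output_format(ts: datetime) -> datetime:
    """Drop everything finer than the output format (assumption: truncation)."""
    return datetime.strptime(ts.strftime(OUTPUT_FORMAT), OUTPUT_FORMAT)

def rewrite_filter(op: str, value: datetime):
    """Return the relaxed operator and the rounded value used only for pruning."""
    return RELAXED[op], round_to_output_format(value)

def keep_partition(path: str, filters) -> bool:
    """Keep a partition if it satisfies every rewritten partition filter."""
    part = datetime.strptime(path, OUTPUT_FORMAT)
    rewritten = [rewrite_filter(op, value) for op, value in filters]
    return all(OPS[op](part, value) for op, value in rewritten)

# partition > 7-31-2024 at 6am AND partition < 7-31-2024 at 7am
filters = [(">", datetime(2024, 7, 31, 6)), ("<", datetime(2024, 7, 31, 7))]

kept_same_day = keep_partition("2024-07-31", filters)
kept_prev_day = keep_partition("2024-07-30", filters)
kept_next_day = keep_partition("2024-08-01", filters)
print(kept_same_day, kept_prev_day, kept_next_day)
```

   The rewritten filters are only a coarse superset check for choosing 
partitions. The original, finer-grained filter still has to be evaluated on 
the records themselves, which the description above leaves to Spark.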
   
   This does not fix COW or MOR read-optimized queries, because we treat those 
as plain parquet tables and Spark handles the partition pruning.
   
   ### Impact
   
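   Fixes incorrect partition pruning for timestamp keygen tables when a 
partition filter is more fine-grained than the partition output format. COW and 
MOR read-optimized queries are not covered.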
   
   ### Risk level (write none, low medium or high below)
   
   low
   
   ### Documentation Update
   
   N/A
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   

