Hello all, Hive incremental queries on Hoodie currently suffer from a limitation: when a datestr is not present in the query, they list all partitions (listing .hoodie and every partition) and then end up throwing away a lot of those files, because the `_hoodie_commit_time` column values filter them out. This can be very expensive, inflating query planning time and sometimes even causing timeouts when the table is large. https://issues.apache.org/jira/browse/HUDI-25 tracks the issue.
If we leverage the timeline and the partitions touched by the commits involved in an incremental pull, we can avoid listing all partitions and thereby reduce query planning time. I am planning to send a HIP to discuss this further. Please share your thoughts. Thanks, Sudha
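P.S. To make the idea concrete, here is a minimal Java sketch. It is not Hudi's actual API: it models the timeline as a hypothetical sorted map from commit instant time to the set of partition paths that commit wrote, and collects only the partitions touched by commits after the incremental start instant, capped at a maximum number of commits. In a real implementation the partition list would come from the commit metadata on the timeline; the class name, `partitionsTouched`, and the map layout are illustrative assumptions.

```java
import java.util.Map;
import java.util.NavigableMap;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

class IncrementalPartitionPruning {

    // Hypothetical, simplified timeline: commit instant time -> partitions written by that commit.
    // Returns the union of partitions touched by up to maxCommits commits strictly after beginInstantExclusive.
    static Set<String> partitionsTouched(NavigableMap<String, Set<String>> timeline,
                                         String beginInstantExclusive,
                                         int maxCommits) {
        Set<String> partitions = new TreeSet<>();
        int consumed = 0;
        // tailMap(..., false) yields commits in ascending instant order, excluding the start instant itself.
        for (Map.Entry<String, Set<String>> commit
                : timeline.tailMap(beginInstantExclusive, false).entrySet()) {
            if (consumed >= maxCommits) {
                break;
            }
            partitions.addAll(commit.getValue());
            consumed++;
        }
        return partitions;
    }

    public static void main(String[] args) {
        NavigableMap<String, Set<String>> timeline = new TreeMap<>();
        timeline.put("20190101000000", Set.of("2019/01/01"));
        timeline.put("20190102000000", Set.of("2019/01/01", "2019/01/02"));
        timeline.put("20190103000000", Set.of("2019/01/03"));

        // Incremental pull starting after the first commit, consuming at most one commit.
        System.out.println(partitionsTouched(timeline, "20190101000000", 1)); // prints [2019/01/01, 2019/01/02]
    }
}
```

Only the returned partitions would then need to be listed, instead of every partition in the table.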