Ma Jian created HUDI-7291:
-----------------------------
Summary: Pushing Down Partition Pruning Conditions to Column Stats During Data Skipping
Key: HUDI-7291
URL: https://issues.apache.org/jira/browse/HUDI-7291
Project: Apache Hudi
Issue Type: Improvement
Reporter: Ma Jian
The current data skipping implementation reads the column statistics for the
entire table and only then filters on them. When the table holds a large volume
of data and has many partitions, this makes data skipping less efficient,
because the partition pruning conditions are not used to limit the stats that
are processed.

By applying the partition pruning conditions to the column statistics right
after they are read, the volume of column stats that takes part in the
subsequent data skipping steps will be significantly reduced. This saves both
time and memory in the later computations.
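
The idea can be illustrated with a minimal Python sketch. This is a toy model, not Hudi's actual code or API: the `ColumnStat` record, the `prune_by_partition` and `candidate_files` helpers, and the sample data are all hypothetical. It only shows that filtering the stats by the partitions that survived partition pruning shrinks the set that the min/max check has to evaluate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ColumnStat:
    """Hypothetical per-file, per-column statistics row."""
    partition: str   # partition path the file belongs to
    file_name: str
    column: str
    min_value: int
    max_value: int


def prune_by_partition(stats, matching_partitions):
    """Keep only stats rows whose partition survived partition pruning."""
    return [s for s in stats if s.partition in matching_partitions]


def candidate_files(stats, column, value):
    """Data skipping for `column = value`: keep files whose
    [min, max] range for that column may contain the value."""
    return sorted({
        s.file_name
        for s in stats
        if s.column == column and s.min_value <= value <= s.max_value
    })


# Made-up stats for a table partitioned by `dt`.
all_stats = [
    ColumnStat("dt=2024-01-01", "f1.parquet", "price", 0, 50),
    ColumnStat("dt=2024-01-01", "f2.parquet", "price", 60, 90),
    ColumnStat("dt=2024-01-02", "f3.parquet", "price", 10, 40),
    ColumnStat("dt=2024-01-03", "f4.parquet", "price", 20, 70),
]

# Query: WHERE dt = '2024-01-01' AND price = 30
# Partition pruning has already selected a single partition.
pruned = prune_by_partition(all_stats, {"dt=2024-01-01"})

# Data skipping now evaluates 2 stats rows instead of 4.
files = candidate_files(pruned, "price", 30)
```

Without the pruning step, the min/max check would run over every stats row in the table and would also return files from partitions that the query never reads.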