beyond1920 commented on code in PR #8101:
URL: https://github.com/apache/hudi/pull/8101#discussion_r1136512519
##########
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/source/ExpressionEvaluators.java:
##########
@@ -156,12 +156,20 @@ public static Evaluator fromExpression(CallExpression expr) {
public interface Evaluator extends Serializable {
/**
- * Decides whether it's possible to match based on the column stats.
+ * Evaluates whether it's possible to match based on the column stats.
*
* @param columnStatsMap column statistics
- * @return
+ * @return false if it's not possible to match, true otherwise.
*/
boolean eval(Map<String, ColumnStats> columnStatsMap);
+
+ /**
+ * Evaluates whether it matches based on the column values.
+ *
+ * @param columnValues column values
+ * @return true if it matches, false otherwise.
+ */
+ boolean eval(Object[] columnValues);
}
Review Comment:
Yes, for partition pruning, the statistics of a partition column for a given
partition are a single exact value rather than a range with [min, max] and
nullCnt. Of course, we could wrap this exact value in a `ColumnStats` object
whose min is the same as its max and whose nullCnt is 0.
But I prefer not to do so because it would cause unnecessary overhead. For
example, for `Equals`, `NotEqualsTo`, `In`, there is no need to compare the
literal value with minValue and maxValue twice; just compare it with the exact
value.
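
To illustrate the difference, here is a minimal sketch, not the code in this PR. `EqualsPruningSketch`, `Stats`, `mayMatchEquals` and `matchesEquals` are hypothetical names, and `Stats` is only a simplified stand-in for `ColumnStats` that assumes non-null min and max values.

```java
// Hypothetical sketch: none of these names come from Hudi or from this PR.
final class EqualsPruningSketch {

  /** Simplified stand-in for a column-stats holder; assumes min and max are non-null. */
  static final class Stats<T extends Comparable<T>> {
    final T min;
    final T max;
    final long nullCnt;

    Stats(T min, T max, long nullCnt) {
      this.min = min;
      this.max = max;
      this.nullCnt = nullCnt;
    }
  }

  /** Stats-based check for `col = literal`: the literal is compared twice, with min and with max. */
  static <T extends Comparable<T>> boolean mayMatchEquals(Stats<T> stats, T literal) {
    return stats.min.compareTo(literal) <= 0 && stats.max.compareTo(literal) >= 0;
  }

  /** Value-based check for `col = literal`: a single comparison with the exact partition value. */
  static <T extends Comparable<T>> boolean matchesEquals(T partitionValue, T literal) {
    return partitionValue != null && partitionValue.compareTo(literal) == 0;
  }
}
```

With a partition value wrapped as `new Stats<>(v, v, 0L)`, both checks give the same answer for every literal, but the stats-based one allocates a wrapper object and performs a second comparison that can never change the result.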