beyond1920 commented on code in PR #8101:
URL: https://github.com/apache/hudi/pull/8101#discussion_r1136512519
##########
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/source/ExpressionEvaluators.java:
##########
@@ -156,12 +156,20 @@ public static Evaluator fromExpression(CallExpression expr) {
public interface Evaluator extends Serializable {
/**
- * Decides whether it's possible to match based on the column stats.
+ * Evaluates whether it's possible to match based on the column stats.
*
* @param columnStatsMap column statistics
- * @return
+ * @return false if it's not possible to match, true otherwise.
*/
boolean eval(Map<String, ColumnStats> columnStatsMap);
+
+ /**
+ * Evaluates whether it matches based on the column values.
+ *
+ * @param columnValues column values
+ * @return true if it matches, false otherwise.
+ */
+ boolean eval(Object[] columnValues);
}
Review Comment:
Yes. For partition pruning, the statistics of a partition column for a given
partition are an exact value, not a [min, max] range with a nullCnt.
Of course, we could wrap this exact value in a `ColumnStats` object whose
min equals its max and whose nullCnt is 0.
But I prefer not to do so, because it would add unnecessary overhead. For
example, when pruning partitions with `Equals`, `NotEqualsTo`, or `In`, there
is no need to compare the literal against both minValue and maxValue; a
single comparison with the exact value is enough.
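
To illustrate the point, here is a minimal sketch that contrasts the two
approaches for an equality predicate. It is not the code in this PR:
`PartitionPruningSketch`, `IntStats`, `equalsOnStats`, and `equalsOnValue` are
hypothetical names, and the stats type is a simplified stand-in limited to int
columns.

```java
import java.util.Objects;

// Hypothetical sketch; none of these names come from the Hudi code base.
final class PartitionPruningSketch {

  /** Simplified stand-in for column statistics: a [min, max] range plus a null count. */
  static final class IntStats {
    final int min;
    final int max;
    final long nullCnt;

    IntStats(int min, int max, long nullCnt) {
      this.min = min;
      this.max = max;
      this.nullCnt = nullCnt;
    }
  }

  /**
   * Stats-based equality: the literal can only match if it lies within
   * [min, max], which takes two comparisons.
   */
  static boolean equalsOnStats(IntStats stats, int literal) {
    return stats.min <= literal && literal <= stats.max;
  }

  /** Value-based equality: one comparison against the exact partition value. */
  static boolean equalsOnValue(Object partitionValue, Object literal) {
    return Objects.equals(partitionValue, literal);
  }

  public static void main(String[] args) {
    int partitionValue = 20230315;

    // Wrapping the exact value: min == max and nullCnt == 0.
    IntStats wrapped = new IntStats(partitionValue, partitionValue, 0);
    System.out.println(equalsOnStats(wrapped, 20230315));

    // Evaluating the exact value directly, with no wrapper object.
    System.out.println(equalsOnValue(partitionValue, 20230315));
  }
}
```

Both paths give the same answer for a partition value; the value-based path
skips the wrapper allocation and the second comparison.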