beyond1920 commented on code in PR #8101:
URL: https://github.com/apache/hudi/pull/8101#discussion_r1135032944
##########
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/source/ExpressionEvaluators.java:
##########
@@ -156,12 +156,20 @@ public static Evaluator fromExpression(CallExpression expr) {
public interface Evaluator extends Serializable {
/**
- * Decides whether it's possible to match based on the column stats.
+ * Evaluates whether it's possible to match based on the column stats.
*
* @param columnStatsMap column statistics
- * @return
+ * @return false if it's not possible to match, true otherwise.
*/
boolean eval(Map<String, ColumnStats> columnStatsMap);
+
+ /**
+ * Evaluates whether it matches based on the column values.
+ *
+ * @param columnValues column values
+ * @return true if it's matches, false otherwise.
+ */
+ boolean eval(Object[] columnValues);
}
Review Comment:
`ColumnValues` does not come from literals.
For example, given `p_date = '20220101'`, `ColumnValues` holds the actual
p_date values in the table, while `ColumnStats` holds statistics of the
p_date column. In both cases the literal value is what gets compared
against `ColumnValues` or `ColumnStats`.
Please see the details in
https://github.com/apache/hudi/pull/8102/files#diff-3098f0a6c1cae51c7c4c99166d92e56a80129afb2fc12c03072e3e101587c932
It's not good to wrap those values as `ColumnStats` because they are totally
different things. With `ColumnStats`, the `eval` method returns false only if
there is no possibility of a match; it's just a best-effort estimate.
With `ColumnValues`, the `eval` method returns true if the values match; it's
an exact matching rule based on exact values.
It's clearer to keep the two API methods separate.
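To make the difference concrete, here is a minimal sketch of an equality filter with both methods. It is not the code in this PR: `EvaluatorSketch`, `EqualTo`, the `index` parameter, and the simplified `ColumnStats` (string min/max only) are hypothetical names made up for illustration.

```java
import java.io.Serializable;
import java.util.HashMap;
import java.util.Map;

class EvaluatorSketch {

  /** Hypothetical stand-in for the real ColumnStats; only min and max are modeled. */
  static final class ColumnStats {
    final String min;
    final String max;

    ColumnStats(String min, String max) {
      this.min = min;
      this.max = max;
    }
  }

  /** Mirrors the two-method shape of the interface in the diff above. */
  interface Evaluator extends Serializable {
    boolean eval(Map<String, ColumnStats> columnStatsMap);

    boolean eval(Object[] columnValues);
  }

  /** Hypothetical evaluator for `column = literal`. */
  static final class EqualTo implements Evaluator {
    private final String column;
    private final int index; // assumed position of the column in the value array
    private final String literal;

    EqualTo(String column, int index, String literal) {
      this.column = column;
      this.index = index;
      this.literal = literal;
    }

    /** Best effort: false only when the literal cannot fall inside [min, max]. */
    @Override
    public boolean eval(Map<String, ColumnStats> columnStatsMap) {
      ColumnStats stats = columnStatsMap.get(column);
      if (stats == null) {
        return true; // no statistics, so a match cannot be ruled out
      }
      return stats.min.compareTo(literal) <= 0 && stats.max.compareTo(literal) >= 0;
    }

    /** Exact: true only when the actual value equals the literal. */
    @Override
    public boolean eval(Object[] columnValues) {
      return literal.equals(columnValues[index]);
    }
  }

  public static void main(String[] args) {
    Evaluator filter = new EqualTo("p_date", 0, "20220101");

    Map<String, ColumnStats> stats = new HashMap<>();
    stats.put("p_date", new ColumnStats("20211231", "20220102"));

    System.out.println(filter.eval(stats));                      // true: literal is within range
    System.out.println(filter.eval(new Object[] {"20220102"}));  // false: exact value differs
    System.out.println(filter.eval(new Object[] {"20220101"}));  // true: exact value matches
  }
}
```

The same literal gives `true` against the statistics and `false` against a concrete value inside that range, which is why wrapping values as `ColumnStats` would blur the two meanings.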