alamb commented on code in PR #4145:
URL: https://github.com/apache/arrow-datafusion/pull/4145#discussion_r1017074282
##########
datafusion/optimizer/src/unwrap_cast_in_comparison.rs:
##########
@@ -32,25 +32,40 @@ use datafusion_expr::{
};
use std::sync::Arc;
-/// The rule can be used to the numeric binary comparison with literal expr,
like below pattern:
-/// `cast(left_expr as data_type) comparison_op literal_expr` or `literal_expr
comparison_op cast(right_expr as data_type)`.
-/// The data type of two sides must be equal, and must be signed numeric type
now, and will support more data type later.
+/// [`UnwrapCastInComparison`] Attempts to remove casts from
+/// comparisons to literals ([`ScalarValue`]s) by applying the casts
+/// to the literals if possible. It is inspired by the optimizer rule
+/// `UnwrapCastInBinaryComparison` of Spark.
///
-/// If the binary comparison expr match above rules, the optimizer will check
if the value of `literal`
-/// is in within range(min,max) which is the range(min,max) of the data type
for `left_expr` or `right_expr`.
+/// Removing casts often improves performance because:
+/// 1. The cast is done once (to the literal) rather than to every value
+/// 2. Can enable other optimizations such as predicate pushdown that
+/// don't support casting
///
-/// If this is true, the literal expr will be casted to the data type of expr
on the other side, and the result of
-/// binary comparison will be `left_expr comparison_op cast(literal_expr,
left_data_type)` or
-/// `cast(literal_expr, right_data_type) comparison_op right_expr`. For better
optimization,
-/// the expr of `cast(literal_expr, target_type)` will be precomputed and
converted to the new expr `new_literal_expr`
-/// which data type is `target_type`.
-/// If this false, do nothing.
+/// The rule is applied to expressions of the following forms:
+///
+/// 1. `cast(left_expr as data_type) comparison_op literal_expr`
+/// 2. `cast(literal_expr) IN (expr1, expr2, ...)`
+///
+/// If the expression matches one of the forms above, the rule will
+/// ensure the value of `literal` is in within range(min,max) of the
Review Comment:
```suggestion
/// ensure the value of `literal` is in within range(min, max) of the
```
##########
datafusion/optimizer/src/unwrap_cast_in_comparison.rs:
##########
@@ -32,25 +32,40 @@ use datafusion_expr::{
};
use std::sync::Arc;
-/// The rule can be used to the numeric binary comparison with literal expr,
like below pattern:
-/// `cast(left_expr as data_type) comparison_op literal_expr` or `literal_expr
comparison_op cast(right_expr as data_type)`.
-/// The data type of two sides must be equal, and must be signed numeric type
now, and will support more data type later.
+/// [`UnwrapCastInComparison`] Attempts to remove casts from
Review Comment:
```suggestion
/// [`UnwrapCastInComparison`] attempts to remove casts from
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]