HaoYang670 commented on code in PR #4234:
URL: https://github.com/apache/arrow-datafusion/pull/4234#discussion_r1024818973
##########
datafusion/optimizer/src/simplify_expressions/expr_simplifier.rs:
##########
@@ -760,6 +766,29 @@ impl<'a, S: SimplifyInfo> ExprRewriter for Simplifier<'a,
S> {
},
},
+ // unfold the expression round(source, n)
+ Expr::ScalarFunction {
+ fun: BuiltinScalarFunction::Round,
+ args,
Review Comment:
> This makes the logical plan harder to use for other query engines
Why is that? Could you please give an example?
> We can implement the functionality in the physical plan in a separate PR.
The advantages of implementing in physical plan are:
1. much straight forward and easy to debug.
2. we would benefit from arrow-rs, if arrow-rs could support `round(source,
n)` in the future.
The disadvantages of implementing in physical plan are:
1. Rust only support the `round` function with one argument. If we want to
implement in physical plan, we have to assemble some arrow compute kernel
together, which the logic is same as what I've done in the physical plan.
3. We can't optimize the expression if the second function is a constant,
which we have to add additional optimize rules.
The advantages of implementing in logical plan are:
1. Reuse other code to achieve the functionality of `round(source, n)`, so
the implementation could be much easier.
2. We can reuse our expression optimizer to do optimizations, for example
when the 2nd argument is a constant.
3. Flexible. If we plan to change the parameter type of `round` from
`Float`, to `Decimal`, the change can be very easy.
4. Make logical stage and physical stage independent. We can achieve complex
functionality in the SQL query by unfolding and rewriting expressions (or
plans) in the logical stage and we only need to maintain a reduced set of
physical expressions (or plans).
The disadvantages of implementing in logical plan are:
1. Put the implementation in the logical optimizer isn't a good solution, we
should unfold the expression before doing any optimization.
1. the query plan isn't user-friendly to read because `round` is unfolded.
But I think this is not a big deal, as the optimized code is always ugly to
read. Users can also use `explain verbose` to find what has happened.
After weighing the pros and cons, I prefer to implement in logical plan. If
the rust lib or arrow-rs could support `round(source, n)`, in the future, we
can change it.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]