clintropolis commented on a change in pull request #10370:
URL: https://github.com/apache/druid/pull/10370#discussion_r487841757
##########
File path:
core/src/main/java/org/apache/druid/math/expr/BinaryLogicalOperatorExpr.java
##########
@@ -57,6 +57,17 @@ protected final double evalDouble(double left, double right)
// Use Double.compare for more consistent NaN handling.
return Evals.asDouble(Double.compare(left, right) < 0);
}
+
+ @Nullable
+ @Override
+ public ExprType getOutputType(InputBindingTypes inputTypes)
+ {
+ ExprType implicitCast = super.getOutputType(inputTypes);
+ if (ExprType.STRING.equals(implicitCast)) {
+ return ExprType.LONG;
+ }
+ return implicitCast;
Review comment:
Good eye, you spotted a thing I was purposefully ignoring because I
think it is not really great behavior, or consistent with other math functions.
The default behavior for those 2 input math operators is to consider the
arguments as doubles as you've noticed, regardless of whether or not they are
both doubles. I suspect it is implemented like this to allow null values inputs
to still work as zeros in 'default' mode
(`druid.generic.useDefaultValueForNull=true`), because the current expressions
are not strongly typed so these nulls all end up as string values. See [this
comment](https://github.com/apache/druid/pull/10370#discussion_r486762521) for
a bit more explanation, and [this
comment](https://github.com/apache/druid/pull/10084#discussion_r446486836) for
some of the problems that this causes.
It is also a bit unintuitive behavior. A string column which contains
numeric values will work in one of these math operators, so something like `2.0
+ '3.1' = 5.1` works magically. But it will also potentially treat the string
value as 0 if the string is not a number in default mode, so `2.0 + 'foo' =
2.0` (or null in sql compatible mode). The math functions (`min`, `max`, etc)
instead treat either input as string as a 0/null output without evaluating the
function.
I'm not actually sure what the best behavior here is, the math function
behavior seems a bit more intuitive to me, where either input being a string
produces a null output, so I chose to use that here. This PR is setting up for
vectorized expressions, which are going to be strongly typed before processing,
which I think makes this inconsistent/confusing/magical behavior not necessary.
I was planning to raise a discussion about this part in a future PR since it
isn't actually affecting anything yet, but I think it is good to start talking
about it even though it isn't wired up to anything yet.
Also, I really think 'default' mode complicates the expression system quite
a lot. It would be my preference for the expression system to always behave in
an SQL compatible manner, and default mode only come into play on the
boundaries for data coming in and going out, but I haven't fully thought
through the implications of this and it requires some discussion I think, and
might be a bit dramatic of a change.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]