kosiew commented on code in PR #22564:
URL: https://github.com/apache/datafusion/pull/22564#discussion_r3332260030


##########
datafusion/functions/src/math/log.rs:
##########
@@ -358,43 +386,83 @@ impl ScalarUDFImpl for LogFunc {
         } else {
             lit(ScalarValue::new_ten(&number_datatype)?)
         };
+        let base_datatype = info.get_data_type(&base)?;
+
+        if is_zero_literal(&number, &number_datatype)?
+            || is_zero_literal(&base, &base_datatype)?
+        {
+            return Ok(ExprSimplifyResult::Original(original_log_args(
+                num_args, &base, &number,
+            )?));
+        }
+
+        let base_is_valid_literal = is_valid_log_base_literal(&base)?;
 
         match number {
             Expr::Literal(value, _)
-                if value == ScalarValue::new_one(&number_datatype)? =>
+                if value == ScalarValue::new_one(&number_datatype)?
+                    && base_is_valid_literal =>
             {
                 Ok(ExprSimplifyResult::Simplified(lit(ScalarValue::new_zero(
                     &info.get_data_type(&base)?,
                 )?)))
             }
             Expr::ScalarFunction(ScalarFunction { func, mut args })
-                if is_pow(&func) && args.len() == 2 && base == args[0] =>
+                if is_pow(&func)
+                    && args.len() == 2
+                    && base == args[0]
+                    && base_is_valid_literal =>

Review Comment:
   Thanks for tightening the rewrite conditions. I think there is still one edge case here.
   
   The `log(a, power(a, b)) => b` rewrite is still applied whenever `a` is a valid literal base. That skips the runtime `log` evaluation and, with it, the zero-value check in `invoke_with_args`.
   
   For example:
   
   ```sql
   select log(10.0, power(10.0, column1))
   from (values (-400.0), (2.0)) as t(column1);
   ```
   
   With this rewrite, the query returns `-400.0` and `2.0` for the two rows.
   
   Without the rewrite, `power(10.0, -400.0)` underflows to `0.0`, so the first row reaches `invoke_with_args` and raises `cannot take logarithm of zero`, which matches the behavior of:
   
   ```sql
   select log(10.0, power(10.0, -400.0));
   ```
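A minimal, self-contained model of the difference, using plain `f64` arithmetic rather than DataFusion's `Expr` machinery. `checked_log`, `unsimplified`, and `simplified` are hypothetical names: the first stands in for the zero check in `invoke_with_args` (assuming it rejects a zero value with the error quoted above), and the other two model the query with and without the `log(a, power(a, b)) => b` rewrite.

```rust
/// Hypothetical stand-in for the runtime check in `invoke_with_args`:
/// a zero value is rejected instead of producing `-inf`.
fn checked_log(base: f64, value: f64) -> Result<f64, String> {
    if value == 0.0 {
        return Err("cannot take logarithm of zero".to_string());
    }
    Ok(value.log(base))
}

/// `log(base, power(base, exp))` evaluated as written, with no rewrite.
fn unsimplified(base: f64, exp: f64) -> Result<f64, String> {
    checked_log(base, base.powf(exp))
}

/// The rewritten form `log(a, power(a, b)) => b`: the value is never
/// computed, so the zero check can never fire.
fn simplified(_base: f64, exp: f64) -> Result<f64, String> {
    Ok(exp)
}

fn main() {
    // 10^-400 is far below the smallest positive subnormal f64
    // (about 4.9e-324), so the result rounds to 0.0.
    assert_eq!(10f64.powf(-400.0), 0.0);

    for exp in [-400.0, 2.0] {
        println!(
            "exp = {exp}: unsimplified = {:?}, simplified = {:?}",
            unsimplified(10.0, exp),
            simplified(10.0, exp)
        );
    }
}
```

For `exp = -400.0` the two paths disagree: one returns an error, the other returns `-400.0`.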
   
   Could we either avoid this rewrite unless the value can be proven non-zero, or preserve the runtime zero-value validation when the value side is still an expression? A regression test covering an exponent that underflows to zero would also help lock in the expected behavior.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
