guojidan commented on PR #9413:
URL: 
https://github.com/apache/arrow-datafusion/pull/9413#issuecomment-1980333704

   > I think that this manner of rewriting might have several downsides that 
need to be considered.
   > 
   > 1. Because `NVL` no longer exists in the function list, some of its 
features will be missing. For example, the feature in issue [Add ScalarUDFs in 
missing function hints 
#9392](https://github.com/apache/arrow-datafusion/issues/9392), and the hint 
after entering wrong parameters.
   > 
   > On this PR's branch:
   > 
   > ```shell
   > DataFusion CLI v36.0.0
   > ❯ select nvl(1);
   > Error during planning: Invalid function 'nvl'.
   > Did you mean 'nanvl'?
   > ```
   > 
   > The hint on the main branch would be better.
   > 
   > ```
   > ❯ select nvl(1);
   > Error during planning: No function matches the given name and argument 
types 'nvl(Int64)'. You might need to add explicit type casts.
   >    Candidate functions:
   >    
nvl(Boolean/UInt8/UInt16/UInt32/UInt64/Int8/Int16/Int32/Int64/Float32/Float64/Utf8/LargeUtf8,
 
Boolean/UInt8/UInt16/UInt32/UInt64/Int8/Int16/Int32/Int64/Float32/Float64/Utf8/LargeUtf8)
   > ```
   > 
   > 2. There may be incorrect result when NVL's parameters contain a 
[volatile](https://github.com/apache/arrow-datafusion/blob/b2ff249bfb918ac6697dbc92b51262a7bdbb5971/datafusion/expr/src/expr.rs#L1903)
 expression.
   >    For example, given an expression `expr` is `nullif(random()>0.5, 
true)`. The result of `select nvl(expr, false)`should always be false.
   >    But if it is rewritten as `nvl(expr, expr, false)`, it could 
potentially yield a NULL value, when the random result of the first `expr` is 
less than 0.5, and the random result of the second `expr` is greater than 0.5.
   > 
   > I have performed such a test on this branch.
   > 
   > ```
   > DataFusion CLI v36.0.0
   > ❯  select nvl(nullif(random()>0.5, true), false) is null from 
(unnest(range(1,100)));
   > 
+--------------------------------------------------------------------------------------------------------------------------+
   > | nvl2(nullif(random() > Float64(0.5),Boolean(true)),nullif(random() > 
Float64(0.5),Boolean(true)),Boolean(false)) IS NULL |
   > 
+--------------------------------------------------------------------------------------------------------------------------+
   > | true                                                                     
                                                |
   > | false                                                                    
                                                |
   > | false                                                                    
                                                |
   > | false                                                                    
                                                |
   > | false                                                                    
                                                |
   > ```
   > 
   > There is approximately a 25% chance of an incorrect result.
   > 
   > ```shell
   > ❯ select count(*) from (
   >    select
   >       nvl(nullif(random()>0.5, true), false)
   >    from (unnest(range(1,10000)))
   > ) as t(a)
   > where a is null;
   > +----------+
   > | COUNT(*) |
   > +----------+
   > | 2517     |
   > +----------+
   > ```
   
   yes, need to redesign rewrite manner 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to