Re: [PR] Minor: Improve documentation for AggregateUDFImpl::accumulator and `AccumulatorArgs` and examples [arrow-datafusion]

via GitHub Wed, 03 Apr 2024 06:06:00 -0700


alamb commented on code in PR #9920:
URL: https://github.com/apache/arrow-datafusion/pull/9920#discussion_r1549703074



##########
datafusion/expr/src/function.rs:
##########
@@ -67,6 +88,24 @@ impl<'a> AccumulatorArgs<'a> {
             sort_exprs,
         }
     }
+
+    /// Return a not yet implemented error if IGNORE NULLs is true
+    pub fn check_ignore_nulls(&self, name: &str) -> Result<()> {
+        if self.ignore_nulls {

Review Comment:
   > There is also a problem because if they forget to add 
`check_ignore_nulls/check_order_by` in the accumulator, they can still run the 
function successfully. This approach does not _force_ the user to check their 
options because datafusion implements them, not the user.
   
   That is a good point --- in fact it actually affects built in aggregates too 
today
   
   ```sql
   ❯ select count(*) from (values (1), (null), (2));
   +----------+
   | COUNT(*) |
   +----------+
   | 3        |
   +----------+
   1 row in set. Query took 0.039 seconds.
   
   ❯ select count(*) IGNORE NULLS from (values (1), (null), (2));
   +----------+
   | COUNT(*) |
   +----------+
   | 3        |
   +----------+
   1 row in set. Query took 0.001 seconds.
   ```
   
   I think this is a sepate issue, and not made worse by this PR -- I filed 
https://github.com/apache/arrow-datafusion/issues/9924 to track. I suggest we 
work on improving it as a follow on PR



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Re: [PR] Minor: Improve documentation for AggregateUDFImpl::accumulator and `AccumulatorArgs` and examples [arrow-datafusion]

Reply via email to