Re: [PR] Convert variance sample to udaf [datafusion]

via GitHub Sat, 01 Jun 2024 14:24:01 -0700


yyin-dev commented on PR #10713:
URL: https://github.com/apache/datafusion/pull/10713#issuecomment-2143594191


   > > @jayzhan211 I'm working on a change, but can you help me understand the semantics here:
   > > ```
   > > # csv_query_distinct_variance
   > > query R
   > > SELECT var(distinct c2) FROM aggregate_test_100
   > > ----
   > > 2.5
   > > 
   > > statement error DataFusion error: This feature is not implemented: VAR\(DISTINCT\) aggregations are not available
   > > SELECT var(c2), var(distinct c2) FROM aggregate_test_100
   > > ```
   > > 
   > > Why should the first query succeed but not the second one? Feel free to point me to any SQL / datafusion doc.
   > 
   > I think it is because of the optimizer rule `SingleDistinctToGroupBy`. This rule converts the distinct aggregate into a group by, so the first query is no longer `distinct`. You can try adding `explain` to see the optimized logical plan.
   
   I'm thinking about the right way to implement the error-raising. Before the 
migration, the logic was implemented in 
`physical-expr/src/aggregate/build_in.rs:create_aggregate_expr` as a match 
statement.
   
   After the migration, the error should probably be raised in 
`physical-expr-common/src/aggregate/mod.rs:create_aggregate_expr`. There are two 
options: 
   
   1. Get the udaf's name and implement similar logic. This is simpler but less 
principled?
   
   2. Add a `support_distinct` method to the `AggregateUDFImpl` trait. This feels 
like a better solution.
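   
   A rough sketch of what option 2 could look like. This is not the actual 
DataFusion code: `AggregateUdfSketch`, `PlanError`, `check_distinct`, and the 
two example structs are hypothetical stand-ins, and the default of `false` for 
`support_distinct` is an assumption so that existing UDAFs would have to opt in.

```rust
/// Hypothetical stand-in for `AggregateUDFImpl`; only the parts needed here.
trait AggregateUdfSketch {
    fn name(&self) -> &str;

    /// Proposed hook (option 2). Assumed to default to `false`,
    /// so a UDAF must explicitly opt in to DISTINCT support.
    fn support_distinct(&self) -> bool {
        false
    }
}

/// Hypothetical UDAF that keeps the default (no DISTINCT support).
struct VarianceSample;

impl AggregateUdfSketch for VarianceSample {
    fn name(&self) -> &str {
        "var"
    }
}

/// Hypothetical UDAF that opts in to DISTINCT support.
struct CountLike;

impl AggregateUdfSketch for CountLike {
    fn name(&self) -> &str {
        "count"
    }

    fn support_distinct(&self) -> bool {
        true
    }
}

/// Hypothetical error type standing in for the planner's error enum.
#[derive(Debug, PartialEq)]
enum PlanError {
    NotImplemented(String),
}

/// The check that `create_aggregate_expr` might perform before building
/// the expression: reject DISTINCT unless the UDAF declares support.
fn check_distinct(udaf: &dyn AggregateUdfSketch, is_distinct: bool) -> Result<(), PlanError> {
    if is_distinct && !udaf.support_distinct() {
        return Err(PlanError::NotImplemented(format!(
            "{}(DISTINCT) aggregations are not available",
            udaf.name().to_uppercase()
        )));
    }
    Ok(())
}
```

   With this shape, the check no longer needs to match on UDAF names (option 1), 
and each aggregate declares its own capability.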
   
   What do you think?
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


