joseph-isaacs commented on issue #15524:
URL: https://github.com/apache/datafusion/issues/15524#issuecomment-4030636475

   It seems to me that aggregations AGG over SQL integers 
$\mathbb{Z}_\bot$ need to be examined on a case-by-case basis, just like AGG 
over integers $\mathbb{Z}$. For aggregations defined over columns ($c_0, 
..., c_n$) we see that 
   
   $\text{AGG}(c_0 \oplus^1 c_1 \oplus^2 ... \oplus^n c_n) = \text{AGG}(c_0 \oplus^1 
c_1 \oplus^2 ... \oplus^n c_n\ \text{filter}\ m)$
   
   where $m$ is the set of rows in which every column is defined (non-NULL), 
and the ops $\oplus^1, ..., \oplus^n$ are all NULL-propagating/annihilating 
($\forall x. \bot \oplus^i x = \bot$). Furthermore, assuming AGG skips NULL 
inputs (as the standard SQL aggregates do), the rows in $m$ are exactly the 
contributing rows. This means that under the filter $m$ we can use integer 
reasoning, since by definition no row in $m$ contains a null.
   
   Also notice that this handles non-nullable columns easily: there $m$ is simply every row.
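
   A minimal Python sketch of the identity above, modelling NULL as `None`. The helper names (`add`, `sql_sum`, `sql_count`, `sql_max`, `both_sides`) are made up for illustration and are not DataFusion APIs; the aggregates are assumed to skip NULL inputs and to return NULL (or 0 for count) on empty input.

```python
from typing import Callable, List, Optional, Tuple

Value = Optional[int]  # None stands in for SQL NULL


def add(x: Value, y: Value) -> Value:
    """NULL-propagating addition: NULL + x = NULL."""
    return None if x is None or y is None else x + y


def sql_sum(values: List[Value]) -> Value:
    """SUM skips NULLs and yields NULL when nothing contributes."""
    present = [v for v in values if v is not None]
    return sum(present) if present else None


def sql_count(values: List[Value]) -> int:
    """COUNT(expr) counts the non-NULL values."""
    return sum(1 for v in values if v is not None)


def sql_max(values: List[Value]) -> Value:
    """MAX skips NULLs and yields NULL when nothing contributes."""
    present = [v for v in values if v is not None]
    return max(present) if present else None


def both_sides(a: List[Value], b: List[Value], agg: Callable) -> Tuple:
    """Return (AGG(a + b), AGG(a + b filter m)) for comparison."""
    expr = [add(x, y) for x, y in zip(a, b)]                     # may contain NULLs
    m = [x is not None and y is not None for x, y in zip(a, b)]  # the filter m
    filtered = [x + y for x, y, keep in zip(a, b, m) if keep]    # plain integers only
    return agg(expr), agg(filtered)


a = [1, None, 3, 4]
b = [10, 20, None, 40]
for agg in (sql_sum, sql_count, sql_max):
    lhs, rhs = both_sides(a, b, agg)
    assert lhs == rhs

# Non-nullable columns: m is every row, so the filter is a no-op.
assert both_sides([1, 2, 3], [4, 5, 6], sql_sum) == (21, 21)
```

   Under the filter, `filtered` holds ordinary integers, which is what allows integer reasoning on the right-hand side.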
   
   Now let's think about specific examples:
   
   
   ## SUM
   
   Consider a sum in SQL, `SUM(a+b+c)`. We can write this as `SUM(a filter m) + 
SUM(b filter m) + SUM(c filter m)` where `m = a IS NOT NULL AND b IS NOT NULL 
AND c IS NOT NULL`. Then we can use regular mathematical equivalences over 
$\sum_{i \in m} c^0_i \oplus^1 ... \oplus^n c^n_i$.
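
   As a rough check of the SUM rewrite, here is a small Python sketch with NULL modelled as `None`. The helpers `sql_sum` and `null_add` are illustrative names only, and the data is invented. It also shows that dropping the filter gives a different answer.

```python
from typing import List, Optional

Value = Optional[int]  # None stands in for SQL NULL


def sql_sum(values: List[Value]) -> Value:
    """SUM skips NULLs and yields NULL when nothing contributes."""
    present = [v for v in values if v is not None]
    return sum(present) if present else None


def null_add(*xs: Value) -> Value:
    """NULL-propagating addition of any number of operands."""
    return None if any(x is None for x in xs) else sum(xs)


a = [1, None, 3, 4]
b = [10, 20, None, 40]
c = [100, 200, 300, 400]

# m: rows where a, b and c are all non-NULL (rows 0 and 3 here).
m = [None not in row for row in zip(a, b, c)]

# SUM(a+b+c): the per-row sum is NULL wherever any operand is NULL.
whole = sql_sum([null_add(x, y, z) for x, y, z in zip(a, b, c)])

# SUM(a filter m) + SUM(b filter m) + SUM(c filter m)
split = null_add(*(sql_sum([v for v, keep in zip(col, m) if keep]) for col in (a, b, c)))

# Without the filter, rows outside m leak into the per-column sums.
unfiltered = null_add(sql_sum(a), sql_sum(b), sql_sum(c))

assert whole == split
assert whole != unfiltered
```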
   
   ## Count
   
   I think the same idea holds for count: `COUNT(a+b+c) = COUNT(m)`, i.e. the number of rows in $m$.
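
   The same kind of sketch for count, again with NULL as `None` and with invented helper names and data: counting the non-NULL values of `a+b+c` gives the number of rows in $m$, which generally differs from the count of any single column.

```python
from typing import List, Optional

Value = Optional[int]  # None stands in for SQL NULL


def sql_count(values: List[Value]) -> int:
    """COUNT(expr) counts the non-NULL values."""
    return sum(1 for v in values if v is not None)


def null_add(*xs: Value) -> Value:
    """NULL-propagating addition of any number of operands."""
    return None if any(x is None for x in xs) else sum(xs)


a = [1, None, 3, 4]
b = [10, 20, None, 40]
c = [100, 200, 300, 400]

count_expr = sql_count([null_add(x, y, z) for x, y, z in zip(a, b, c)])
rows_in_m = sum(1 for row in zip(a, b, c) if None not in row)

assert count_expr == rows_in_m
```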
   
   ## MAX/MIN
   
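   This is likely irreducible: in the spirit of the [triangle 
inequality](https://en.wikipedia.org/wiki/Triangle_inequality), for integer 
columns $a, b$ we only get the bound $\max_i(a_i + b_i) \leq \max_i a_i + 
\max_i b_i$ (and symmetrically $\min_i(a_i + b_i) \geq \min_i a_i + \min_i 
b_i$), with equality only when both extrema are attained in the same row.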
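
   A tiny Python counterexample, using invented data with no nulls (i.e. already under the filter $m$), shows why MAX and MIN do not split the way SUM does:

```python
# Two integer columns whose maxima sit in different rows.
a = [1, 5]
b = [5, 1]

max_of_sum = max(x + y for x, y in zip(a, b))
sum_of_max = max(a) + max(b)
min_of_sum = min(x + y for x, y in zip(a, b))
sum_of_min = min(a) + min(b)

# Only the inequalities hold here, not equality.
assert max_of_sum < sum_of_max
assert min_of_sum > sum_of_min

# Equality needs a row that attains both maxima at once.
c = [1, 5]
d = [2, 7]
assert max(x + y for x, y in zip(c, d)) == max(c) + max(d)
```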
   

