joseph-isaacs commented on issue #15524:
URL: https://github.com/apache/datafusion/issues/15524#issuecomment-4030636475
It seems to me that aggregations AGG over SQL integers
$\mathbb{Z}_\bot$ need to be examined on a case-by-case basis, just like AGG
over integers $\mathbb{Z}$. For aggregations defined over columns ($c_0,
..., c_n$) we see that
$\text{AGG}(c_0 \oplus^1 c_1 \oplus^2 ... \oplus^n c_n) = \text{AGG}(c_0 \oplus^1
c_1 \oplus^2 ... \oplus^n c_n\ \text{filter}\ m)$
where $m$ is the set of rows in which all elements are defined, provided all ops
$\oplus^1, ..., \oplus^n$ are NULL-propagating/annihilating ($\forall
x. \bot \oplus^i x = x \oplus^i \bot = \bot$) and AGG ignores NULL inputs.
Furthermore, the RHS ranges over exactly the contributing rows. This
means that under the filter $m$ we can use integer reasoning, since by
definition there are no nulls in any of the remaining rows.
Also notice that this handles non-nullable columns easily.
Now let's think about specific examples:
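
A minimal sketch of the identity above, assuming Python's `None` stands in for $\bot$ and taking SUM as the aggregate. The helpers `null_add` and `sql_sum` and the sample rows are illustrative assumptions, not DataFusion APIs:

```python
def null_add(x, y):
    """NULL-propagating +: the result is None if either operand is None."""
    return None if x is None or y is None else x + y


def sql_sum(values):
    """SQL-style SUM: skips NULLs, returns None when nothing contributes."""
    kept = [v for v in values if v is not None]
    return sum(kept) if kept else None


# Two nullable integer columns, one tuple per row (made-up data).
rows = [(1, 2), (None, 5), (3, None), (4, 4)]

# m: the rows in which every input is defined.
m = [a is not None and b is not None for a, b in rows]

# LHS: aggregate the NULL-propagating expression over all rows.
lhs = sql_sum(null_add(a, b) for a, b in rows)

# RHS: restrict to m first, then use plain integer addition.
rhs = sql_sum(a + b for (a, b), keep in zip(rows, m) if keep)

assert lhs == rhs
```

Under the mask the expression is evaluated with ordinary `+`, which is the point: no NULL handling is needed once `m` has been applied.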
## SUM
Consider a sum in SQL, `SUM(a+b+c)`. We can write this as `SUM(a filter m) +
SUM(b filter m) + SUM(c filter m)` where `m = a IS NOT NULL AND b IS NOT NULL
AND c IS NOT NULL`. Then we can use regular mathematical equivalences over
$\sum_{i \in m} c_{0,i} \oplus^1 ... \oplus^n c_{n,i}$, where $c_{j,i}$ is the
value of column $c_j$ in row $i$.
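
To make the SUM rewrite concrete, here is a small sketch that models NULL as `None`. The helper names and the data are assumptions made for illustration. It also shows why the filter is needed: summing each column without `m` gives a different answer.

```python
def null_add(x, y):
    """NULL-propagating +: the result is None if either operand is None."""
    return None if x is None or y is None else x + y


def sql_sum(values):
    """SQL-style SUM: skips NULLs, returns None when nothing contributes."""
    kept = [v for v in values if v is not None]
    return sum(kept) if kept else None


# Columns a, b, c as one tuple per row (made-up data).
rows = [(1, 2, 3), (None, 5, 6), (7, None, 9), (10, 11, 12)]

# m: rows where a, b and c are all non-NULL.
m = [all(v is not None for v in row) for row in rows]

# SUM(a+b+c) with NULL-propagating addition.
whole = sql_sum(null_add(null_add(a, b), c) for a, b, c in rows)


def filtered_sum(col):
    """SUM(col filter m)."""
    return sql_sum(row[col] for row, keep in zip(rows, m) if keep)


def plain_sum(col):
    """SUM(col) with no filter, for comparison."""
    return sql_sum(row[col] for row in rows)


# SUM(a filter m) + SUM(b filter m) + SUM(c filter m)
split = null_add(null_add(filtered_sum(0), filtered_sum(1)), filtered_sum(2))

# SUM(a) + SUM(b) + SUM(c): not equivalent, it counts rows outside m.
naive = null_add(null_add(plain_sum(0), plain_sum(1)), plain_sum(2))

assert whole == split
assert whole != naive
```

The outer additions go through `null_add` so that an empty `m` yields NULL on both sides, matching what `SUM(a+b+c)` returns when no row contributes.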
## Count
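I think the same idea holds for count: `COUNT(a+b+c) = COUNT(*) filter m`, i.e.
the number of rows in `m` (taken literally, `COUNT(m)` would count every row,
since the predicate `m` is never NULL).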
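
A short sketch of the COUNT case, reading `COUNT(m)` as the number of rows in `m` and modelling NULL as `None`. The helpers and data are assumptions for illustration only:

```python
def null_add(x, y):
    """NULL-propagating +: the result is None if either operand is None."""
    return None if x is None or y is None else x + y


def sql_count(values):
    """SQL-style COUNT(expr): the number of non-NULL values."""
    return sum(1 for v in values if v is not None)


# Columns a, b, c as one tuple per row (made-up data).
rows = [(1, 2, 3), (None, 5, 6), (7, None, 9), (10, 11, 12)]

# m: rows where a, b and c are all non-NULL.
m = [all(v is not None for v in row) for row in rows]

# COUNT(a+b+c): only rows where the expression is non-NULL are counted.
count_expr = sql_count(null_add(null_add(a, b), c) for a, b, c in rows)

# Number of rows in m (True counts as 1).
rows_in_m = sum(m)

# Counting the boolean mask itself counts every row, because m is never NULL.
count_of_mask = sql_count(m)

assert count_expr == rows_in_m
```

The last value is a reminder that the rewrite has to count the rows selected by `m`, not the non-NULL values of the predicate.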
## MAX/MIN
This is likely irreducible: for columns $a, b$ over $\mathbb{Z}$ we only get an
inequality in the style of the [triangle
inequality](https://en.wikipedia.org/wiki/Triangle_inequality), $\max_i(a_i +
b_i) \leq \max_i a_i + \max_i b_i$ (reversed for MIN), not an equality.
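
A two-row counterexample, using made-up non-NULL columns, shows that the inequality can be strict, so MAX and MIN cannot be distributed over `+` the way SUM can:

```python
# Two non-nullable integer columns (made-up data).
a = [1, 0]
b = [0, 1]

# MAX(a + b) versus MAX(a) + MAX(b)
max_of_sum = max(x + y for x, y in zip(a, b))
sum_of_max = max(a) + max(b)

# MIN(a + b) versus MIN(a) + MIN(b)
min_of_sum = min(x + y for x, y in zip(a, b))
sum_of_min = min(a) + min(b)

assert max_of_sum < sum_of_max  # strict, so the two are not interchangeable
assert min_of_sum > sum_of_min  # same problem for MIN, in the other direction
```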
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]