lgbo-ustc commented on issue #7647:
URL:
https://github.com/apache/incubator-gluten/issues/7647#issuecomment-2440482254
## Correctness
All the grouping functions discussed here can be accumulated, that is,
$g(v_0, v_1, ..., v_n) = g(v_0, v_1, ..., v_i) + g(v_{i+1}, v_{i+2}, ...,
v_n)$ where $0 \le i < n$ (here $+$ stands for merging two partial results).
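
As a quick sanity check of this property, here is a minimal Python sketch. It assumes the functions in question are of the sum/count/min/max kind; the `AGGREGATES` table and `check_split` helper are hypothetical names used only for illustration.

```python
# Hypothetical table of accumulable aggregates: (g over a list, merge of two partial results).
# For count, partial results are merged by addition, not by counting again.
AGGREGATES = {
    "sum": (sum, lambda a, b: a + b),
    "count": (len, lambda a, b: a + b),
    "min": (min, min),
    "max": (max, max),
}

def check_split(values):
    """Check g(v_0..v_n) == merge(g(v_0..v_i), g(v_{i+1}..v_n)) for every split point."""
    for g, merge in AGGREGATES.values():
        whole = g(values)
        for i in range(1, len(values)):
            left, right = values[:i], values[i:]
            if merge(g(left), g(right)) != whole:
                return False
    return True

values = [3, 1, 4, 1, 5, 9, 2, 6]
print(check_split(values))
```

A function such as a median would not fit this table, since it cannot be merged from two partial results alone.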
Consider aggregating on the following two grouping keys, $(x_0, y_0)$ and
$(x_0, nil)$. The argument extends to more complex cases.
They have the following values:
$(x_0, y_0) \rightarrow ( v_0^{y_0}, v_1^{y_0}, ..., v_{n_0}^{y_0})$
$(x_0, nil) \rightarrow (v_0^{y_0}, v_1^{y_0}, ..., v_{n_0}^{y_0}),
(v_0^{y_1}, v_1^{y_1}, ..., v_{n_1}^{y_1}), ..., (v_0^{y_k}, v_1^{y_k}, ...,
v_{n_k}^{y_k})$
### aggregate after expand
#### after expand
After applying the expand, each row is expanded into two rows, $g_0$ and $g_1$:
$x_0, y_0\rightarrow g_0: ( v_0^{y_0}, v_1^{y_0}, ..., v_{n_0}^{y_0}), g_1:(
v_0^{nil_{y_0}}, v_1^{nil_{y_0}}, ..., v_{n_0}^{nil_{y_0}})$
$x_0, y_1\rightarrow g_0: ( v_0^{y_1}, v_1^{y_1}, ..., v_{n_1}^{y_1}), g_1:(
v_0^{nil_{y_1}}, v_1^{nil_{y_1}}, ..., v_{n_1}^{nil_{y_1}})$
....
$x_0, y_k\rightarrow g_0: ( v_0^{y_k}, v_1^{y_k}, ..., v_{n_k}^{y_k}), g_1:(
v_0^{nil_{y_k}}, v_1^{nil_{y_k}}, ..., v_{n_k}^{nil_{y_k}})$
Here $v_i^{y_j} = v_i^{nil_{y_j}}$: the values are identical, but column $y$
is set to null in the $g_1$ row.
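
The expand step can be sketched in Python as follows. This is only an illustration: the `(x, y, v)` row layout, the integer grouping ids 0 and 1 standing in for $g_0$ and $g_1$, and the `expand` helper are assumptions, not the actual operator.

```python
# Hypothetical input rows: (x, y, v).
rows = [("x0", "y0", 1), ("x0", "y0", 2), ("x0", "y1", 10)]

def expand(rows):
    """Emit two copies of each row: grouping id 0 keeps y, grouping id 1 nulls y."""
    out = []
    for x, y, v in rows:
        out.append((x, y, 0, v))     # g_0: grouping set (x, y)
        out.append((x, None, 1, v))  # g_1: grouping set (x), y replaced by null
    return out

expanded = expand(rows)
for row in expanded:
    print(row)
```

Note that the value `v` is copied unchanged into both output rows; only the `y` column and the grouping id differ.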
#### aggregate
Taking the expand's result as the input of the aggregate, we get the following
results for grouping keys $(x_0, y_0)$ and $(x_0, nil)$.
$x_0, y_0 \rightarrow G(x_0, y_0, g_0) = g(v_0^{y_0}, v_1^{y_0}, ...,
v_{n_0}^{y_0})$
$x_0, nil \rightarrow G(x_0, nil, g_1) = \sum_{j=0}^{k} G(x_0, nil_{y_j}, g_1) =
\sum_{j=0}^{k} g(v_0^{nil_{y_j}}, v_1^{nil_{y_j}}, ..., v_{n_j}^{nil_{y_j}})$
Note that $G(x_0, y_i, g_0) = G(x_0, nil_{y_i}, g_1)$, since they have the
same inputs.
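
To make the "aggregate after expand" plan concrete, here is a small Python sketch using sum as the grouping function. The row layout `(x, y, v)`, the grouping ids 0 and 1, and the helper names are assumptions made for illustration only.

```python
from collections import defaultdict

# Hypothetical input rows: (x, y, v).
rows = [("x0", "y0", 1), ("x0", "y0", 2), ("x0", "y1", 10), ("x0", "y2", 100)]

def expand(rows):
    """Emit a g_0 row (y kept) and a g_1 row (y nulled) for each input row."""
    out = []
    for x, y, v in rows:
        out.append((x, y, 0, v))
        out.append((x, None, 1, v))
    return out

def aggregate(expanded):
    """Sum v per (x, y, grouping id)."""
    acc = defaultdict(int)
    for x, y, gid, v in expanded:
        acc[(x, y, gid)] += v
    return dict(acc)

result = aggregate(expand(rows))
for key, total in result.items():
    print(key, total)
```

The `(x0, None, 1)` group receives every value that belongs to `x0`, so its total is the sum of the per-`y` totals in the grouping id 0 rows.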
### expand after partial aggregate
#### after partial aggregate
First we apply the partial aggregate to the original inputs and get the
following results:
$x_0, y_0 \rightarrow G(x_0, y_0)$
$x_0,y_1 \rightarrow G(x_0, y_1)$
...
$x_0, y_k \rightarrow G(x_0, y_k)$
#### expand
Each intermediate result row is expanded into two:
$x_0, y_0 \rightarrow g_0:G(x_0, y_0), g_1: G(x_0, nil_{y_0})$
$x_0, y_1 \rightarrow g_0:G(x_0, y_1), g_1: G(x_0, nil_{y_1})$
...
$x_0, y_k \rightarrow g_0:G(x_0, y_k), g_1: G(x_0, nil_{y_k})$
#### final aggregate
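$x_0, y_0 \rightarrow G(x_0, y_0, g_0)$
$x_0, nil \rightarrow \sum_{j=0}^{k} G(x_0, nil_{y_j}, g_1) = G(x_0, nil, g_1)$
Both results equal those obtained by aggregating after expand, so the two
plans are equivalent.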
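
The equivalence of the two plans can be checked on a toy input with the following Python sketch. It is a simulation under stated assumptions, not Gluten code: rows are `(x, y, v)` tuples, each aggregate is modelled as a hypothetical `(init, merge)` pair, and all helper names are made up for illustration.

```python
# Hypothetical aggregates: (state from one value, merge of two states).
AGGS = {
    "sum": (lambda v: v, lambda a, b: a + b),
    "count": (lambda v: 1, lambda a, b: a + b),
    "min": (lambda v: v, min),
    "max": (lambda v: v, max),
}

def expand(rows):
    """rows are (x, y, state); emit a g_0 row (y kept) and a g_1 row (y nulled)."""
    out = []
    for x, y, state in rows:
        out.append((x, y, 0, state))
        out.append((x, None, 1, state))
    return out

def merge_by_key(rows, merge):
    """Merge the last field of each row for rows sharing the same leading fields."""
    acc = {}
    for *key, state in rows:
        key = tuple(key)
        acc[key] = state if key not in acc else merge(acc[key], state)
    return acc

def aggregate_after_expand(rows, init, merge):
    states = [(x, y, init(v)) for x, y, v in rows]
    return merge_by_key(expand(states), merge)

def expand_after_partial_aggregate(rows, init, merge):
    states = [(x, y, init(v)) for x, y, v in rows]
    partial = merge_by_key(states, merge)  # one state per (x, y)
    partial_rows = [(x, y, s) for (x, y), s in partial.items()]
    return merge_by_key(expand(partial_rows), merge)  # final aggregate

rows = [("x0", "y0", 1), ("x0", "y0", 2), ("x0", "y1", 10),
        ("x0", "y2", 100), ("x1", "y0", 7)]

equal = all(
    aggregate_after_expand(rows, init, merge)
    == expand_after_partial_aggregate(rows, init, merge)
    for init, merge in AGGS.values()
)
print(equal)
```

In the second plan the expand only sees one row per distinct `(x, y)` pair instead of one row per input row, which is where the saving comes from when many input rows share the same keys.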