lgbo-ustc commented on issue #7647:
URL: 
https://github.com/apache/incubator-gluten/issues/7647#issuecomment-2440482254

   ## Correctness
   
   All the aggregate functions discussed here can be accumulated, that is
   
   $g(v_0, v_1, ..., v_n) = g(v_0, v_1, ..., v_i) + g(v_{i+1}, v_{i+2}, ..., v_n)\ where\ i \in (0, n)$
   
   Here $+$ denotes merging two partial results, not necessarily arithmetic addition.
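The accumulation property can be checked on a few common aggregates. The snippet below is only a sketch: the `AGGREGATES` table and the helper `holds_for_every_split` are hypothetical names chosen for illustration, and the merge function plays the role of $+$ in the formula above.

```python
# Hypothetical stand-ins for accumulable aggregates. Each entry is a
# (partial, merge) pair: `partial` computes g over a list of values and
# `merge` combines two partial results (the "+" in the formula).
AGGREGATES = {
    "sum": (sum, lambda a, b: a + b),
    "count": (len, lambda a, b: a + b),
    "min": (min, min),
    "max": (max, max),
}


def holds_for_every_split(values, partial, merge):
    """Check g(v_0..v_n) == merge(g(v_0..v_i), g(v_{i+1}..v_n)) for each split."""
    whole = partial(values)
    return all(
        merge(partial(values[: i + 1]), partial(values[i + 1:])) == whole
        for i in range(len(values) - 1)  # both halves stay non-empty
    )


values = [3, 1, 4, 1, 5, 9, 2, 6]
results = {
    name: holds_for_every_split(values, partial, merge)
    for name, (partial, merge) in AGGREGATES.items()
}
print(results)
```

An aggregate such as an average does not merge directly on its final value, but its intermediate state (a sum and a count) does, which is the state a partial aggregate would carry.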
   
   
   Consider aggregating on the following two grouping keys, $(x_0, y_0)$ and $(x_0, nil)$. The argument extends to more complex cases.
   They have the following values:
   $(x_0, y_0) \rightarrow ( v_0^{y_0}, v_1^{y_0}, ..., v_{n_0}^{y_0})$
   $(x_0, nil) \rightarrow (v_0^{y_0}, v_1^{y_0}, ..., v_{n_0}^{y_0}), (v_0^{y_1}, v_1^{y_1}, ..., v_{n_1}^{y_1}), ..., (v_0^{y_k}, v_1^{y_k}, ..., v_{n_k}^{y_k})$
   
   ### aggregate after expand
   
   #### after expand
   After applying the expand, each row is expanded into two:
   $x_0, y_0\rightarrow g_0: ( v_0^{y_0}, v_1^{y_0}, ..., v_{n_0}^{y_0}), g_1:( v_0^{nil_{y_0}}, v_1^{nil_{y_0}}, ..., v_{n_0}^{nil_{y_0}})$
   $x_0, y_1\rightarrow g_0: ( v_0^{y_1}, v_1^{y_1}, ..., v_{n_1}^{y_1}), g_1:( v_0^{nil_{y_1}}, v_1^{nil_{y_1}}, ..., v_{n_1}^{nil_{y_1}})$
   ...
   $x_0, y_k\rightarrow g_0: ( v_0^{y_k}, v_1^{y_k}, ..., v_{n_k}^{y_k}), g_1:( v_0^{nil_{y_k}}, v_1^{nil_{y_k}}, ..., v_{n_k}^{nil_{y_k}})$
   
   Here $v_i^{y_j} = v_i^{nil_{y_j}}$: the values are identical, but column $y$ is set to null in the $g_1$ row.
   
   #### aggregate
   
   Taking the expand's result as the input of the aggregate, we get the following results for grouping keys $(x_0, y_0)$ and $(x_0, nil)$:
   $x_0, y_0 \rightarrow G(x_0, y_0, g_0) = g(v_0^{y_0}, v_1^{y_0}, ..., v_{n_0}^{y_0})$
   $x_0, nil \rightarrow G(x_0, nil, g_1) =\sum_{j=0}^k G(x_0, nil_{y_j}, g_1) = \sum_{j=0}^k g(v_0^{nil_{y_j}}, v_1^{nil_{y_j}}, ..., v_{n_j}^{nil_{y_j}})$
   
   Note that $G(x_0, y_i, g_0) = G(x_0, nil_{y_i}, g_1)$, since they have the same inputs.
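The expand-then-aggregate plan can be simulated on a tiny input. This is a minimal sketch under stated assumptions: the rows, the `expand` and `aggregate` helpers, and the use of `sum` as the aggregate $g$ are all hypothetical and are not taken from Gluten or Spark.

```python
from collections import defaultdict

# Hypothetical input rows (x, y, v); sum stands in for the aggregate g.
rows = [("x0", "y0", 1), ("x0", "y0", 2), ("x0", "y1", 10), ("x0", "y2", 100)]


def expand(rows):
    """Emit two rows per input row, tagged with a grouping id."""
    for x, y, v in rows:
        yield (x, y, 0, v)     # g_0: grouping key (x, y)
        yield (x, None, 1, v)  # g_1: grouping key (x, nil), same value v


def aggregate(expanded):
    """Group by (x, y, grouping id) and sum the values."""
    out = defaultdict(int)
    for x, y, gid, v in expanded:
        out[(x, y, gid)] += v
    return dict(out)


result = aggregate(expand(rows))

# The (x0, nil) group equals the merge of the per-y groups, because every
# g_1 row carries the same value as its g_0 twin.
per_y_total = sum(v for (x, y, gid), v in result.items() if gid == 0)
nil_total = result[("x0", None, 1)]
print(result, per_y_total == nil_total)
```

The grouping id keeps a genuine null in column `y` of the input from colliding with the nulled-out `y` of the `g_1` rows.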
   
   
   ### expand after partial aggregate
   
   #### after partial aggregate
   
   First we apply the partial aggregate on the original inputs and get the following results:
   $x_0, y_0 \rightarrow G(x_0, y_0)$
   $x_0, y_1 \rightarrow G(x_0, y_1)$
   ...
   $x_0, y_k \rightarrow G(x_0, y_k)$
   
   #### expand
   Each intermediate result row is expanded into two:
   $x_0, y_0 \rightarrow g_0:G(x_0, y_0), g_1: G(x_0, nil_{y_0})$
   $x_0, y_1 \rightarrow g_0:G(x_0, y_1), g_1: G(x_0, nil_{y_1})$
   ...
   $x_0, y_k \rightarrow g_0:G(x_0, y_k), g_1: G(x_0, nil_{y_k})$
   
   #### final aggregate
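   $x_0, y_0 \rightarrow G(x_0, y_0, g_0)$
   $x_0, nil \rightarrow \sum_{j=0}^k G(x_0, nil_{y_j}, g_1) = G(x_0, nil, g_1)$
   
   Both results equal those obtained by aggregating after the expand, so the two plans are equivalent.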
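The equivalence of the two plans can be checked end to end on a small input. The code below is a sketch, not the Gluten implementation: the rows, the helper names `expand` and `sum_by_key`, and the choice of `sum` as the aggregate are assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical input rows (x, y, v); sum stands in for the aggregate g.
rows = [("x0", "y0", 1), ("x0", "y0", 2), ("x0", "y1", 10), ("x0", "y2", 100)]


def expand(rows):
    """Emit two rows per input: grouping id 0 keeps y, grouping id 1 nulls it."""
    for x, y, v in rows:
        yield (x, y, 0, v)
        yield (x, None, 1, v)


def sum_by_key(keyed_values):
    """Sum values sharing a key; serves as partial, final and plain aggregate."""
    out = defaultdict(int)
    for key, v in keyed_values:
        out[key] += v
    return dict(out)


# Plan A: expand, then aggregate on (x, y, grouping id).
plan_a = sum_by_key(((x, y, gid), v) for x, y, gid, v in expand(rows))

# Plan B: partial aggregate on (x, y), expand the partial results,
# then final aggregate on (x, y, grouping id).
partial = sum_by_key(((x, y), v) for x, y, v in rows)
partial_rows = [(x, y, acc) for (x, y), acc in partial.items()]
plan_b = sum_by_key(((x, y, gid), acc) for x, y, gid, acc in expand(partial_rows))

print(plan_a == plan_b)
```

In this toy input the expand in plan B sees 3 partial rows instead of 4 raw rows; the saving grows with the number of rows sharing the same $(x, y)$ key.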
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

