zhztheplayer opened a new pull request, #5836: URL: https://github.com/apache/incubator-gluten/pull/5836
So far in the RAS planner, when an offload rule is applied to a Spark plan node (call it node A), the following steps run in order:

1. Apply all rewrite rules (in the heuristic planner, the ones used by RewriteSparkPlanRulesManager) on node A;
2. Let step 1's output be tree A; apply all included offload rules on tree A; the output is tree B;
3. Return tree B to the planner.

This is problematic when a rewrite rule pulls pre / post projects out of the input plan node. For example, if node A is `aggregation 1`, tree A may become `pre proj + aggregation 2 + post proj`. If `aggregation 2` is not offload-able but the projects are, the final output tree B is still returned to the planner and added to the memo. We should avoid this, because `aggregation 2` is again a vanilla plan node that may be handled by the same rewrite rule, which can lead to infinite planning.

The patch reworks the offloading procedure to be more compatible with the rewrite rules, which includes solving the above issue. After the patch, when the same Spark plan node comes in, the RAS planner does the following:

1. Identify node A's Scala class, say [BaseAggregateExec];
2. Apply all rewrite rules on node A; the output is tree A;
3. Find the specific offload rule for node type [BaseAggregateExec] and apply that single offload rule on tree A, traversing the whole tree; the output is tree B;
4. Return tree B to the planner.

With the extra type identification, we avoid applying offload rules to pulled-out nodes that are not of interest. As a result, when [BaseAggregateExec] is not offload-able, tree B has the same shape as tree A, so it is discarded and not returned to the planner. Infinite planning is avoided.

Depends on #5824
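The difference between the two procedures can be sketched with a toy plan model. This is only an illustration, not Gluten code: every name below (`Plan`, `Aggregate`, `NativeAggregate`, `offloadAll`, `offloadTyped`, and so on) is hypothetical. The sketch also assumes that both procedures discard a result that equals tree A, which the description only states for the new procedure.

```scala
object RasOffloadSketch {
  // Toy plan nodes. These are hypothetical stand-ins, not Spark or Gluten classes.
  sealed trait Plan
  case class Scan(table: String) extends Plan
  case class Project(child: Plan) extends Plan
  case class Aggregate(child: Plan, offloadable: Boolean) extends Plan
  case class NativeProject(child: Plan) extends Plan
  case class NativeAggregate(child: Plan) extends Plan

  type Rule = PartialFunction[Plan, Plan]

  // Rebuild a node with its (single) child transformed by f.
  private def mapChild(p: Plan, f: Plan => Plan): Plan = p match {
    case s: Scan            => s
    case Project(c)         => Project(f(c))
    case Aggregate(c, o)    => Aggregate(f(c), o)
    case NativeProject(c)   => NativeProject(f(c))
    case NativeAggregate(c) => NativeAggregate(f(c))
  }

  // Apply a rule bottom-up over the whole tree; nodes the rule does not match are kept.
  def transformUp(p: Plan, rule: Rule): Plan = {
    val withNewChild = mapChild(p, c => transformUp(c, rule))
    rule.applyOrElse(withNewChild, (q: Plan) => q)
  }

  // Rewrite rule applied to node A only: pulls a pre and a post project out of an aggregate.
  def rewrite(node: Plan): Plan = node match {
    case Aggregate(c, o) => Project(Aggregate(Project(c), o))
    case other           => other
  }

  // One offload rule per node type.
  val offloadProject: Rule = { case Project(c) => NativeProject(c) }
  val offloadAggregate: Rule = { case Aggregate(c, true) => NativeAggregate(c) }

  // Old procedure: every offload rule is applied to tree A.
  def offloadAll(nodeA: Plan): Option[Plan] = {
    val treeA = rewrite(nodeA)
    val treeB = transformUp(treeA, offloadProject.orElse(offloadAggregate))
    if (treeB == treeA) None else Some(treeB)
  }

  // New procedure: pick the single offload rule matching node A's own type.
  def ruleFor(nodeA: Plan): Option[Rule] = nodeA match {
    case _: Aggregate => Some(offloadAggregate)
    case _: Project   => Some(offloadProject)
    case _            => None
  }

  def offloadTyped(nodeA: Plan): Option[Plan] = {
    val treeA = rewrite(nodeA)
    val treeB = ruleFor(nodeA).map(rule => transformUp(treeA, rule)).getOrElse(treeA)
    // An unchanged tree means node A could not be offloaded, so nothing is returned.
    if (treeB == treeA) None else Some(treeB)
  }
}
```

In this model, `offloadAll(Aggregate(Scan("t"), offloadable = false))` yields a tree with offloaded projects wrapped around a vanilla `Aggregate`, which the planner could rewrite again, while `offloadTyped` returns `None` for the same input. The toy version leaves the pulled projects untouched in the new procedure, on the assumption that they are offloaded later when the planner visits them as nodes of their own type.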
