zhztheplayer opened a new issue, #5177:
URL: https://github.com/apache/incubator-gluten/issues/5177

   ### Description
   
   One of the original purposes of ACBO is to make Gluten's rules simpler and 
the rule list cleaner. To achieve that, this issue suggests some migrations of 
our current rule code. 
   
   The following is what I have in mind so far:
   
   #### ACBO=on vs ACBO=off
   
   I'd propose a new approach around the current entry point of ACBO. I'm 
inclined to have two separate source code files (or folders), one for ACBO=on 
and one for ACBO=off, each storing the rule list for its mode. This would make 
it easier for developers to see what ACBO actually does and how.
   
   Note that the migration will not remove rules from `ACBO=off` for now. We 
are only refactoring code so that `ACBO=on` covers more of the columnar 
optimizations in `ACBO=off`.
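
As a rough illustration of the split, here is a toy Python sketch rather than Gluten code. The rule names come from the list in this issue, except `EnumeratedTransform`, which is a hypothetical placeholder for the ACBO step. The selection and ordering of rules are assumptions made only for the example.

```python
from typing import List

# Hypothetical, abridged content of the ACBO=off rule list file.
ACBO_OFF_RULES: List[str] = [
    "FallbackOnANSIMode",
    "AddTransformHintRule",
    "ImplementFilter",
    "InsertTransitions",
    "RemoveTransformHintRule",
]

# Hypothetical, abridged content of the ACBO=on rule list file.
ACBO_ON_RULES: List[str] = [
    "FallbackOnANSIMode",
    "EnumeratedTransform",  # hypothetical name for the cost-based step
    "RemoveTopmostColumnarToRow",
]


def rule_list(acbo_enabled: bool) -> List[str]:
    """Return a copy of the rule list for the chosen mode."""
    return list(ACBO_ON_RULES if acbo_enabled else ACBO_OFF_RULES)
```

With each list kept in its own file, comparing the two shows at a glance which heuristic rules ACBO has absorbed.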
   
   #### The gap of CH backend
   
   The CH backend doesn't currently support ACBO=on. As far as I know, we 
should close the following gaps to make them compatible:
   
   - In the CH backend, some plan nodes are coupled with others. For example, 
AFAIK the broadcast exchange doesn't work with C2R / R2C. 
   
   There is a PR https://github.com/apache/incubator-gluten/pull/5101 to verify 
the CH backend against TPC-H. We can make it pass ASAP.
   
   #### Rules to migrate
   
   Ideally we'll migrate most of the columnar rules that are not 100% 
heuristic. The following is an initial list. I may not be the author of these 
rules, so please correct me if I get anything wrong.
   
   1. `FallbackOnANSIMode`
   Will keep this in heuristics since it's a global switch.
   3. `FallbackMultiCodegens`
    - choice 1: In ACBO, tune the cost model so that the cost of Velox's BHJ is 
slightly higher than that of vanilla Spark's BHJ. The optimizer will then fall 
back consecutive Gluten BHJs when it sees them.
    - choice 2: Keep as heuristic
   4. `PlanOneRowRelation`
   Will keep this in heuristics since it's a global switch.
   5. `FallbackEmptySchemaRelation`
   Will keep this in heuristics since it's a global switch.
   6. `MergeTwoPhasesHashBaseAggregate` (CH only)
    - choice 1: Could add an ACBO rule to merge aggregates.
    - choice 2: Keep as heuristic
   7. Spark rewrite rules: `RewriteIn`, `RewriteMultiChildrenCount`, 
`RewriteCollect`, `RewriteTypedImperativeAggregate`, `PullOutPreProject`, 
`PullOutPostProject`:
   Move them to ACBO. ACBO would naturally do such "tentative" transformations 
better than RBO. ACBO could enumerate the possible plans and then try to find 
the one that is executable and has the lowest cost. For example, if a 
transformation yields too many C2Rs/R2Cs, or simply an inexecutable plan, the 
optimizer will not choose it.
   8. `AddTransformHintRule`
   In ACBO, merge this main validation rule into the transformation rules. 
That is, ACBO should validate the plan before each tentative 
transformation.
   9. `FallbackBloomFilterAggIfNeeded`
    - choice 1: In ACBO, add an independent physical property to indicate 
whether the bloom filter buffer data is in vanilla or Gluten/Velox format. The 
optimizer would then only choose plans that have a consistent buffer data format.
    - choice 2: Keep as heuristic
   10. `ImplementFilter`, `ImplementAggregate`, `ImplementExchange`, 
`ImplementJoin`, `ImplementOthers`
   Can be moved to ACBO to perform the transformations.
   11. `RemoveNativeWriteFilesSortAndProject`
   The "remove sort and project" part can be handled cleanly by ACBO through 
property requirements. The other part, `Empty2Null`, looks like some kind of 
validation that can be moved into ACBO's transformation rule. Confirmation required.
   12. `RewriteTransformer`
   ACBO could directly use the plugged rules, ideally.
   13. `EnsureLocalSortRequirements`
   This is ACBO's natural job. We just need to add an ordering property to the 
property model.
   14. `CollapseProjectExecTransformer`
   ACBO can use that rule.
   15. `genExtendedColumnarTransformRules` (Velox only)
   Currently this contains only the flushable agg rule. It can move into ACBO's 
agg transformation rule.
   16. `GlutenConfig.getConf.extendedColumnarTransformRules`
   Keep it at the end of the heuristic rule list to allow user-defined 
rewrites. It will not be added to ACBO.
   17. `ExpandFallbackPolicy` (whole stage fallback)
   ACBO can do this by assigning more reasonable costs to C2R / R2C. The 
optimizer would then see a plan with many C2Rs / R2Cs as more expensive and 
would not choose it.
   18. `InsertTransitions`, `TransformPostOverrides`, 
`InsertColumnarToColumnarTransitions`, `RemoveGlutenTableCacheColumnarToRow`
   ACBO doesn't need these rules. C2Rs / R2Cs can be added via property 
enforcement.
   19. `RemoveTopmostColumnarToRow`
   Since this one exists for compatibility with Spark, we will keep it in heuristics.
   20. `genExtendedColumnarPostRules`
   Probably keep in heuristics.
   21. `ColumnarCollapseTransformStages`
   Probably keep in heuristics.
   22. `extendedColumnarRules`
   Keep in heuristics.
   23. `GlutenFallbackReporter`
   Keep in heuristics.
   24. `RemoveTransformHintRule`
   Keep in heuristics.
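
To make the cost-based items above more concrete (validation before each tentative transformation, C2R / R2C added by property enforcement, and whole-stage fallback emerging from transition costs), here is a toy Python sketch. It is not Gluten's implementation. The cost numbers, the function names, the linear leaf-to-root plan shape and the "root must output rows" requirement are all assumptions made for illustration. A real optimizer would not enumerate plans by brute force like this.

```python
from itertools import product
from typing import List, Optional, Sequence, Set, Tuple

# Illustrative numbers only; these are not Gluten's actual costs.
NATIVE_COST = 1.0      # one offloaded (columnar) operator
VANILLA_COST = 3.0     # one vanilla Spark (row-based) operator
TRANSITION_COST = 5.0  # one C2R or R2C


def enforce_transitions(ops: Sequence[str], backends: Sequence[str]) -> List[str]:
    """Build a plan (leaf first), adding C2R / R2C where the data format changes."""
    plan: List[str] = []
    prev: Optional[str] = None
    for op, backend in zip(ops, backends):
        if prev is not None and prev != backend:
            plan.append("C2R" if prev == "native" else "R2C")
        plan.append(f"{backend}:{op}")
        prev = backend
    if prev == "native":
        plan.append("C2R")  # assumed requirement: the root must output rows
    return plan


def cost(plan: Sequence[str]) -> float:
    """Sum the per-node costs of a plan."""
    total = 0.0
    for node in plan:
        if node in ("C2R", "R2C"):
            total += TRANSITION_COST
        elif node.startswith("native:"):
            total += NATIVE_COST
        else:
            total += VANILLA_COST
    return total


def best_plan(ops: Sequence[str], native_supported: Set[str]) -> Tuple[float, List[str]]:
    """Enumerate native/vanilla choices per operator and keep the cheapest valid plan."""
    best: Optional[Tuple[float, List[str]]] = None
    for backends in product(("native", "vanilla"), repeat=len(ops)):
        # Validation: reject candidates that offload an unsupported operator.
        if any(b == "native" and op not in native_supported
               for op, b in zip(ops, backends)):
            continue
        plan = enforce_transitions(ops, backends)
        c = cost(plan)
        if best is None or c < best[0]:
            best = (c, plan)
    assert best is not None  # the all-vanilla plan always passes validation
    return best
```

With `ops = ["Scan", "Filter", "Project", "Agg"]` and `Project` missing from `native_supported`, the cheapest plan under these numbers is the all-vanilla one (cost 12.0), because every partially offloaded candidate pays for at least one transition. This mirrors whole-stage fallback without a dedicated rule. When all four operators are supported, the all-native plan with a single C2R at the root wins (cost 9.0).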
   
   #### Removal of ACBO=off
   
   This can be one of the final goals if all goes as expected, but we will not 
take this action in the short term.
   
   #### Note
   
   This topic is not only about code quality and maintenance. Once we have a 
cleaner rule list and plan node definitions that are fully compatible with 
ACBO, we can decide whether to move on from ACBO's current responsibility 
(fallback processing) to more advanced optimizations that could be powered by 
ACBO.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

