crepererum commented on issue #1972: URL: https://github.com/apache/arrow-datafusion/issues/1972#issuecomment-1078838974
# Egg @pjmore great write up, thank you. I have a few notes on your observations regarding egg: 1. It might be worth sharing them with the egg developers, they have a [quite active discussion forum](https://github.com/egraphs-good/egg/discussions) and seem to be quite eager to hear what people build using egg (or where they fail) 2. I think the explosion of graph sizes can be handled somewhat by tuning the cost function, by match guards (like the ones that are also used to to prevent mis-optimization of division by zero) or by pruning (their math examples uses [this trick](https://github.com/egraphs-good/egg/blob/f1238e3a95c7e4e65ce01384f2f88198d26657f9/tests/math.rs#L107-L108)) 3. The `(A OR B) -> (B OR A)` rule you bring up is actually an interesting one. When experimenting with egg I found that these symmetry rules are really helpful to write a maintainable, short, and easy to understand rules set. Imagine the rule `(A OR B) AND (A OR C) -> A OR (B AND C)`. This rule would only work properly if you combine it with the symmetry rule or you end up enumerating all "permutations" of the simplification rule. Now we might conclude that the symmetry rule should only be applied once or not on processed nodes, however for nesting it's quite important to be able to re-apply it to new nodes as well. 4. I think the egg `Analysis` trait is extremely powerful, even a bit too complex since it can do multiple things like analysis (which then can be used by rules to prevent optimizations, see [this example](https://github.com/egraphs-good/egg/blob/f1238e3a95c7e4e65ce01384f2f88198d26657f9/tests/math.rs#L144-L153)), complex optimization passes (e.g. constant folding) and pruning. # Multi-objective optimization Another note multi-objective optimizations: From experience as well as mid-level understanding of optimization in general, having multiple optimizations targets is always tricky (but depending on the use case not unsolvable). If you really wanna keep the goals separate, then comparing solutions to your issue can only be done by strict comparison (e.g. plan A is better than B if it has lower cast AND risk). More technically, you stick to the [Pareto Front](https://en.wikipedia.org/wiki/Pareto_front). However this often steals you quite some potential, since massive improvements in one metric might result in slight disadvantages in another metric (e.g. you might be able shave of 90% cost by increasing the risk by 5%). Now most people would intuitively decide to take the risk here. So how do you balance the two metrics? What's often done is a linear combination (like `total = A * risk + B * cost`, `A` and `B` being tuning parameters). This obviously has its own issue (like all metrics must "scale" in the same linear way, tuning parameters are tricky to get right). Also see [Multiple-criteria decision-making](https://en.wikipedia.org/wiki/Multiple-criteria_decision_analysis). -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
