andygrove commented on PR #4519: URL: https://github.com/apache/datafusion-comet/pull/4519#issuecomment-4835067711
A few smaller follow-ups after the latest round of changes: 1. **Config doc wording.** `spark.comet.exec.transitionRevert.enabled` says the threshold counts "C2R and R2C transition pairs", but `countTransitions` only counts `ColumnarToRowTransition` nodes. The `maxTransitions` doc also still talks about full round-trips. Could we reword both so they match what the code counts? Something like "the number of columnar-to-row (C2R) transitions in a stage" would be clearer. 2. **`CometSparkSessionExtensions` docstring is stale.** The pipeline diagrams at lines 54 and 77 still only list `EliminateRedundantTransitions` under `postColumnarTransitions`. Worth adding `RevertNativeForTransitionHeavyStages` there so the documented flow matches `CometExecColumnar`. 3. **Rule ordering vs `EliminateRedundantTransitions`.** `EliminateRedundantTransitions` runs before this new rule. When the revert wraps a row-based subtree in `RowToColumnarExec` to feed a `CometShuffleExchangeExec` with `CometColumnarShuffle`, that R2C is exactly what `EliminateRedundantTransitions.scala:103-110` would have stripped, but it has already run. Not a correctness bug, just a missed cleanup. Could either swap the order or run a small cleanup after revert. 4. **Dead guard in `insertTransitions`.** Line 170's `!node.isInstanceOf[QueryStageExec]` check is unreachable because `transformStageUp` already skips stage-boundary children. Safe to drop. 5. **`lazy val` for conf reads.** `enabled` and `maxTransitions` are safe today only because `CometExecColumnar.postColumnarTransitions` is a `def` that rebuilds the rule per access. If anyone changes that to a `val`, the rule would freeze conf at first read. A `private def` would be more defensive. Overall the direction looks good. The stage-boundary fix and the new regression tests address my earlier concern, and defaulting to `false` keeps this low-risk. Disclosure: I drafted this comment with help from AI (Claude Code). -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
