[PR] chore(audit): audit misc expressions across Spark 3.4.3, 3.5.8, 4.0.1, 4.1.1 [datafusion-comet]

via GitHub Wed, 27 May 2026 15:58:02 -0700


andygrove opened a new pull request, #4474:
URL: https://github.com/apache/datafusion-comet/pull/4474


   ## Which issue does this PR close?
   
   Closes #.
   
   ## Rationale for this change
   
   Continuation of the per-category expression audit. This follows the same pattern as #4473 (collection), #4470 (json), #4469 (struct), and #4461 (string), and uses the `audit-comet-expression` skill as updated in #4468 (which now also covers Spark 4.1.1).
   
   ## What changes are included in this PR?
   
   ### Support-doc audit notes
   
   Add per-version audit sub-bullets to `monotonically_increasing_id`, `rand`, 
`randn`, `spark_partition_id`, and `user` in 
`docs/source/contributor-guide/spark_expressions_support.md`.
   
   - `MonotonicallyIncreasingID`, `SparkPartitionID`, and `CurrentUser` are 
byte-for-byte identical across all four versions. `CurrentUser` is 
`Unevaluable` and resolved to a string literal by the analyzer's 
`ResolveCurrentLike` rule before Comet sees the plan, so no Comet serde is 
needed for `user`.
   - `Rand` and `Randn` are refactored in Spark 4.0 with no runtime behaviour change: the `RDG` abstract class becomes a trait, a new `NondeterministicUnaryRDG` base is added, and `ExpressionWithRandomSeed.expressionToSeed` rejects non-literal seeds at analysis time with `QueryCompilationErrors.invalidRandomSeedParameter`. 4.1.1 is identical to 4.0.1.
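
   To illustrate why no Comet serde is needed for `user`, here is a minimal, self-contained model of the rewrite described above. The `Expr` ADT, `ReplaceCurrentUserSketch`, and `containsCurrentUser` are hypothetical stand-ins, not Spark's Catalyst classes or the real `ResolveCurrentLike` rule. The point is only that the unevaluable `CurrentUser` node is replaced with a string literal before any plan conversion runs, so a serde only ever sees the literal.

```scala
// Hypothetical toy expression tree; NOT Spark's Catalyst classes.
sealed trait Expr
final case class Literal(value: String) extends Expr
case object CurrentUser extends Expr // unevaluable placeholder, like Spark's CurrentUser
final case class Concat(children: Seq[Expr]) extends Expr

// Hypothetical rule modelling the analyzer step: replace every CurrentUser
// node with a literal holding the session user's name.
object ReplaceCurrentUserSketch {
  def apply(e: Expr, sessionUser: String): Expr = e match {
    case CurrentUser      => Literal(sessionUser)
    case Concat(children) => Concat(children.map(apply(_, sessionUser)))
    case other            => other
  }
}

// True if any CurrentUser node survives, i.e. a serde would have to handle it.
def containsCurrentUser(e: Expr): Boolean = e match {
  case CurrentUser      => true
  case Concat(children) => children.exists(containsCurrentUser)
  case _                => false
}

// After the rewrite, the plan handed to Comet contains only literals.
val analyzed = ReplaceCurrentUserSketch(Concat(Seq(Literal("user: "), CurrentUser)), "alice")
```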
   
   ### Support-level consistency fix (in `nondetermenistic.scala`)
   
   - `CometRand` / `CometRandn`: move the non-literal-seed fallback out of `convert` (where it silently returned `None`) and into `getSupportLevel`, using a new `seedExprOf` hook and a shared `nonLiteralSeedReason` constant on `CometRandCommonSerde`. `getUnsupportedReasons` now documents the restriction. On pre-4.0 Spark a column-reference seed is only rejected at runtime; 4.0+ rejects it at analysis time.
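
   A minimal sketch of the resulting pattern, under stated assumptions: `SupportLevel`, `Supported`, `Unsupported`, `SeedExpr`, `RandLike`, `RandCommonSerdeSketch`, and `RandSerdeSketch` are hypothetical simplifications, not Comet's actual serde API. Only the names `seedExprOf` and `nonLiteralSeedReason` come from this PR, and their signatures here are guesses. It shows the non-literal-seed decision made once, in `getSupportLevel`, with an explicit reason, instead of as a silent `None` inside `convert`.

```scala
// Hypothetical, simplified model of a serde support check; NOT Comet's API.
sealed trait SupportLevel
case object Supported extends SupportLevel
final case class Unsupported(reason: String) extends SupportLevel

// Toy seed expressions; NOT Spark's Catalyst classes.
sealed trait SeedExpr
final case class LongLiteral(value: Long) extends SeedExpr
final case class ColumnRef(name: String) extends SeedExpr

final case class RandLike(seed: SeedExpr)

trait RandCommonSerdeSketch {
  // Shared reason string so Rand and Randn report the same message.
  val nonLiteralSeedReason: String = "Non-literal seed is not supported"

  // Hook: each concrete serde says where the seed lives in its expression.
  def seedExprOf(expr: RandLike): SeedExpr

  // The fallback decision lives here, with an explicit reason.
  def getSupportLevel(expr: RandLike): SupportLevel = seedExprOf(expr) match {
    case LongLiteral(_) => Supported
    case _              => Unsupported(nonLiteralSeedReason)
  }

  // Stand-in for serialization: returns the literal seed.
  def convert(expr: RandLike): Option[Long] = seedExprOf(expr) match {
    case LongLiteral(v) => Some(v)
    case _              => None // only reached if a caller skips getSupportLevel
  }
}

object RandSerdeSketch extends RandCommonSerdeSketch {
  def seedExprOf(expr: RandLike): SeedExpr = expr.seed
}
```

   Keeping the check in `getSupportLevel` means the fallback reason surfaces wherever support levels are reported, rather than disappearing inside `convert`.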
   
   ### Tracking issues filed for follow-up
   
   None. No correctness divergences were found.
   
   ### Audit process
   
   The audit was run directly with the `audit-comet-expression` skill (covering all four Spark versions, per #4468). With only five small serdes, most of them identical across versions, no parallel subagents were needed.
   
   ## How are these changes tested?
   
   - `./mvnw test -Dsuites="org.apache.comet.CometExpressionSuite rand expression" -Dtest=none` (1 test passes)
   - `./mvnw test -Dsuites="org.apache.comet.CometExpressionSuite randn expression" -Dtest=none` (1 test passes)
   - `./mvnw test -Dsuites="org.apache.comet.CometExpressionSuite spark_partition_id" -Dtest=none` (1 test passes)
   - `./mvnw test -Dsuites="org.apache.comet.CometExpressionSuite monotonically_increasing_id" -Dtest=none` (1 test passes)
   - `make core` succeeds with the serde change.
   

