andygrove opened a new pull request, #4585: URL: https://github.com/apache/datafusion-comet/pull/4585
## Which issue does this PR close?

N/A. Follow-on to #4583. This reduces drift and maintenance friction in the expression reference doc by generating it from code.

> Note: this PR is stacked on #4583. Until that merges, this diff also contains its 2 prettier-formatting commits; they will drop out once #4583 lands.

## Rationale for this change

`docs/source/user-guide/latest/expressions.md` was hand-maintained: every PR that added or changed an expression edited the tables by hand. That let the doc drift from reality (a function supported in code but still listed as planned, or a new Spark built-in never added) and made large aligned tables conflict-prone.

The Compatibility Guide is already generated by `GenerateDocs` from each serde's `getCompatibleNotes` / `getIncompatibleReasons` / `getUnsupportedReasons`. This PR extends the same generator to also produce the expression reference, so the overview is derived from the code that actually decides support, and stays complete and current.

## What changes are included in this PR?

- New pure helper `org.apache.comet.ExpressionReference`: status model, row resolution, table rendering, and Spark `FunctionRegistry` enumeration (unit-tested in isolation).
- `GenerateDocs` extended to: enumerate every Spark built-in (with its group), derive Supported status and a Compatibility Guide link from the serde maps, and fall back to a curated status list for planned / not-planned functions. The curated list lives in `GenerateDocs.scala` on purpose: that file is excluded from the heavy CI path filters in `dev/ci/compute-changes.py`, so editing the list (for example when an issue is filed) does not trigger the Spark SQL and Iceberg jobs.
- `expressions.md` per-group tables are now generated between `<!--BEGIN:EXPR_TABLE[group]-->` markers; the prose was updated to drop the "Incorrect by default" status.
- Doc generation pinned to the Spark 4.1 profile (newest `FunctionRegistry`) in `dev/generate-release-docs.sh` and `docs/build.sh`.
- The reference is a concise overview: it carries a short summary plus a link into the Compatibility Guide for detail, with no duplicated note text.

Known follow-ups (not in this PR): populate per-expression summary notes via a new `getExpressionSummary` (currently `None`, so serde-backed rows have sparse notes); add a CI check that fails when the generated doc is stale; rename the curated `PlannedExpr` type now that it also holds Supported entries.

## How are these changes tested?

- `ExpressionReferenceSuite` covers the status model, every branch of row resolution (serde + link, serde without page, planned + issue, not-planned, unclassified), and rendering.
- `FunctionRegistryEnumerationSuite` verifies enumeration against real Spark built-ins.
- Regeneration is idempotent (re-running the generator produces no diff), the generated doc has zero unclassified rows, and all tracking-issue links were verified to match the prior hand-written doc exactly.
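For readers unfamiliar with marker-based doc generation, the following is a minimal Scala sketch of the idea described in this PR: resolve each function's status (serde-backed first, then a curated fallback), render a table per group, and rewrite only the region after each `<!--BEGIN:EXPR_TABLE[group]-->` marker. It is not the code in this PR. The object name `ExprReferenceSketch`, the `Status` and `Row` types, the two-column table layout, and in particular the closing marker `<!--END:EXPR_TABLE-->` are assumptions made for illustration; the description above only names the opening marker.

```scala
object ExprReferenceSketch {
  // Hypothetical status model; the PR's real types are not shown in the description.
  sealed trait Status
  case object Supported extends Status
  case object Planned extends Status
  case object NotPlanned extends Status
  case object Unclassified extends Status

  final case class Row(name: String, status: Status)

  // Serde-backed functions are Supported; everything else falls back to the
  // curated list, and anything still unknown is flagged as Unclassified.
  def resolve(name: String, serdeBacked: Set[String], curated: Map[String, Status]): Row =
    if (serdeBacked.contains(name)) Row(name, Supported)
    else Row(name, curated.getOrElse(name, Unclassified))

  private def label(status: Status): String = status match {
    case Supported    => "Supported"
    case Planned      => "Planned"
    case NotPlanned   => "Not planned"
    case Unclassified => "Unclassified"
  }

  // Sorting by name keeps the output deterministic across runs.
  def renderTable(rows: Seq[Row]): Seq[String] =
    Seq("| Function | Status |", "| --- | --- |") ++
      rows.sortBy(_.name).map(r => s"| `${r.name}` | ${label(r.status)} |")

  private val Begin = """<!--BEGIN:EXPR_TABLE\[(\w+)\]-->""".r
  private val End = "<!--END:EXPR_TABLE-->" // assumed closing marker, not named in the PR

  // Keeps hand-written lines, and replaces whatever sits between a BEGIN
  // marker and the next END marker with a freshly rendered table.
  def regenerate(lines: Seq[String], rowsByGroup: Map[String, Seq[Row]]): Seq[String] = {
    val out = Seq.newBuilder[String]
    var insideGenerated = false
    lines.foreach { line =>
      line.trim match {
        case Begin(group) =>
          out += line
          out ++= renderTable(rowsByGroup.getOrElse(group, Seq.empty))
          insideGenerated = true
        case End =>
          out += line
          insideGenerated = false
        case _ if !insideGenerated => out += line
        case _ => () // stale generated line: drop it
      }
    }
    out.result()
  }

  def main(args: Array[String]): Unit = {
    val doc = Seq(
      "## math_funcs",
      "<!--BEGIN:EXPR_TABLE[math_funcs]-->",
      "| stale |",
      "<!--END:EXPR_TABLE-->")
    val rows = Seq("abs", "cbrt").map(resolve(_, Set("abs"), Map("cbrt" -> Planned)))
    regenerate(doc, Map("math_funcs" -> rows)).foreach(println)
  }
}
```

Because the old table body is discarded and the new one is rendered from sorted rows, running `regenerate` on its own output should return the same lines, which is the idempotence property the testing notes rely on. Surfacing `Unclassified` as an explicit status, rather than silently omitting the row, is one way a generator could make the "zero unclassified rows" check straightforward.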
