andygrove opened a new pull request, #4428:
URL: https://github.com/apache/datafusion-comet/pull/4428

   ## Which issue does this PR close?
   
   Part of #4419 (the documentation-only phase).
   
   ## Rationale for this change
   
   Comet's documentation conflated several distinct ideas under the word 
"native": the implementation language (Rust vs JVM), pipeline membership 
(handled by Comet vs falls back to Spark), and the data format (Arrow columnar 
vs Spark rows). The same kind of ambiguity appears in the shuffle naming, where 
both implementations are columnar and both use Arrow IPC but only one operator 
name says "Columnar." Issue #4419 lays out a clearer vocabulary so the docs 
stop overloading "native" and can scale to a roadmap where some JVM code paths 
(today Scala UDF codegen; soon Arrow UDFs and hybrid impls) also live inside 
the Comet pipeline.
   
   ## What changes are included in this PR?
   
   Documentation prose only. No code changes, no operator renames, no 
plan-stability golden updates, and no new style-guide page (that comes later). 
The vocabulary applied here matches the rules in #4419:
   
   - **Arrow-native** is now the term for the data-format property that unifies 
the pipeline (operators, expressions, shuffle, and broadcast all consume and 
produce Arrow batches).
   - **Comet pipeline** replaces "the native Comet path" / "on the native Comet 
path" / "accelerated by Comet" for membership.
   - **Rust-implemented** / **native Rust** / **Rust code** is used for the 
implementation-language axis.
   - Compound forms that fix their meaning are kept: `native shuffle`, `native 
scan` (paired with `CometBatchScan`), and `Arrow-native`.
   - Bare "native execution" / "runs natively" / "the native path" as vague 
adjectives are removed and replaced with the specific axis they referred to.
   
   The biggest single rewrite is in 
`docs/source/user-guide/latest/understanding-comet-plans.md`, where the "three 
kinds of nodes" framing becomes four:
   
   | Category | Example |
   | --- | --- |
   | Arrow-native Rust operators | `CometProject`, `CometHashAggregate`, `CometSort` |
   | Arrow-native JVM expressions | Scala UDF codegen (today); Arrow UDFs and hybrid impls (future) |
   | Arrow-native JVM plumbing | `CometUnion`, `CometCoalesce`, `CometBroadcastExchange` |
   | Spark fallback | `Project`, `HashAggregate`, plain `Exchange` |
   
   Other targeted files: `docs/source/index.md` (value prop refreshed to lead 
with the Arrow-native framing), 
`docs/source/user-guide/latest/scala_java_udfs.md` (the "native Comet path" 
wording the issue specifically calls out), 
`docs/source/contributor-guide/plugin_overview.md`, 
`docs/source/contributor-guide/native_shuffle.md`, 
`docs/source/contributor-guide/jvm_shuffle.md`, and 
`docs/source/about/gluten_comparison.md` (now notes the JVM-on-Arrow path as a 
Comet differentiator). A sweep then cleans up the remaining 
bare-"native-execution" cases across the contributor guide and user guide.
   
   Operator renames (`CometExchange` → `CometNativeShuffleExchange`, etc.) are 
explicitly out of scope here. They land in a follow-on PR with deprecation 
aliases and plan-stability goldens, as proposed in #4419's migration plan.
   
   ## How are these changes tested?
   
   Documentation only. Verified locally that `sphinx-build` produces no new 
warnings, that the rewritten plan-node section of 
`understanding-comet-plans.md` renders with its new subsection headings 
(Arrow-Native Rust Operators, Arrow-Native JVM Plumbing, Arrow-Native JVM 
Expressions, Shuffle Operators, and Columnar/Row Transitions), and that a final 
grep across `docs/source/` for banned phrases (`runs natively`, `native 
execution`, `the native path`, etc.) returns zero hits outside the historical 
changelog files.
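
The banned-phrase check described above could be scripted roughly as follows. This is a minimal sketch, not the command that was actually run: the phrase list contains only the three phrases named here (the "etc." is left unexpanded), the `find_banned_phrases` helper is hypothetical, and it assumes changelog files can be recognized by the fragment `changelog` in their path.

```python
import re
from pathlib import Path

# Phrases named in the PR description. The original list ends in "etc.",
# so this list is deliberately incomplete.
BANNED_PHRASES = ["runs natively", "native execution", "the native path"]

# Assumption: historical changelog files have this fragment in their path.
EXCLUDED_FRAGMENT = "changelog"


def find_banned_phrases(root, phrases=BANNED_PHRASES, excluded=EXCLUDED_FRAGMENT):
    """Return (path, line_number, phrase) for each banned phrase under root.

    Matching is case-insensitive and line-based, like a plain grep, so a
    phrase that is wrapped across two lines is not detected.
    """
    root = Path(root)
    pattern = re.compile("|".join(re.escape(p) for p in phrases), re.IGNORECASE)
    hits = []
    for path in sorted(root.rglob("*.md")):
        # Skip files whose path (relative to root) looks like a changelog.
        if excluded in path.relative_to(root).as_posix().lower():
            continue
        text = path.read_text(encoding="utf-8", errors="replace")
        for number, line in enumerate(text.splitlines(), start=1):
            for match in pattern.finditer(line):
                hits.append((path.as_posix(), number, match.group(0).lower()))
    return hits
```

Calling `find_banned_phrases("docs/source")` from the repository root and checking for an empty list would correspond to the "zero hits" condition.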
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

