andygrove opened a new issue, #4421:
URL: https://github.com/apache/datafusion-comet/issues/4421

   ## Motivation
   
   Both the user guide and contributor guide have grown by accretion: new pages 
have been added at the end of the toctree without re-grouping. The result is 
flat lists of 14+ items with no captions, and several pages in positions that 
don't match how readers actually use them.
   
   Pairs naturally with the style guide work in #4419 — easier to reorganize 
once and apply the new vocabulary in the same rewrite, rather than doing two 
passes.
   
   ## User guide problems
   
   Current `docs/source/user-guide/latest/index.rst` is a flat 14-item toctree. 
Issues:
   
   - **"Building From Source" is #2.** That's a contributor concern; new users 
install Comet pre-built and shouldn't see "build from source" as the second 
step.
   - **"ScalaUDF and Java UDF Support" is split** from the other "Supported X" 
reference pages.
   - **"Configuration Settings" is at #8.** Users want config near the top 
right after install.
   - **"Understanding Comet Plans" is at #10**, but it's the page users hit 
first when debugging fallback.
   - **No grouping/captions** — flat list, no narrative.
   
   ### Proposed user-guide grouping
   
   | Section | Pages |
   |---|---|
   | **Getting Started** | Installing Comet · Configuration Settings |
   | **What Comet Supports** | Supported Data Sources · Supported Data Types · Supported Operators · Supported Expressions · Scala/Java UDF Support · Compatibility Guide |
   | **Operating Comet** | Understanding Comet Plans · Tuning Guide · Metrics Guide |
   | **Integrations** | Iceberg Guide · Kubernetes Guide |
   | **Advanced** | Building From Source (or move to contributor guide entirely) |
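
   A minimal sketch of how the grouping above might be expressed in `index.rst`, using one captioned `toctree` per section. The `:caption:` and `:maxdepth:` options are standard Sphinx; the document names (`installation`, `configs`, and so on) are placeholders chosen for illustration, not confirmed file names, and would need to be replaced with the real page paths.

   ```rst
   .. Document names below are assumed placeholders, not confirmed file names.

   .. toctree::
      :maxdepth: 1
      :caption: Getting Started

      installation
      configs

   .. toctree::
      :maxdepth: 1
      :caption: What Comet Supports

      datasources
      datatypes
      operators
      expressions
      udfs
      compatibility

   .. toctree::
      :maxdepth: 1
      :caption: Operating Comet

      understanding-comet-plans
      tuning
      metrics

   .. toctree::
      :maxdepth: 1
      :caption: Integrations

      iceberg
      kubernetes

   .. toctree::
      :maxdepth: 1
      :caption: Advanced

      building-from-source
   ```

   Because only the `toctree` directives change, this matches step 1 of the migration plan: no pages move and no page content is edited.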
   
   ## Contributor guide problems
   
   Current `docs/source/contributor-guide/index.md` is a flat 23-item toctree. 
Issues:
   
   - **Architecture pages come before dev setup.** New contributors land on 
FFI/Shuffle internals at #3-5 before they've built the project. *Development 
Guide* (#6) should come right after *Getting Started*.
   - **Observability tooling is scattered.** *Debugging* (#7), *Benchmarking* 
(#9), *Tracing* (#15), and *Profiling* (#16) are spread across the list but 
cover one concern.
   - **Test pages are buried at #17-19**, despite testing being central to the 
contributor workflow.
   - **Reference tables interrupt the flow.** *Supported Spark Expressions* and 
*Supported Spark Configurations* (#13-14) are large generated reference pages 
sitting in the middle.
   - **ANSI Error Propagation (#8)** is a deep technical page mixed in with 
operational ones; belongs in architecture.
   
   ### Proposed contributor-guide grouping
   
   | Section | Pages |
   |---|---|
   | **Getting Started** | Getting Started · Development Guide |
   | **Project Architecture** | Comet Plugin Overview · Arrow FFI · JVM Shuffle · Native Shuffle · ANSI Error Propagation |
   | **Adding Functionality** | Adding a New Operator · Adding a New Expression · Adding a New Spark Version |
   | **Testing** | Comet SQL Tests · Spark SQL Tests · Iceberg Spark Tests |
   | **Debugging and Performance** | Debugging Guide · Benchmarking Guide · Profiling · Tracing |
   | **Reference** | Supported Spark Expressions · Supported Spark Configurations |
   | **Project Mechanics** | Bug Triage · Release Process · Roadmap · GitHub Issue Tracker |
   
   ## Cross-cutting improvements
   
   - **Tighter guide intros.** Both `index` pages currently list contents 
generically. Add a short "How to use this guide" paragraph with bolded entry 
points (e.g. *"New users: start with Installing Comet, then Configuration 
Settings, then Understanding Comet Plans"*) so a reader knows what to read 
first without scanning the full toctree.
   - **Add a glossary page.** Once #4419 lands, a one-page glossary defining 
*Comet pipeline*, *Arrow-native*, *Rust-implemented*, *JVM-implemented*, *Spark 
fallback*, *Arrow IPC*, etc. would be high-leverage and the natural anchor for 
cross-doc links.
   - **Add an "Architecture in 5 minutes" bridge page.** The user guide's 
*Understanding Comet Plans* explains plan output but skips the *why*; the 
contributor guide's *Plugin Overview* goes straight to internals. A short 
shared page bridging "what users see in plans" to "how Comet rewrites the plan" 
would help both audiences.
   - **Add an FAQ page.** Absorbs questions that come up in issues/Slack and 
don't have a natural home elsewhere.
   - **Add a version-support summary.** There is currently no single page 
stating *"Comet $VERSION supports Spark 3.4 / 3.5 / 4.0 with these caveats"*; 
version information is spread across the version columns of each support 
table. A summary up front would help.
   
   ## Smaller issues worth fixing in the same pass
   
   - The user guide is `.rst` (Sphinx native); the contributor guide is `.md` 
with MyST. Harmless, but mildly annoying when editing. Worth deciding on one 
format. (Likely MyST since most existing pages are markdown.)
   - `docs/temp/` looks like leftover scaffolding — confirm it can be deleted.
   - `iceberg.md` (user) vs `iceberg-spark-tests.md` (contributor) — naming is 
fine; both are appropriately scoped to their audience.
   
   ## Migration plan
   
   1. Land the captions/grouping first (toctree changes only — no page moves, 
no content changes). Low-risk; reviewers can see the new structure.
   2. Move *Building From Source* if the consensus is to relocate it.
   3. Add the new pages (Glossary, Architecture-in-5, FAQ, Version Support) one 
at a time as separate PRs.
   4. Apply #4419 vocabulary in the same content-rewrite pass, file by file.
   
   ## Related
   
   - #4419 — Establish nomenclature style guide
   - #4420 — Add a docs-review Claude skill


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

