andygrove opened a new issue, #4421: URL: https://github.com/apache/datafusion-comet/issues/4421
## Motivation

Both the user guide and contributor guide have grown by accretion: new pages have been added at the end of the toctree without re-grouping. The result is flat lists of 14+ items with no captions, and several pages in positions that don't match how readers actually use them.

Pairs naturally with the style guide work in #4419: easier to reorganize once and apply the new vocabulary in the same rewrite, rather than doing two passes.

## User guide problems

Current `docs/source/user-guide/latest/index.rst` is a flat 14-item toctree. Issues:

- **"Building From Source" is #2.** That's a contributor concern; new users install Comet pre-built and shouldn't see "build from source" as the second step.
- **"ScalaUDF and Java UDF Support" is split** from the other "Supported X" reference pages.
- **"Configuration Settings" is at #8.** Users want config near the top right after install.
- **"Understanding Comet Plans" is at #10**, but it's the page users hit first when debugging fallback.
- **No grouping/captions**: flat list, no narrative.

### Proposed user-guide grouping

| Section | Pages |
|---|---|
| **Getting Started** | Installing Comet · Configuration Settings |
| **What Comet Supports** | Supported Data Sources · Supported Data Types · Supported Operators · Supported Expressions · Scala/Java UDF Support · Compatibility Guide |
| **Operating Comet** | Understanding Comet Plans · Tuning Guide · Metrics Guide |
| **Integrations** | Iceberg Guide · Kubernetes Guide |
| **Advanced** | Building From Source (or move to contributor guide entirely) |

## Contributor guide problems

Current `docs/source/contributor-guide/index.md` is a flat 23-item toctree. Issues:

- **Architecture pages come before dev setup.** New contributors land on FFI/Shuffle internals at #3-5 before they've built the project. *Development Guide* (#6) should come right after *Getting Started*.
- **Observability tooling is scattered.** *Debugging* (#7), *Benchmarking* (#9), *Tracing* (#15), and *Profiling* (#16) are spread across the list but cover one concern.
- **Test pages are buried at #17-19**, despite testing being central to the contributor workflow.
- **Reference tables interrupt the flow.** *Supported Spark Expressions* and *Supported Spark Configurations* (#13-14) are large generated reference pages sitting in the middle.
- **ANSI Error Propagation (#8)** is a deep technical page mixed in with operational ones; belongs in architecture.

### Proposed contributor-guide grouping

| Section | Pages |
|---|---|
| **Getting Started** | Getting Started · Development Guide |
| **Project Architecture** | Comet Plugin Overview · Arrow FFI · JVM Shuffle · Native Shuffle · ANSI Error Propagation |
| **Adding Functionality** | Adding a New Operator · Adding a New Expression · Adding a New Spark Version |
| **Testing** | Comet SQL Tests · Spark SQL Tests · Iceberg Spark Tests |
| **Debugging and Performance** | Debugging Guide · Benchmarking Guide · Profiling · Tracing |
| **Reference** | Supported Spark Expressions · Supported Spark Configurations |
| **Project Mechanics** | Bug Triage · Release Process · Roadmap · GitHub Issue Tracker |

## Cross-cutting improvements

- **Tighter guide intros.** Both `index` pages currently list contents generically. Add a short "How to use this guide" paragraph with bolded entry points (e.g. *"New users: start with Installing Comet, then Configuration Settings, then Understanding Comet Plans"*) so a reader knows what to read first without scanning the full toctree.
- **Add a glossary page.** Once #4419 lands, a one-page glossary defining *Comet pipeline*, *Arrow-native*, *Rust-implemented*, *JVM-implemented*, *Spark fallback*, *Arrow IPC*, etc. would be high-leverage and the natural anchor for cross-doc links.
- **Add an "Architecture in 5 minutes" bridge page.** The user guide's *Understanding Comet Plans* explains plan output but skips the *why*; the contributor guide's *Plugin Overview* goes straight to internals. A short shared page bridging "what users see in plans" to "how Comet rewrites the plan" would help both audiences.
- **Add an FAQ page.** Absorbs questions that come up in issues/Slack and don't have a natural home elsewhere.
- **Add a version-support summary.** No global *"Comet $VERSION supports Spark 3.4 / 3.5 / 4.0 with these caveats"* page; each support table has its own version columns. A summary up front would help.

## Smaller issues worth fixing in the same pass

- The user guide is `.rst` (Sphinx native); the contributor guide is `.md` with MyST. Harmless, but mildly annoying when editing. Worth deciding on one format. (Likely MyST since most existing pages are markdown.)
- `docs/temp/` looks like leftover scaffolding; confirm it can be deleted.
- `iceberg.md` (user) vs `iceberg-spark-tests.md` (contributor): naming is fine; both are appropriately scoped to their audience.

## Migration plan

1. Land the captions/grouping first (toctree changes only: no page moves, no content changes). Low-risk; reviewers can see the new structure.
2. Move *Building From Source* if the consensus is to relocate it.
3. Add the new pages (Glossary, Architecture-in-5, FAQ, Version Support) one at a time as separate PRs.
4. Apply #4419 vocabulary in the same content-rewrite pass, file by file.

## Related

- #4419: Establish nomenclature style guide
- #4420: Add a docs-review Claude skill

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
