andygrove opened a new issue, #4420:
URL: https://github.com/apache/datafusion-comet/issues/4420

   ## Motivation
   
   Comet's docs are growing — user guide, contributor guide, compatibility 
tables, operator references, plus per-release changelogs. Quality varies, 
terminology drifts, and once the nomenclature style guide from #4419 lands, we 
need a way to keep new and existing docs aligned with it. PR reviewers 
shouldn't have to be the last line of defense for prose quality, and 
contributors shouldn't have to internalize a multi-page style guide before 
opening a docs PR.
   
   A sibling skill to `review-comet-pr`, focused on documentation, would let 
reviewers (and contributors during self-review) get a docs-expert pass over a 
PR or an existing doc. The skill produces advisory feedback — it does not edit 
files or post comments directly, mirroring `review-comet-pr`.
   
   ## What the skill should check
   
   ### Style guide adherence (post #4419)
   
   - Bare *"native"* used as a vague adjective (e.g. *"runs natively"*, *"the 
native path"*) — flag and suggest the specific axis: *Rust-implemented*, 
*Arrow-native*, *Comet pipeline*.
   - Permitted compounds (*native Rust*, *Arrow-native*, *native shuffle* in 
shuffle-pair context) recognized and not flagged.
   - *"Vectorized"* used as a synonym for *columnar* — flag.
   - *"Falls back to Spark"* / *"Spark fallback"* used consistently for the 
same concept.
   - Operator names match plan output exactly (e.g. `CometProject`, not 
`CometProjectExec` or `Comet Project`).
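
   As a rough illustration of the first two bullets, a checker could mask the
permitted compounds before searching for bare uses. This is a minimal Python
sketch, not the skill's implementation: the function name and the regular
expressions are assumptions, and it does not attempt the shuffle-pair context
check for *native shuffle*.

```python
import re

# Permitted compounds from the list above. The patterns are an illustrative
# assumption, not the wording of the style guide.
PERMITTED = re.compile(r"native Rust|Arrow-native|native shuffle", re.IGNORECASE)
BARE_NATIVE = re.compile(r"\bnative(ly)?\b", re.IGNORECASE)


def flag_bare_native(line: str) -> list[str]:
    """Return each bare use of "native" or "natively" in one line of prose."""
    # Blank out permitted compounds first so they are never reported.
    masked = PERMITTED.sub(lambda m: " " * len(m.group()), line)
    return [m.group() for m in BARE_NATIVE.finditer(masked)]
```

   Masking keeps character offsets stable, so a column reference could be
reported alongside each hit if that proves useful.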
   
   ### Information architecture
   
   - Audience is identifiable from the first paragraph — user, contributor, or 
operator.
   - Heading hierarchy is correct (no jumps from H2 to H4, no multiple H1s).
   - Tables used for structured comparisons (operators, configs, support 
matrices) rather than bulleted prose.
   - Long pages have a brief overview / table of contents at the top.
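
   The heading-hierarchy rule is mechanical enough to sketch. The Python below
is one possible approach, assuming ATX-style (`#`) headings; the function name
and message wording are hypothetical.

```python
import re

HEADING = re.compile(r"^(#{1,6})\s+\S")
FENCE = "`" * 3  # Markdown code-fence delimiter


def heading_problems(markdown: str) -> list[str]:
    """Report skipped heading levels and repeated H1s in a Markdown string."""
    problems = []
    previous_level = 0
    h1_count = 0
    in_fence = False
    for number, line in enumerate(markdown.splitlines(), start=1):
        if line.lstrip().startswith(FENCE):
            in_fence = not in_fence  # skip '#' comments inside code fences
            continue
        match = HEADING.match(line)
        if in_fence or not match:
            continue
        level = len(match.group(1))
        if level == 1:
            h1_count += 1
            if h1_count > 1:
                problems.append(f"line {number}: more than one H1")
        if previous_level and level > previous_level + 1:
            problems.append(f"line {number}: jump from H{previous_level} to H{level}")
        previous_level = level
    return problems
```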
   
   ### Completeness
   
   - New user-guide pages have: overview, prerequisites or version 
applicability, at least one runnable example, links to related docs.
   - New contributor-guide pages have: scope/audience, prerequisites, the 
procedure, how to verify it worked.
   - New configs in prose are also added to `configs.md` with key, default, 
type, version added.
   - New operators in prose are also added to the operator reference in 
`understanding-comet-plans.md` and `operators.md`.
   - New expressions are reflected in `spark_expressions_support.md`.
   
   ### Accuracy
   
   - Config keys mentioned exist in code (grep `spark.comet.*` against the 
repo).
   - Operator names mentioned exist as classes.
   - Spark version claims (*"Spark 3.4+"*, *"Spark 4.0 only"*) match the 
version profiles the surrounding code lives in.
   - Code samples are syntactically valid for the language they claim (Scala / 
Python / SQL / shell).
   - Links resolve (relative paths exist; external links return 200).
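
   The config-key check could look something like the following Python sketch.
It assumes the doc text and the relevant source text have already been read
into strings; the function name is hypothetical, and the keys in the usage
example are made up for illustration.

```python
import re

# Matches spark.comet followed by one or more dotted segments.
CONFIG_KEY = re.compile(r"\bspark\.comet(?:\.[A-Za-z0-9_]+)+")


def unknown_config_keys(doc_text: str, source_text: str) -> list[str]:
    """List spark.comet.* keys mentioned in a doc but not found in the source."""
    defined = set(CONFIG_KEY.findall(source_text))
    mentioned = set(CONFIG_KEY.findall(doc_text))
    return sorted(mentioned - defined)


# Usage with placeholder keys (not real Comet configs):
doc = "Set `spark.comet.example.enabled` and `spark.comet.example.missing`."
source = 'conf("spark.comet.example.enabled")'
print(unknown_config_keys(doc, source))
```

   Comparing extracted key sets, rather than substring-searching the source,
avoids treating a key as defined merely because it is a prefix of a longer one.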
   
   ### Voice and tone
   
   - Present tense, active voice, second person where appropriate.
   - No marketing fluff (*"blazing fast"*, *"seamlessly"*, *"powerful"*).
   - No future-tense aspirations in user-facing docs (*"will support"*, *"is 
planned to"*) — those belong in the roadmap, not the user guide.
   - No stale TODOs or `XXX` markers in user-visible files.
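
   Several of these rules reduce to phrase matching. A minimal Python sketch,
using only the phrases quoted in the list above (a real rule set would
presumably be longer, and the rule names here are assumptions):

```python
import re

# Rule names are hypothetical labels; phrases come from the list above.
RULES = {
    "marketing": re.compile(r"\b(blazing fast|seamlessly|powerful)\b", re.IGNORECASE),
    "future tense": re.compile(r"\b(will support|is planned to)\b", re.IGNORECASE),
    "stale marker": re.compile(r"\b(TODO|XXX)\b"),
}


def tone_findings(text: str) -> list[tuple[int, str, str]]:
    """Return (line number, rule name, matched phrase) for each hit."""
    findings = []
    for number, line in enumerate(text.splitlines(), start=1):
        for rule, pattern in RULES.items():
            for match in pattern.finditer(line):
                findings.append((number, rule, match.group()))
    return findings
```

   Tense and voice in general are not something a regular expression can
judge, so this only covers the literal phrases.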
   
   ### Apache project hygiene
   
   - New `.md` files have the ASF license header.
   - No copyrighted third-party content without attribution.
   - Diagrams/images referenced exist in the repo.
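
   The license-header check can be approximated by looking for the opening
words of the standard ASF header near the top of the file. A Python sketch,
where the function name and the 20-line scan window are assumptions:

```python
# Opening words of the standard ASF source header.
ASF_MARKER = "Licensed to the Apache Software Foundation (ASF)"


def has_asf_header(text: str, scan_lines: int = 20) -> bool:
    """Return True if the ASF marker appears in the first `scan_lines` lines."""
    head = "\n".join(text.splitlines()[:scan_lines])
    return ASF_MARKER in head
```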
   
   ### Comet-specific conventions
   
   - User-guide content lives under `docs/source/user-guide/latest/`, 
contributor-guide content under `docs/source/contributor-guide/`. Flag 
misplacements.
   - Spark version-specific content notes the version profile clearly.
   - Plan-output examples use real plan formatting (tree-form with `+-` 
connectors), not invented prose.
   - Cross-references use relative links within `docs/source/`.
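
   For the relative-link rule, one option is to resolve each link target
against the linking file's directory and compare it with the set of files in
the tree. The Python below is a sketch under that assumption; the function
name, the link regex (inline Markdown links only), and the file names in the
usage example are all hypothetical.

```python
import posixpath
import re

LINK = re.compile(r"\[[^\]]*\]\(([^)\s]+)\)")   # inline [text](target) links
SCHEME = re.compile(r"[a-z][a-z0-9+.-]*:", re.IGNORECASE)


def broken_relative_links(doc_path: str, text: str, existing: set[str]) -> list[str]:
    """Return relative link targets in `text` that are not in `existing`."""
    base = posixpath.dirname(doc_path)
    broken = []
    for target in LINK.findall(text):
        if SCHEME.match(target) or target.startswith("#"):
            continue  # external URL or same-page anchor
        path = target.split("#", 1)[0]  # drop any fragment
        resolved = posixpath.normpath(posixpath.join(base, path))
        if resolved not in existing:
            broken.append(target)
    return broken


# Usage with made-up paths:
page = "docs/source/user-guide/latest/example.md"
body = "[configs](configs.md) [guide](../../contributor-guide/missing.md#x) [site](https://example.org)"
files = {"docs/source/user-guide/latest/configs.md"}
print(broken_relative_links(page, body, files))
```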
   
   ## Scope
   
   The skill should support two modes:
   
   1. **PR review mode** — given a PR number or branch, surface docs changes 
(and prose changes in code comments / config descriptions / Scaladoc) and audit 
them.
   2. **Standalone audit mode** — given a file or directory under 
`docs/source/`, audit it as-is.
   
   In both modes the skill returns a prioritized list (must-fix / should-fix / 
nit) suitable for a reviewer to paste or paraphrase into review comments.
   
   ## Output format
   
   Mirror `review-comet-pr`: produce structured advisory feedback for the 
human, not direct edits. Categorize findings by severity. For each finding:
   
   - File and line reference.
   - The rule that was violated (link to the style guide section).
   - A concrete suggested rewrite when one is obvious.
   
   The skill must not edit files, post PR comments, or push branches.
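
   To make the shape of a finding concrete, here is a hedged Python sketch of
one possible data model. The field names, the `render` helper, and the text
layout are assumptions for discussion, not a description of what
`review-comet-pr` emits.

```python
from dataclasses import dataclass
from typing import Optional

SEVERITY_ORDER = {"must-fix": 0, "should-fix": 1, "nit": 2}


@dataclass
class Finding:
    file: str
    line: int
    severity: str                     # one of the SEVERITY_ORDER keys
    rule: str                         # style guide section the finding cites
    suggestion: Optional[str] = None  # rewrite, when one is obvious


def render(findings: list[Finding]) -> str:
    """Format findings as plain text, most severe first."""
    ordered = sorted(
        findings, key=lambda f: (SEVERITY_ORDER[f.severity], f.file, f.line)
    )
    lines = []
    for f in ordered:
        lines.append(f"[{f.severity}] {f.file}:{f.line} {f.rule}")
        if f.suggestion:
            lines.append(f"    suggest: {f.suggestion}")
    return "\n".join(lines)
```

   Returning a string keeps the sketch advisory only: nothing here edits a
file or posts a comment.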
   
   ## Open questions
   
   - Should the skill enforce stricter rules on user-guide content (lower 
tolerance for jargon, mandatory examples) than contributor-guide content (where 
domain-specific vocabulary is fine)?
   - Should the skill check generated docs (e.g. expression support tables 
generated by `make`) or only hand-written prose?
   - How should the skill locate the style guide: pin to a specific doc path 
(e.g. `docs/source/contributor-guide/style_guide.md` once #4419 lands), or read 
the latest committed version on each invocation?
   - Is there value in a separate `audit-comet-docs` skill for full-site audits 
versus a single skill that handles both PR review and standalone audit modes?
   
   ## Dependencies
   
   - The style guide doc must land first (depends on #4419).
   - Skill author needs read access to the docs tree and the repo's source for 
accuracy checks.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

