timsaucer opened a new pull request, #1579:
URL: https://github.com/apache/datafusion-python/pull/1579

   # Which issue does this PR close?
   
   Closes #.
   
   # Rationale for this change
   
   Phase 2 of the documentation-site refresh started in #1578. With the
   modern pydata-sphinx-theme + navigation in place, this PR moves the
   content format off `.rst` and onto MyST `.md`. The motivation:
   
   - Markdown is the lingua franca of agent-tuned tooling. LLMs trained
     on GitHub and modern docs parse Markdown reliably; reStructuredText
     is a minority dialect that frequently confuses both humans editing
     via PR review and agents reading the source. The Apache
     `datafusion-comet` sibling project completed the same migration
     recently and reported smoother contributor onboarding.
   - MyST is a strict superset of CommonMark with directives for the
     Sphinx features we actually use (toctrees, cross-references,
     code-blocks, admonitions, eval-rst escape hatch).
   - The `myst-parser` extension is already in the docs dependency
     group and was loaded by `conf.py` even before this PR — switching
     the on-disk format is a low-risk, mechanical change.
   
   This PR stacks on #1578 (theme + navbar refresh). It should land
   after #1578.
   
   # What changes are included in this PR?
   
   Format conversion (mechanical, via `rst-to-myst`):
   
   - 33 human-authored `.rst` files under `docs/source/` become 33
     `.md` files — the user guide, contributor guide, IO subsection,
     common-operations subsection, dataframe subsection, top-level
     `index`, and `links`.
   - Toctrees, cross-references, code blocks, hyperlinks, admonitions,
     and license headers all round-trip cleanly.
   
   Manual fixes layered on top of the converter output:
   
   - **Cross-reference anchors.** The converter kebab-cased every
     `(label)=` anchor (e.g. `(io-csv)=`), but every `{ref}` in the
     corpus, including the Python docstrings that `sphinx-autoapi`
     pulls into the API reference, still uses the underscore form
     (`` {ref}`CSV <io_csv>` ``). The anchors are rewritten back to
     underscore form (`(io_csv)=`, `(window_functions)=`,
     `(user_guide_concepts)=`, `(execution_metrics)=`, etc.) so
     existing references resolve without churning every call site.
   - **MyST extensions.** Enable `colon_fence` and `deflist` in
     `myst_enable_extensions` (the converter emits these on a few
     files, notably `dataframe/execution-metrics.md`).
   - **`source_suffix`.** Keep `.rst` registered even though no
     human-authored RST remains: `sphinx-autoapi` generates `.rst`
     under `autoapi/` at build time and Sphinx needs the suffix to
     parse it. The comment in `conf.py` flags this so a future cleanup
     pass doesn't strip it again.
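
The two configuration bullets above might look roughly like the following in `conf.py`. This is a minimal sketch, not the project's actual file: the `extensions` list is an assumption about the setup (using the module names that `myst-parser` and `sphinx-autoapi` document), and every other option is omitted.

```python
# Illustrative excerpt only; the project's real conf.py may differ.

# Assumed extension list: MyST for the .md sources, sphinx-autoapi for the
# generated API reference.
extensions = [
    "myst_parser",
    "autoapi.extension",
]

# MyST syntax extensions that the converter output relies on.
myst_enable_extensions = [
    "colon_fence",  # ::: fenced directives
    "deflist",      # definition lists
]

# Keep .rst registered: sphinx-autoapi writes .rst files under autoapi/ at
# build time, so Sphinx still needs a parser mapped to that suffix even
# though no human-authored RST remains.
source_suffix = {
    ".rst": "restructuredtext",
    ".md": "markdown",
}
```

The dictionary form of `source_suffix` maps each file suffix to a parser name, which is why dropping the `.rst` entry would leave the generated API pages unparsed.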
   
   86 `{eval-rst}` blocks remain in the converted output. Every one of
   them wraps a `.. ipython::` directive, which has no first-class MyST
   equivalent in our extensions setup. The blocks render identically
   to their RST originals and don't block the build. Migrating these to
   a native MyST exec syntax is a follow-up that requires either
   `myst-nb` or a custom parser registration, so it is out of scope here.
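
One way to re-check the block count is a small audit script. The sketch below is hypothetical and not part of this PR: the function names are made up for illustration, and it assumes the converter emitted backtick-fenced `{eval-rst}` blocks (colon-fenced blocks would need a second pattern).

```python
import re
from pathlib import Path

# Matches a backtick-fenced {eval-rst} block and captures its body.
# The closing fence must use the same number of backticks as the opening one.
EVAL_RST_BLOCK = re.compile(
    r"^(?P<fence>`{3,})\{eval-rst\}[ \t]*\n(?P<body>.*?)^(?P=fence)[ \t]*$",
    re.DOTALL | re.MULTILINE,
)


def audit_eval_rst(text: str) -> tuple[int, int]:
    """Return (total eval-rst blocks, blocks that start with `.. ipython::`)."""
    total = 0
    ipython = 0
    for match in EVAL_RST_BLOCK.finditer(text):
        total += 1
        if match.group("body").lstrip().startswith(".. ipython::"):
            ipython += 1
    return total, ipython


def audit_tree(root: str) -> tuple[int, int]:
    """Sum the per-file counts over every .md file under `root`."""
    total = 0
    ipython = 0
    for path in Path(root).rglob("*.md"):
        file_total, file_ipython = audit_eval_rst(path.read_text(encoding="utf-8"))
        total += file_total
        ipython += file_ipython
    return total, ipython
```

If the statement above holds, `audit_tree("docs/source")` would be expected to return two equal numbers; a mismatch would point at an `{eval-rst}` block wrapping something other than `.. ipython::`.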
   
   `AGENTS.md` is updated so the two `.rst` paths called out under
   "Aggregate and Window Function Documentation" point at the new `.md`
   equivalents.
   
   # Are there any user-facing changes?
   
   No behavioral change to the `datafusion` package — only the source
   format of the published documentation. Readers of the rendered site
   will not notice the migration; the HTML output is unchanged. Internal
   cross-references resolve, and the `pokemon.csv` ipython example on the
   landing page and the `yellow_tripdata_2021-01.parquet` example on
   the basics page both still execute.
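
The claim that internal cross-references resolve can be spot-checked without a full Sphinx build. The sketch below is a hypothetical helper, not something shipped in this PR: it only looks at `(label)=` targets and `{ref}` roles in MyST sources, so it ignores labels defined elsewhere (for example in generated RST) and does not replicate Sphinx's own label normalization.

```python
import re

# A MyST target: `(label)=` alone on a line.
ANCHOR = re.compile(r"^\((?P<label>[^()\s]+)\)=[ \t]*$", re.MULTILINE)

# A MyST ref role: {ref}`label` or {ref}`Title <label>`.
REF = re.compile(r"\{ref\}`(?:[^`<]*<(?P<explicit>[^`>]+)>|(?P<bare>[^`<>]+))`")


def unresolved_refs(sources: dict[str, str]) -> dict[str, list[str]]:
    """Map each file name to the {ref} targets that have no matching anchor."""
    # Collect every anchor defined anywhere in the corpus.
    anchors: set[str] = set()
    for text in sources.values():
        anchors.update(m.group("label") for m in ANCHOR.finditer(text))

    # Report each reference whose target is not among those anchors.
    missing: dict[str, list[str]] = {}
    for name, text in sources.items():
        for m in REF.finditer(text):
            target = (m.group("explicit") or m.group("bare")).strip()
            if target not in anchors:
                missing.setdefault(name, []).append(target)
    return missing
```

Fed a mapping of file names to file contents, an empty result suggests every `{ref}` has a target. A kebab-cased anchor paired with an underscore reference would show up as a missing entry.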
   
   No `api change` label — public APIs untouched.
   
   ## Follow-ups (out of scope for this PR)
   
   - Migrate the 86 `{eval-rst}` `.. ipython::` blocks to a
     MyST-native exec syntax. Requires either pulling in `myst-nb` or
     configuring a per-language parser.
   - Phase 3: multi-version doc publishing (the comet pattern).
   - Phase 4: `asf-site` publishing workflow.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

