This is an automated email from the ASF dual-hosted git repository.

github-merge-queue[bot] pushed a commit to branch gh-readonly-queue/main/pr-22003-73ea5fd12c67b55d8391ab701e14229d90dbe4be
in repository https://gitbox.apache.org/repos/asf/datafusion.git

commit 89ac320dfdfcda012518ce783f918450ea36b127
Author: Tim Saucer <[email protected]>
AuthorDate: Mon May 11 21:20:15 2026 -0400

docs: add llms.txt ecosystem hub at site root (#22003)

## Which issue does this PR close?

- Addresses part of apache/datafusion-python#1394 ("Make it easier for agents to generate datafusion-python code"). This is **PR 6** in the implementation plan ([comment](https://github.com/apache/datafusion-python/issues/1394#issuecomment-4252413645)): the upstream `apache/datafusion` `llms.txt` hub.

## Rationale for this change

[llms.txt](https://llmstxt.org) is an emerging convention for exposing a machine-readable, agent-facing entry point at a site's docs root. Subprojects in the DataFusion ecosystem are starting to publish their own (`apache/datafusion-python` PR apache/datafusion-python#1505 added one). The main `datafusion.apache.org` site is the natural top-level discovery point for the whole ecosystem, so it should expose a hub `llms.txt` that points agents at:

- the core DataFusion (Rust) user, library, and contributor guides and the Rust API docs;
- each subproject's docs root, where agents following the llmstxt.org convention can probe `<docs root>/llms.txt` for project-specific guidance.

Net effect: an agent fetching `https://datafusion.apache.org/llms.txt` lands in a categorized directory of the entire ecosystem's agent guidance.

## What changes are included in this PR?

- `docs/source/llms.txt`: new file following the llmstxt.org schema, with three sections: Core DataFusion (Rust), Subprojects, and Optional. The Subprojects section links to docs roots (not to subproject `llms.txt` URLs that may not exist yet) and includes a one-line note describing the probe convention, so the hub stays correct as subprojects ship their own files.
- `docs/source/conf.py`: `html_extra_path = ["llms.txt"]`, so Sphinx copies the file verbatim to the build output root, where it is served at `https://datafusion.apache.org/llms.txt`.
- `dev/release/rat_exclude_files.txt`: exclude `docs/source/llms.txt` from the RAT license-header check (the file body is markdown and cannot carry the standard `..` comment header without breaking the format).

## Are these changes tested?

No automated tests. The change is a single static file plus a Sphinx config line that mirrors a pattern already used in `apache/datafusion-python` (`html_extra_path = ["llms.txt"]`, PR apache/datafusion-python#1505). Verification will be done at deploy time: confirm that `https://datafusion.apache.org/llms.txt` resolves and renders.

## Are there any user-facing changes?

Yes: this adds a new public URL, `https://datafusion.apache.org/llms.txt`. No existing pages are modified. No API changes.

Co-authored-by: Claude Opus 4.7 (1M context) <[email protected]>
---
 dev/release/rat_exclude_files.txt |  1 +
 docs/source/conf.py               |  4 ++++
 docs/source/llms.txt              | 26 ++++++++++++++++++++++++++
 3 files changed, 31 insertions(+)

diff --git a/dev/release/rat_exclude_files.txt b/dev/release/rat_exclude_files.txt
index 7953a5b4e2..f5ce368df7 100644
--- a/dev/release/rat_exclude_files.txt
+++ b/dev/release/rat_exclude_files.txt
@@ -60,6 +60,7 @@ datafusion/proto-common/src/generated/prost.rs
 .github/ISSUE_TEMPLATE/bug_report.yml
 .github/ISSUE_TEMPLATE/feature_request.yml
 .github/workflows/docs.yaml
+docs/source/llms.txt
 **/node_modules/*
 datafusion/wasmtest/pkg/*
 clippy.toml
diff --git a/docs/source/conf.py b/docs/source/conf.py
index 03dcfb5bfa..c8027fc71b 100644
--- a/docs/source/conf.py
+++ b/docs/source/conf.py
@@ -109,6 +109,10 @@ html_context = {
 # so a file named "default.css" will overwrite the builtin "default.css".
 html_static_path = ["_static"]
 
+# Copy agent-facing files (llms.txt) verbatim to the site root so they
+# resolve at the conventional URL `https://datafusion.apache.org/llms.txt`.
+html_extra_path = ["llms.txt"]
+
 html_logo = "_static/images/2x_bgwhite_original.png"
 
 html_css_files = ["theme_overrides.css"]
diff --git a/docs/source/llms.txt b/docs/source/llms.txt
new file mode 100644
index 0000000000..5d738107c8
--- /dev/null
+++ b/docs/source/llms.txt
@@ -0,0 +1,26 @@
+# Apache DataFusion
+
+> Apache DataFusion is an extensible query engine written in Rust that uses Apache Arrow as its in-memory format. This file is a directory of agent-facing entry points for the DataFusion ecosystem — the Rust core query engine and its subprojects. Subproject `llms.txt` files contain the project-specific guidance for writing code against each one.
+
+## Core DataFusion (Rust)
+
+- [User guide](https://datafusion.apache.org/user-guide/introduction.html): install, example usage, SQL, DataFrame, expressions, configuration, explain plans.
+- [Library user guide](https://datafusion.apache.org/library-user-guide/index.html): embedding DataFusion, extending SQL, custom table providers, building logical plans, the query optimizer.
+- [Contributor guide](https://datafusion.apache.org/contributor-guide/index.html): development environment, architecture, testing, release management, governance.
+- [Rust API docs (`docs.rs`)](https://docs.rs/datafusion/latest/datafusion/): generated reference for the `datafusion` crate.
+- [GitHub repository](https://github.com/apache/datafusion): source, issues, pull requests.
+
+## Subprojects
+
+Each subproject may expose its own `llms.txt` at `<docs root>/llms.txt` — agents following the [llmstxt.org](https://llmstxt.org) convention can probe these paths for project-specific guidance.
+
+- [DataFusion Python](https://datafusion.apache.org/python/): Python bindings — SQL and lazy DataFrame API over Apache Arrow.
+- [DataFusion Ballista](https://datafusion.apache.org/ballista/): distributed execution extension for DataFusion.
+- [DataFusion Comet](https://datafusion.apache.org/comet/): Apache Spark accelerator built on DataFusion.
+
+## Optional
+
+- [Blog](https://datafusion.apache.org/blog/): release notes and ecosystem updates.
+- [crates.io `datafusion`](https://crates.io/crates/datafusion): published crate.
+- [Code of conduct](https://github.com/apache/datafusion/blob/main/CODE_OF_CONDUCT.md)
+- [Apache Software Foundation](https://apache.org)
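The hub's Subprojects note describes a probe convention: for each listed docs root, an agent can try `<docs root>/llms.txt`. The following is a minimal Python sketch of how a client might apply that convention to this file. It assumes the hub follows the llmstxt.org layout used here: one `#` title, one `>` summary line, and `##` sections whose links look like `- [Title](URL): notes`. `parse_llms_txt`, `probe_urls`, and `LINK_RE` are hypothetical helpers, not part of DataFusion or any llms.txt library. The sketch only builds candidate URLs and does not fetch them.

```python
import re
from urllib.parse import urljoin

# Hypothetical helper, not a DataFusion or llmstxt.org API. It assumes each link
# line looks like "- [Title](URL)" with an optional ": notes" suffix.
LINK_RE = re.compile(
    r"^- \[(?P<title>[^\]]+)\]\((?P<url>[^)]+)\)(?::\s*(?P<notes>.*))?$"
)


def parse_llms_txt(text):
    """Split an llms.txt body into its H1 title, '>' summary, and H2 link sections."""
    title, summary, current = None, None, None
    sections = {}
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("## "):
            # Each H2 heading starts a new section of links.
            current = line[3:].strip()
            sections[current] = []
        elif line.startswith("# ") and title is None:
            title = line[2:].strip()
        elif line.startswith("> ") and summary is None:
            summary = line[2:].strip()
        elif current is not None:
            # Prose lines inside a section (like the probe note) are skipped.
            match = LINK_RE.match(line)
            if match:
                sections[current].append(match.groupdict())
    return {"title": title, "summary": summary, "sections": sections}


def probe_urls(hub, section="Subprojects"):
    """Build the conventional <docs root>/llms.txt URL for each link in a section."""
    urls = []
    for link in hub["sections"].get(section, []):
        root = link["url"]
        if not root.endswith("/"):
            # Without a trailing slash, urljoin would replace the last path segment.
            root += "/"
        urls.append(urljoin(root, "llms.txt"))
    return urls


# Usage on an abridged excerpt of the hub file added in this commit.
HUB = """\
# Apache DataFusion

> Apache DataFusion is an extensible query engine written in Rust that uses Apache Arrow as its in-memory format.

## Subprojects

Each subproject may expose its own `llms.txt` at `<docs root>/llms.txt`.

- [DataFusion Python](https://datafusion.apache.org/python/): Python bindings.
- [DataFusion Ballista](https://datafusion.apache.org/ballista/): distributed execution extension for DataFusion.
- [DataFusion Comet](https://datafusion.apache.org/comet/): Apache Spark accelerator built on DataFusion.

## Optional

- [Apache Software Foundation](https://apache.org)
"""

for url in probe_urls(parse_llms_txt(HUB)):
    print(url)
```

Because the hub links to docs roots rather than to the subproject `llms.txt` files themselves, a client following this approach should treat a missing file (for example an HTTP 404) as "no project-specific guidance yet" rather than as an error.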
