This is an automated email from the ASF dual-hosted git repository.
github-merge-queue[bot] pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/datafusion.git
The following commit(s) were added to refs/heads/main by this push:
new 89ac320dfd docs: add llms.txt ecosystem hub at site root (#22003)
89ac320dfd is described below
commit 89ac320dfdfcda012518ce783f918450ea36b127
Author: Tim Saucer <[email protected]>
AuthorDate: Mon May 11 21:20:15 2026 -0400
docs: add llms.txt ecosystem hub at site root (#22003)
## Which issue does this PR close?
- Addresses part of apache/datafusion-python#1394 ("Make it easier for
agents to generate datafusion-python code") — this is **PR 6** in the
implementation plan
([comment](https://github.com/apache/datafusion-python/issues/1394#issuecomment-4252413645)):
the upstream `apache/datafusion` `llms.txt` hub.
## Rationale for this change
[llms.txt](https://llmstxt.org) is an emerging convention for exposing a
machine-readable, agent-facing entry point at a site's docs root.
Subprojects in the DataFusion ecosystem are starting to publish their
own (`apache/datafusion-python` PR apache/datafusion-python#1505 added
one). The main `datafusion.apache.org` site is the natural top-level
discovery point for the whole ecosystem, so it should expose a hub
`llms.txt` that points agents at:
- the core DataFusion (Rust) user / library / contributor guides and
Rust API docs,
- each subproject's docs root, where agents following the llmstxt.org
convention can probe `<docs root>/llms.txt` for project-specific
guidance.
Net effect: an agent fetching `https://datafusion.apache.org/llms.txt`
lands in a categorized directory of the entire ecosystem's agent
guidance.
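The probe convention described above can be sketched roughly as follows. This is only an illustration and not part of the PR: the helper name `llms_txt_url` is hypothetical, and the docs roots are the ones the hub links to.

```python
def llms_txt_url(docs_root: str) -> str:
    """Return the conventional llms.txt location for a docs root."""
    # Normalize so roots with and without a trailing slash give the same URL.
    return docs_root.rstrip("/") + "/llms.txt"


# Docs roots taken from the hub's Subprojects section.
SUBPROJECT_ROOTS = [
    "https://datafusion.apache.org/python/",
    "https://datafusion.apache.org/ballista/",
    "https://datafusion.apache.org/comet/",
]

# URLs an agent could probe; a subproject may not have published a file yet.
candidate_urls = [llms_txt_url(root) for root in SUBPROJECT_ROOTS]
```

An agent would treat a missing file at any of these URLs as "no project-specific guidance yet" rather than as an error.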
## What changes are included in this PR?
- `docs/source/llms.txt` — new file, llmstxt.org schema. Sections: Core
DataFusion (Rust), Subprojects, Optional. The Subprojects section links
to docs roots (not pending `llms.txt` URLs) and includes a one-line note
describing the probe convention so the hub stays correct as subprojects
ship their own files.
- `docs/source/conf.py` — `html_extra_path = ["llms.txt"]` so Sphinx
copies the file verbatim to the build output root, served at
`https://datafusion.apache.org/llms.txt`.
- `dev/release/rat_exclude_files.txt` — exclude `docs/source/llms.txt`
from the RAT license-header check (the file body is rendered markdown
and cannot carry the standard `..` comment header without breaking the
format).
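For readers unfamiliar with the layout of such a file, here is a minimal sketch of how a consumer might group the hub's links by section. It assumes only the shape used in this PR (`## ` headings followed by `- [title](url): description` items); `parse_sections` and `LINK_ITEM` are hypothetical names, and the sample text is abbreviated from the file added here.

```python
import re

# Matches list items of the form "- [title](url)" with an optional ": description".
LINK_ITEM = re.compile(
    r"^- \[(?P<title>[^\]]+)\]\((?P<url>[^)]+)\)(?::\s*(?P<desc>.*))?$"
)


def parse_sections(text: str) -> dict[str, list[tuple[str, str]]]:
    """Group (title, url) link entries under their `## ` section heading."""
    sections: dict[str, list[tuple[str, str]]] = {}
    current = None
    for line in text.splitlines():
        if line.startswith("## "):
            current = line[3:].strip()
            sections[current] = []
        elif current is not None:
            match = LINK_ITEM.match(line.strip())
            if match:
                sections[current].append((match["title"], match["url"]))
    return sections


sample = """# Apache DataFusion

## Subprojects

- [DataFusion Python](https://datafusion.apache.org/python/): Python bindings
- [DataFusion Comet](https://datafusion.apache.org/comet/): Apache Spark accelerator built on DataFusion.

## Optional

- [Apache Software Foundation](https://apache.org)
"""

parsed = parse_sections(sample)
```

Lines that are not link items, such as the probe-convention note in the Subprojects section, are simply skipped by this sketch.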
## Are these changes tested?
No automated tests. The change is a single static file plus a Sphinx
config line that mirrors a pattern already used in
`apache/datafusion-python` (`html_extra_path = ["llms.txt"]`, PR
apache/datafusion-python#1505). Verification will be done at deploy
time: confirm `https://datafusion.apache.org/llms.txt` resolves and
renders.
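The deploy-time check could be scripted along these lines. This is a sketch under stated assumptions, not something the PR ships: `looks_like_llms_txt` and `check_deployed` are hypothetical helpers, and the only format rule checked is that the file opens with a Markdown H1, as the file added in this commit does.

```python
import urllib.request


def looks_like_llms_txt(body: str) -> bool:
    """Cheap sanity check: the first non-blank line is a Markdown H1."""
    for line in body.splitlines():
        if line.strip():
            return line.startswith("# ")
    return False


def check_deployed(url: str, timeout: float = 10.0) -> bool:
    """Fetch `url` and report whether it returned 200 with an llms.txt-like body.

    urllib raises HTTPError for 4xx/5xx responses, so a missing file
    surfaces as an exception rather than a False return value.
    """
    with urllib.request.urlopen(url, timeout=timeout) as response:
        body = response.read().decode("utf-8")
        return response.status == 200 and looks_like_llms_txt(body)
```

After a deploy, calling `check_deployed("https://datafusion.apache.org/llms.txt")` would exercise the new URL; the call is left out of the block so the sketch has no network side effects when imported.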
## Are there any user-facing changes?
Yes — adds a new public URL `https://datafusion.apache.org/llms.txt`. No
existing pages are modified. No API changes.
Co-authored-by: Claude Opus 4.7 (1M context) <[email protected]>
---
dev/release/rat_exclude_files.txt | 1 +
docs/source/conf.py | 4 ++++
docs/source/llms.txt | 26 ++++++++++++++++++++++++++
3 files changed, 31 insertions(+)
diff --git a/dev/release/rat_exclude_files.txt b/dev/release/rat_exclude_files.txt
index 7953a5b4e2..f5ce368df7 100644
--- a/dev/release/rat_exclude_files.txt
+++ b/dev/release/rat_exclude_files.txt
@@ -60,6 +60,7 @@ datafusion/proto-common/src/generated/prost.rs
.github/ISSUE_TEMPLATE/bug_report.yml
.github/ISSUE_TEMPLATE/feature_request.yml
.github/workflows/docs.yaml
+docs/source/llms.txt
**/node_modules/*
datafusion/wasmtest/pkg/*
clippy.toml
diff --git a/docs/source/conf.py b/docs/source/conf.py
index 03dcfb5bfa..c8027fc71b 100644
--- a/docs/source/conf.py
+++ b/docs/source/conf.py
@@ -109,6 +109,10 @@ html_context = {
# so a file named "default.css" will overwrite the builtin "default.css".
html_static_path = ["_static"]
+# Copy agent-facing files (llms.txt) verbatim to the site root so they
+# resolve at the conventional URL `https://datafusion.apache.org/llms.txt`.
+html_extra_path = ["llms.txt"]
+
html_logo = "_static/images/2x_bgwhite_original.png"
html_css_files = ["theme_overrides.css"]
diff --git a/docs/source/llms.txt b/docs/source/llms.txt
new file mode 100644
index 0000000000..5d738107c8
--- /dev/null
+++ b/docs/source/llms.txt
@@ -0,0 +1,26 @@
+# Apache DataFusion
+
+> Apache DataFusion is an extensible query engine written in Rust that uses Apache Arrow as its in-memory format. This file is a directory of agent-facing entry points for the DataFusion ecosystem — the Rust core query engine and its subprojects. Subproject `llms.txt` files contain the project-specific guidance for writing code against each one.
+
+## Core DataFusion (Rust)
+
+- [User guide](https://datafusion.apache.org/user-guide/introduction.html): install, example usage, SQL, DataFrame, expressions, configuration, explain plans.
+- [Library user guide](https://datafusion.apache.org/library-user-guide/index.html): embedding DataFusion, extending SQL, custom table providers, building logical plans, the query optimizer.
+- [Contributor guide](https://datafusion.apache.org/contributor-guide/index.html): development environment, architecture, testing, release management, governance.
+- [Rust API docs (`docs.rs`)](https://docs.rs/datafusion/latest/datafusion/): generated reference for the `datafusion` crate.
+- [GitHub repository](https://github.com/apache/datafusion): source, issues, pull requests.
+
+## Subprojects
+
+Each subproject may expose its own `llms.txt` at `<docs root>/llms.txt` — agents following the [llmstxt.org](https://llmstxt.org) convention can probe these paths for project-specific guidance.
+
+- [DataFusion Python](https://datafusion.apache.org/python/): Python bindings — SQL and lazy DataFrame API over Apache Arrow.
+- [DataFusion Ballista](https://datafusion.apache.org/ballista/): distributed execution extension for DataFusion.
+- [DataFusion Comet](https://datafusion.apache.org/comet/): Apache Spark accelerator built on DataFusion.
+
+## Optional
+
+- [Blog](https://datafusion.apache.org/blog/): release notes and ecosystem updates.
+- [crates.io `datafusion`](https://crates.io/crates/datafusion): published crate.
+- [Code of conduct](https://github.com/apache/datafusion/blob/main/CODE_OF_CONDUCT.md)
+- [Apache Software Foundation](https://apache.org)