This is an automated email from the ASF dual-hosted git repository.

github-merge-queue[bot] pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/datafusion.git


The following commit(s) were added to refs/heads/main by this push:
     new 89ac320dfd docs: add llms.txt ecosystem hub at site root (#22003)
89ac320dfd is described below

commit 89ac320dfdfcda012518ce783f918450ea36b127
Author: Tim Saucer <[email protected]>
AuthorDate: Mon May 11 21:20:15 2026 -0400

    docs: add llms.txt ecosystem hub at site root (#22003)
    
    ## Which issue does this PR close?
    
    - Addresses part of apache/datafusion-python#1394 ("Make it easier for
    agents to generate datafusion-python code") — this is **PR 6** in the
    implementation plan
    ([comment](https://github.com/apache/datafusion-python/issues/1394#issuecomment-4252413645)):
    the upstream `apache/datafusion` `llms.txt` hub.
    
    ## Rationale for this change
    
    [llms.txt](https://llmstxt.org) is an emerging convention for exposing a
    machine-readable, agent-facing entry point at a site's docs root.
    Subprojects in the DataFusion ecosystem are starting to publish their
    own (`apache/datafusion-python` PR apache/datafusion-python#1505 added
    one). The main `datafusion.apache.org` site is the natural top-level
    discovery point for the whole ecosystem, so it should expose a hub
    `llms.txt` that points agents at:
    
    - the core DataFusion (Rust) user / library / contributor guides and
    Rust API docs,
    - each subproject's docs root, where agents following the llmstxt.org
    convention can probe `<docs root>/llms.txt` for project-specific
    guidance.
    
    Net effect: an agent fetching `https://datafusion.apache.org/llms.txt`
    lands in a categorized directory of the entire ecosystem's agent
    guidance.
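
The probe convention above can be sketched in Python. This is a minimal illustration that assumes a docs root is an absolute URL such as `https://datafusion.apache.org/python/`. `llms_txt_url` and `SUBPROJECT_ROOTS` are hypothetical names and are not part of any DataFusion tooling. The helper only builds URLs and does not fetch them.

```python
from urllib.parse import urljoin


def llms_txt_url(docs_root: str) -> str:
    """Return the conventional `<docs root>/llms.txt` URL for a docs root.

    Hypothetical helper for illustration: it only builds the URL and does
    not check that the file exists.
    """
    # urljoin("https://host/python", "llms.txt") would yield
    # "https://host/llms.txt", so force a trailing slash first.
    if not docs_root.endswith("/"):
        docs_root += "/"
    return urljoin(docs_root, "llms.txt")


# Docs roots linked from the hub's Subprojects section.
SUBPROJECT_ROOTS = [
    "https://datafusion.apache.org/python/",
    "https://datafusion.apache.org/ballista/",
    "https://datafusion.apache.org/comet/",
]

if __name__ == "__main__":
    for root in SUBPROJECT_ROOTS:
        print(llms_txt_url(root))
```

The only subtle step is the trailing-slash normalization. `urllib.parse.urljoin` treats the last path segment of a base URL without a trailing `/` as a file name and replaces it. Without the normalization, `https://datafusion.apache.org/python` would resolve to the site-root `llms.txt` instead of the subproject's file.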
    
    ## What changes are included in this PR?
    
    - `docs/source/llms.txt` — new file, llmstxt.org schema. Sections: Core
    DataFusion (Rust), Subprojects, Optional. The Subprojects section links
    to docs roots (not pending `llms.txt` URLs) and includes a one-line note
    describing the probe convention so the hub stays correct as subprojects
    ship their own files.
    - `docs/source/conf.py` — `html_extra_path = ["llms.txt"]` so Sphinx
    copies the file verbatim to the build output root, served at
    `https://datafusion.apache.org/llms.txt`.
    - `dev/release/rat_exclude_files.txt` — exclude `docs/source/llms.txt`
    from the RAT license-header check (the file body is raw Markdown
    and cannot carry the standard `..` comment header without breaking the
    format).
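
The llmstxt.org layout follows a regular pattern: an H1 title, a `>` summary line, and then H2 sections of `- [name](url): notes` links. Because of this, the hub can be checked mechanically. The sketch below is a rough, hypothetical checker. `parse_llms_txt` and `check_llms_txt` are illustrative names and are not the llmstxt.org reference tooling. The sketch assumes one link per line and skips free-form paragraphs, such as the note in the Subprojects section.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field

# One link per line: "- [name](url)" with an optional ": notes" suffix.
LINK_RE = re.compile(
    r"^- \[(?P<name>[^\]]+)\]\((?P<url>[^)\s]+)\)(?::\s*(?P<notes>.*))?$"
)


@dataclass
class LlmsTxt:
    title: str = ""
    summary: str = ""
    # H2 heading -> [(name, url, notes), ...]
    sections: dict[str, list[tuple[str, str, str]]] = field(default_factory=dict)


def parse_llms_txt(text: str) -> LlmsTxt:
    """Hypothetical parser for the llmstxt.org layout (not the reference tool)."""
    doc = LlmsTxt()
    current = None
    for raw in text.splitlines():
        line = raw.rstrip()
        if line.startswith("# ") and not doc.title:
            doc.title = line[2:].strip()
        elif line.startswith("> ") and not doc.summary:
            doc.summary = line[2:].strip()
        elif line.startswith("## "):
            current = line[3:].strip()
            doc.sections[current] = []
        elif current is not None:
            match = LINK_RE.match(line)
            if match:  # free-form prose inside a section is skipped
                doc.sections[current].append(
                    (match["name"], match["url"], match["notes"] or "")
                )
    return doc


def check_llms_txt(doc: LlmsTxt) -> list[str]:
    """Return human-readable problems; an empty list means the layout looks right."""
    problems = []
    if not doc.title:
        problems.append("missing H1 title")
    if not doc.summary:
        problems.append("missing '>' summary")
    for heading, links in doc.sections.items():
        if not links:
            problems.append(f"section {heading!r} has no links")
    return problems


sample = """# Apache DataFusion

> Apache DataFusion is an extensible query engine written in Rust.

## Subprojects

Each subproject may expose its own `llms.txt` at `<docs root>/llms.txt`.

- [DataFusion Python](https://datafusion.apache.org/python/): Python bindings.

## Optional

- [Apache Software Foundation](https://apache.org)
"""

doc = parse_llms_txt(sample)
print(doc.title, check_llms_txt(doc))
```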
    
    ## Are these changes tested?
    
    No automated tests. The change is a single static file plus a Sphinx
    config line that mirrors a pattern already used in
    `apache/datafusion-python` (`html_extra_path = ["llms.txt"]`, PR
    apache/datafusion-python#1505). Verification will be done at deploy
    time: confirm `https://datafusion.apache.org/llms.txt` resolves and
    renders.
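
A local check before deployment could complement the manual URL check. The sketch below assumes that the Sphinx HTML output lands in `docs/build/html`; adjust this to the real `sphinx-build` output directory. It only confirms that `html_extra_path` placed an unchanged copy of `llms.txt` at the output root. `check_extra_file_copied` is a hypothetical helper.

```python
import filecmp
from pathlib import Path


def check_extra_file_copied(source: Path, html_out: Path) -> bool:
    """Return True if `source` was copied unchanged to the root of `html_out`.

    Hypothetical pre-deploy check; `html_out` is assumed to be the Sphinx
    HTML output directory (for example docs/build/html).
    """
    built = html_out / source.name
    # shallow=False compares file contents, not just os.stat() signatures.
    return built.is_file() and filecmp.cmp(source, built, shallow=False)


if __name__ == "__main__":
    # Paths below are assumptions about the local docs layout.
    ok = check_extra_file_copied(
        Path("docs/source/llms.txt"), Path("docs/build/html")
    )
    print("llms.txt copied" if ok else "llms.txt missing or modified")
```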
    
    ## Are there any user-facing changes?
    
    Yes — adds a new public URL `https://datafusion.apache.org/llms.txt`. No
    existing pages are modified. No API changes.
    
    Co-authored-by: Claude Opus 4.7 (1M context) <[email protected]>
---
 dev/release/rat_exclude_files.txt |  1 +
 docs/source/conf.py               |  4 ++++
 docs/source/llms.txt              | 26 ++++++++++++++++++++++++++
 3 files changed, 31 insertions(+)

diff --git a/dev/release/rat_exclude_files.txt b/dev/release/rat_exclude_files.txt
index 7953a5b4e2..f5ce368df7 100644
--- a/dev/release/rat_exclude_files.txt
+++ b/dev/release/rat_exclude_files.txt
@@ -60,6 +60,7 @@ datafusion/proto-common/src/generated/prost.rs
 .github/ISSUE_TEMPLATE/bug_report.yml
 .github/ISSUE_TEMPLATE/feature_request.yml
 .github/workflows/docs.yaml
+docs/source/llms.txt
 **/node_modules/*
 datafusion/wasmtest/pkg/*
 clippy.toml
diff --git a/docs/source/conf.py b/docs/source/conf.py
index 03dcfb5bfa..c8027fc71b 100644
--- a/docs/source/conf.py
+++ b/docs/source/conf.py
@@ -109,6 +109,10 @@ html_context = {
 # so a file named "default.css" will overwrite the builtin "default.css".
 html_static_path = ["_static"]
 
+# Copy agent-facing files (llms.txt) verbatim to the site root so they
+# resolve at the conventional URL `https://datafusion.apache.org/llms.txt`.
+html_extra_path = ["llms.txt"]
+
 html_logo = "_static/images/2x_bgwhite_original.png"
 
 html_css_files = ["theme_overrides.css"]
diff --git a/docs/source/llms.txt b/docs/source/llms.txt
new file mode 100644
index 0000000000..5d738107c8
--- /dev/null
+++ b/docs/source/llms.txt
@@ -0,0 +1,26 @@
+# Apache DataFusion
+
+> Apache DataFusion is an extensible query engine written in Rust that uses Apache Arrow as its in-memory format. This file is a directory of agent-facing entry points for the DataFusion ecosystem — the Rust core query engine and its subprojects. Subproject `llms.txt` files contain the project-specific guidance for writing code against each one.
+
+## Core DataFusion (Rust)
+
+- [User guide](https://datafusion.apache.org/user-guide/introduction.html): install, example usage, SQL, DataFrame, expressions, configuration, explain plans.
+- [Library user guide](https://datafusion.apache.org/library-user-guide/index.html): embedding DataFusion, extending SQL, custom table providers, building logical plans, the query optimizer.
+- [Contributor guide](https://datafusion.apache.org/contributor-guide/index.html): development environment, architecture, testing, release management, governance.
+- [Rust API docs (`docs.rs`)](https://docs.rs/datafusion/latest/datafusion/): generated reference for the `datafusion` crate.
+- [GitHub repository](https://github.com/apache/datafusion): source, issues, pull requests.
+
+## Subprojects
+
+Each subproject may expose its own `llms.txt` at `<docs root>/llms.txt` — agents following the [llmstxt.org](https://llmstxt.org) convention can probe these paths for project-specific guidance.
+
+- [DataFusion Python](https://datafusion.apache.org/python/): Python bindings — SQL and lazy DataFrame API over Apache Arrow.
+- [DataFusion Ballista](https://datafusion.apache.org/ballista/): distributed execution extension for DataFusion.
+- [DataFusion Comet](https://datafusion.apache.org/comet/): Apache Spark accelerator built on DataFusion.
+
+## Optional
+
+- [Blog](https://datafusion.apache.org/blog/): release notes and ecosystem updates.
+- [crates.io `datafusion`](https://crates.io/crates/datafusion): published crate.
+- [Code of conduct](https://github.com/apache/datafusion/blob/main/CODE_OF_CONDUCT.md)
+- [Apache Software Foundation](https://apache.org)