kaxil opened a new pull request, #62261:
URL: https://github.com/apache/airflow/pull/62261

   Devlist Discussion: 
https://lists.apache.org/thread/7n4pklzcc4lxtxsy9g69ssffg9qbdyvb
   
   A static-site provider registry for discovering and browsing Airflow 
providers and their modules. Deployed at `airflow.apache.org/registry/` 
alongside the existing docs infrastructure (S3 + CloudFront).
   
   Staging preview:  https://airflow.staged.apache.org/registry/ 
   
   ## What it does
   
   The registry indexes all 99 official providers and 840 modules (operators, 
hooks, sensors, triggers, transfers, bundles, notifiers, secrets backends, log 
handlers, executors) from the existing
   `providers/*/provider.yaml` files and source code in this repo. No external 
data sources beyond PyPI download stats.
   
   **Pages:**
   
   - **Homepage** — search bar (Cmd+K), stats counters, featured and new 
providers
   - **Providers listing** — filterable by lifecycle stage 
(stable/incubation/deprecated), category, and sort order (downloads, name, 
recently updated)
   - **Provider detail** — module counts by type, install command with 
extras/version selection, dependency info, connection builder, and a tabbed 
module browser with category sidebar and per-module search
   - **Explore by Category** — providers grouped into Cloud, Databases, Data 
Warehouses, Messaging, AI/ML, Data Processing, etc.
   - **Statistics** — module type distribution, lifecycle breakdown, top 
providers by downloads and module count
   - **JSON API** — `/api/providers.json`, `/api/modules.json`, per-provider 
endpoints for modules, parameters, and connections
   
   **Connection Builder** — pick a connection type (e.g. `aws`, `redshift`), 
fill in the form fields with placeholders and sensitivity markers, and export 
as URI, JSON, or environment variable format. Fields are
   extracted from provider.yaml connection metadata.
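
To make the three export formats concrete, here is a minimal Python sketch of what they could look like for a single connection. It is not the registry's client-side code (which is JavaScript); the helper names `to_uri`, `to_json`, and `to_env_var` are hypothetical, and the URI layout only approximates Airflow's connection URI convention.

```python
import json
from urllib.parse import quote, urlencode


def to_uri(conn_type, host="", login="", password="", port=None, schema="", extra=None):
    """Build a connection URI, percent-encoding each component separately."""
    auth = ""
    if login or password:
        auth = quote(login, safe="")
        if password:
            auth += ":" + quote(password, safe="")
        auth += "@"
    netloc = auth + quote(host, safe="")
    if port is not None:
        netloc += f":{port}"
    uri = f"{conn_type}://{netloc}/{quote(schema, safe='')}"
    if extra:
        # Extras are appended as a query string in this sketch.
        uri += "?" + urlencode(extra)
    return uri


def to_json(conn_type, **fields):
    """Serialize the connection as a JSON document, dropping empty fields."""
    payload = {"conn_type": conn_type}
    payload.update({k: v for k, v in fields.items() if v not in ("", None)})
    return json.dumps(payload, sort_keys=True)


def to_env_var(conn_id, value):
    """Format an AIRFLOW_CONN_<ID> assignment; shell quoting is left to the caller."""
    return f"AIRFLOW_CONN_{conn_id.upper()}={value}"


uri = to_uri(
    "postgres",
    host="db.example.com",
    login="user",
    password="p@ss",
    port=5432,
    schema="analytics",
)
print(to_env_var("my_db", uri))
```

Encoding each component on its own keeps characters such as `@` in a password from being mistaken for URI delimiters.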
   
   
   ## Architecture
   ```
   provider.yaml + source code (providers/*/)
           │
           ▼
   extract_metadata.py     ← AST-parses Python files, fetches PyPI stats
           │
           ▼
   registry/src/_data/
     ├── providers.json    ← 99 providers with metadata, quality scores
     ├── modules.json      ← 840 modules with import paths, docstrings
     └── search-index.json ← Pagefind custom records
           │
           ▼
   Eleventy build          ← Generates 2,740 static HTML pages
           │
           ▼
   Pagefind postbuild      ← Builds search index from custom records
           │
           ▼
   S3 sync + CloudFront    ← registry-build.yml workflow
   ```
   Four Python extraction scripts run at build time:
   
   | Script | What it does | Runs in |
   |--------|-------------|---------|
   | `extract_metadata.py` | Parses provider.yaml, AST-parses source for class names/docstrings, fetches PyPI stats and release dates | CI (host Python) |
   | `extract_versions.py` | Reads older provider versions from git tags | CI (host Python) |
   | `extract_parameters.py` | Inspects constructor signatures via runtime import | Breeze (needs provider packages installed) |
   | `extract_connections.py` | Extracts connection form fields from provider.yaml + hook classes | Breeze (needs provider packages installed) |
   
   The site itself is vanilla HTML/CSS/JS built with 
[Eleventy](https://www.11ty.dev/) — no React, no bundler. Search uses Pagefind 
(client-side, loads lazily on first search interaction). Fonts are self-hosted 
(Plus Jakarta Sans, JetBrains
   Mono).
   
   ## Design decisions worth calling out
   
   **Why AST parsing instead of runtime import?** `extract_metadata.py` runs on the CI host without installing any of the 99 provider packages. It reads `.py` files and extracts class names, base classes, and docstrings from the AST, so it needs only `pyyaml` as a dependency. The trade-off: it can't resolve dynamic class definitions or runtime-computed attributes. For the 99 providers currently in the repo, AST parsing captures everything.
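
As an illustration of the AST approach, the following sketch pulls class names, base classes, and docstrings out of source text without importing it. It uses only the standard-library `ast` module; the function name `extract_classes` and the sample class are hypothetical and not taken from `extract_metadata.py`.

```python
import ast


def extract_classes(source: str) -> list[dict]:
    """Return name, base-class names, and docstring for each top-level class."""
    tree = ast.parse(source)
    found = []
    for node in tree.body:
        if isinstance(node, ast.ClassDef):
            found.append(
                {
                    "name": node.name,
                    # ast.unparse turns a base expression back into source text,
                    # e.g. "BaseOperator" or "models.BaseOperator".
                    "bases": [ast.unparse(base) for base in node.bases],
                    "docstring": ast.get_docstring(node),
                }
            )
    return found


# The sample is only parsed, never executed, so BaseOperator need not exist.
SAMPLE = '''
class ExampleOperator(BaseOperator):
    """Run an example task."""

    def execute(self, context):
        pass
'''

print(extract_classes(SAMPLE))
```

Because the source is parsed rather than executed, unresolved names such as `BaseOperator` are harmless, which is why no provider package has to be installed.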
   
   **Why four separate scripts?** `extract_parameters.py` and 
`extract_connections.py` need runtime access to provider classes (to inspect 
`__init__` signatures and call `get_connection_form_widgets()`). They run
   inside Breeze where all providers are installed. `extract_metadata.py` and 
`extract_versions.py` only need filesystem access and run on the host. Keeping 
them separate means the CI workflow can run the fast
   scripts (metadata) without spinning up Breeze, while parameter/connection 
extraction is a separate optional step.
   
   **Why Eleventy?** Static site generators produce zero-JS pages by default. 
The registry works without JavaScript — filtering and search are layered on top 
progressively. Eleventy also has no opinion on frontend
   frameworks, which keeps the dependency surface small (the lockfile has ~30 
packages total).
   
   **Path prefix handling:** The site deploys at `/registry/` on 
airflow.apache.org but runs at `/` during local dev. Eleventy's `pathPrefix` 
config handles this via the `REGISTRY_PATH_PREFIX` env var. Templates use
   the `| url` filter, and client-side JS reads `window.__REGISTRY_BASE__` 
(injected in `base.njk`).
   
   **Module filtering:** The extraction script filters classes based on 
type-specific suffix patterns (e.g. `Operator`, `Hook`, `Sensor` suffixes for 
their respective types) and base class inheritance. This avoids
   indexing helper classes, dataclasses, and exceptions that happen to live in 
operator/hook modules.
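
A minimal sketch of this kind of filter is shown below. It assumes a class is kept when its own name or one of its direct base classes ends with the suffix for the module type; the actual script may combine the two checks differently, and `SUFFIX_BY_TYPE` and `indexable_classes` are hypothetical names.

```python
import ast

# Hypothetical mapping from module type to the expected class-name suffix.
SUFFIX_BY_TYPE = {"operator": "Operator", "hook": "Hook", "sensor": "Sensor"}


def indexable_classes(source: str, module_type: str) -> list[str]:
    """Return top-level class names that look like modules of the given type."""
    suffix = SUFFIX_BY_TYPE[module_type]
    names = []
    for node in ast.parse(source).body:
        if not isinstance(node, ast.ClassDef):
            continue
        # Reduce dotted bases such as "models.BaseOperator" to "BaseOperator".
        base_names = [ast.unparse(base).rsplit(".", 1)[-1] for base in node.bases]
        if node.name.endswith(suffix) or any(b.endswith(suffix) for b in base_names):
            names.append(node.name)
    return names


SAMPLE = '''
class S3CopyOperator(BaseOperator): ...
class S3CopyError(Exception): ...
@dataclass
class CopyConfig: ...
class LegacyCopy(S3CopyOperator): ...
'''

print(indexable_classes(SAMPLE, "operator"))
```

In the sample, the exception and the dataclass are dropped, while `LegacyCopy` is kept because it inherits from a class with the `Operator` suffix.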
   
   ## What's NOT included (future work)
   
   
   ## How to test locally
   
   ```bash
   # 1. Extract metadata
   uv run python dev/registry/extract_metadata.py
   
   # 2. Install Node dependencies
   cd registry && pnpm install
   
   # 3. Start dev server at http://localhost:8080
   pnpm dev
   ```
   
    <!-- SPDX-License-Identifier: Apache-2.0
         https://www.apache.org/licenses/LICENSE-2.0 -->
   
   
   <img width="1280" height="800" alt="connection-builder-dark" src="https://github.com/user-attachments/assets/7ac3eec0-ce73-483e-b92f-c4c058b48568" />
   <img width="1280" height="800" alt="connection-builder" src="https://github.com/user-attachments/assets/39d15d12-624c-4cce-86a7-f7d3028a1230" />
   
   <img width="1280" height="800" alt="explore-categories-dark" src="https://github.com/user-attachments/assets/04500e2d-dc65-4b5c-8869-fb351e5d1a91" />
   <img width="1280" height="800" alt="explore-categories" src="https://github.com/user-attachments/assets/3c8c10da-6741-41b5-9da3-9eb437ae27c9" />
   
   <img width="1280" height="800" alt="homepage-dark" src="https://github.com/user-attachments/assets/5043097f-4a15-4df1-9924-96c55ed24266" />
   <img width="1280" height="800" alt="homepage" src="https://github.com/user-attachments/assets/33cea9e3-b906-4e4d-a26b-9acf2de38272" />
   
   
   <img width="1280" height="800" alt="module-browser-dark" src="https://github.com/user-attachments/assets/3cbd41b0-dbf4-4456-b823-95ef32fc8a78" />
   <img width="1280" height="800" alt="module-browser" src="https://github.com/user-attachments/assets/60d78c57-3a86-4658-a697-06d81b880b5b" />
   
   <img width="1280" height="800" alt="provider-detail-amazon-dark" src="https://github.com/user-attachments/assets/c9beb13c-72de-4520-bcd3-1d30832edfcb" />
   <img width="1280" height="800" alt="provider-detail-amazon" src="https://github.com/user-attachments/assets/0b9d9a0f-fbc2-4173-b96b-259b7cc8d2b4" />
   <img width="1280" height="800" alt="providers-list-dark" src="https://github.com/user-attachments/assets/0e8dd3b7-aee1-4604-a97f-8d21429623d3" />
   <img width="1280" height="800" alt="providers-list" src="https://github.com/user-attachments/assets/46395130-9ce9-4730-a949-97959165da14" />
   
   <img width="1280" height="800" alt="stats-dark" src="https://github.com/user-attachments/assets/a409f154-cac0-4520-9371-07be1deafe3c" />
   <img width="1280" height="800" alt="stats" src="https://github.com/user-attachments/assets/068e5667-a121-4fb9-83e7-950c97d814a9" />
   
   

