GitHub user rpofuk created a discussion: GitOps-Native DagBundle Configuration 
for Team Self-Onboarding

## Summary

We need file-based DagBundle configuration with automatic reloading to support 
GitOps workflows and team self-onboarding in multi-tenant Airflow deployments.

## The Problem

Right now, Airflow 3.x DagBundles require JSON configurations embedded directly 
in airflow.cfg or environment variables. This creates several problems for 
teams trying to automate onboarding and follow GitOps principles.

Current approach looks like this:

```ini
[dag_processor]
dag_bundle_config_list = [{"name": "team1", "classpath": "...", "kwargs": 
{...}}, ...]
```

This causes real pain:
- We have to manually convert JSON files to embedded strings
- Every configuration change requires restarting Airflow services
- Automated team onboarding workflows break because of manual steps
- Version control diffs are hard to read when JSON is embedded in INI files
- Not friendly for cloud-native deployments with ConfigMaps

## Use Case: Multi-Team GitOps Onboarding

Our platform serves more than 30 teams with decentralized DAG ownership. When a 
new team requests Airflow access, here's what happens:

1. GitHub workflow automatically generates their configuration
2. Writes dag_bundle_config.json to our Git repository
3. Manual step required: Someone has to convert JSON to a string, update the 
config file, and restart all Airflow services

This manual intervention completely breaks the self-service model we're trying 
to build. Teams can't truly onboard themselves if platform engineers need to 
restart services every time.

## What We're Proposing

Let Airflow load DagBundle configuration from files using the file:// protocol, 
with optional automatic reloading:

```ini
[dag_processor]
dag_bundle_config_list = file:///etc/airflow/config/dag_bundle_config.json
dag_bundle_config_list_watch = True  # Optional: auto-reload on change
```

The configuration file would be standard JSON:

```json
[
  {
    "name": "analytics",
    "classpath": "airflow.providers.git.bundles.git.GitDagBundle",
    "kwargs": {
      "repo_url": "https://github.com/org/team-analytics-dags";,
      "tracking_ref": "main",
      "refresh_interval": 60
    }
  }
]
```

## Why This Helps GitOps Workflows

Before this feature:
- Team onboarding takes 2-4 hours because of manual steps
- Every configuration change causes service interruption
- Platform team has to be heavily involved in every onboarding

With this feature:
- Team onboarding down to 10-15 minutes through automated PR workflow
- Zero downtime for configuration updates
- Platform team just reviews PRs, no manual operations

The GitOps workflow becomes simple:

1. Team opens a PR adding their config to dag_bundle_config.json
2. Automated tests validate the JSON schema
3. Platform team approves the PR
4. Merge triggers the deployment pipeline
5. Airflow automatically reloads the configuration without restart
6. Team's DAGs become available immediately

## Future Extensions

Once file:// is stable and proven, we could extend the same approach to remote 
protocols:
- https://config-server.example.com/dag_bundles.json
- s3://bucket/path/dag_bundles.json
- git://repo/config/dag_bundles.json

But starting with local files keeps things simple and covers most use cases.

We're willing to contribute to the implementation if there's interest from the 
community.


GitHub link: https://github.com/apache/airflow/discussions/60859

----
This is an automatically sent email for [email protected].
To unsubscribe, please send an email to: [email protected]

Reply via email to