Thanks for putting this together, Bolke! It will be really great to have some
real data to use for planning and project management.

I left comments on the Confluence doc and am looking forward to the discussions!

Cheers,
Niko

________________________________
From: Bolke de Bruin <[email protected]>
Sent: Friday, October 10, 2025 12:14:44 PM
To: [email protected]
Subject: [EXT] [DISCUSS] AIP-89: Privacy-First Telemetry for Apache Airflow

Hi everyone,

I'd like to open a discussion on AIP-89, which proposes implementing a
privacy-first telemetry system for Airflow.

BACKGROUND

Airflow has had no telemetry capability since Scarf was removed following
community concerns about privacy and transparency. While that removal was
necessary, it has created significant challenges for maintainers who need
to make informed decisions about feature development, deprecations, and
resource allocation.

AIP-89 proposes a completely new approach that learns from past mistakes.

PROPOSAL SUMMARY

The AIP proposes:

   - Opt-in telemetry: off by default, explicit user consent required
   - Apache Software Foundation's Matomo instance as the default collection
   endpoint
   - Minimal data collection: versions, operators used, deployment type,
   aggregate counts only
   - No PII collection: no DAG names, task names, credentials, or user
   information
   - IP anonymization: last octet zeroed as soon as possible
   - Full transparency: public dashboard showing all collected data
   - Community governance: all changes to collected data require dev list
   vote
   - Easy control: settings in both config file and admin UI with
   visibility into what was sent
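
To make the IP anonymization item concrete, here is a minimal sketch of
zeroing the last octet of an IPv4 address. The function name
`anonymize_ipv4` is hypothetical and not taken from the AIP. IPv6 is left
out on purpose, since the summary above only mentions the last octet.

```python
import ipaddress


def anonymize_ipv4(address: str) -> str:
    """Return the IPv4 address with its last octet set to zero.

    Hypothetical helper for illustration only; not part of Airflow or the AIP.
    """
    # Raises ipaddress.AddressValueError (a ValueError) for non-IPv4 input.
    ip = ipaddress.IPv4Address(address)
    # A /24 mask keeps the first three octets and clears the last one.
    network = ipaddress.ip_network(f"{ip}/24", strict=False)
    return str(network.network_address)


if __name__ == "__main__":
    print(anonymize_ipv4("203.0.113.57"))
```

Doing this before the address is logged or stored would be one way to read
"as soon as possible"; where exactly it happens is up to the AIP.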

TARGET VERSION

Airflow 3.2.0

LINKS

AIP-89 Draft:
https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-89%3A+Privacy-First+Telemetry+for+Apache+Airflow

Summit Survey Results:
https://www.mentimeter.com/app/presentation/alo47agxwrq66v2kyfgewz6t22vswp1f/edit?source=share-modal

KEY QUESTIONS FOR DISCUSSION

   1. Is the proposed data collection scope appropriate? Too much? Too
   little?
   2. Should we have the public dashboard ready before or shortly after
   implementation?
   3. Are there additional security or privacy concerns we should address?
   4. How should we handle enterprise deployments that want to share
   aggregate insights but can't enable telemetry due to compliance issues?
   5. Does the 10-second installation prompt seem reasonable, or should we
   handle first-time consent differently?
   6. Should we offer a "telemetry lite" mode with even less data for
   privacy-sensitive users?

CONTEXT FROM SUMMIT

At the Airflow Summit, I presented on this topic and ran a live survey.
While the sample was small (22-34 responses) and not representative, the
feedback showed strong preference for opt-in (81%) and indicated security
and privacy as equal top concerns. The data scope proposed in the AIP
aligned well with what attendees said they'd be willing to share.

NEXT STEPS

I'd like to gather feedback on this thread for the next two weeks, then
call for a vote if there's general consensus on the approach. If there are
significant concerns, I'll revise the AIP accordingly. You are also welcome
to leave comments directly on the AIP draft.

Looking forward to your thoughts.

Best regards,
Bolke

--
Bolke de Bruin
[email protected]
