I am personally all for it. Josh, who had raised the issue with Scarf with me before, repeated this at the Airflow Summit: the decision we made to ship Scarf as opt-out, which he nearly missed, scared him and his management and almost drove them to stop believing in Open Source. After we reacted immediately and removed it as a result of that report, his management became more convinced about Airflow, Open Source, and the Apache Software Foundation process and oversight, and doubled down not only on Airflow but on multiple other Apache Software Foundation software.
We should not break the trust again, which means deliberate introduction, discussion, and very clear and transparent communication with our users about why we are doing it and how.

> KEY QUESTIONS FOR DISCUSSION
>
> 1. Is the proposed data collection scope appropriate? Too much? Too
> little?

I like the scope as is.

> 2. Should we have the public dashboard ready before or shortly after
> implementation?

The ASF's Matomo is public; we can't make it private. In case people are not aware, https://analytics.apache.org/ has it all: all our website traffic stats are available there.

> 3. Are there additional security or privacy concerns we should address?

I think what we discussed and what you proposed in the docs is good enough.

> 4. How should we handle enterprise deployments that want to share
> aggregate insights but can't enable telemetry due to compliance issues?

I think we should at most ask them whether they can publish some aggregate data on their own, so that we can combine it with, or double-check it against, the "public" stats. I am thinking mostly about aggregated operator usage stats, which I find the most interesting from a product point of view. They could even do it by running their own internal Matomo instance, not exposed to us, and exporting selected, aggregated data from it. That is an ask we should make especially of our biggest stakeholders: Amazon, Google, Astronomer, Microsoft, Cloudera, maybe others. We could provide them with instructions on how to set up Matomo in their network.

We have no mechanism to push them, but we could ask nicely, publicise whether or not we have data from those deployments, and thank those who provide it. That is, I think, the most we can do. We can't force anyone to provide us the data.

> 5. Does the 10-second installation prompt seem reasonable, or should we
> handle first-time consent differently?

I am good with it. That was my idea.

> 6. Should we offer a "telemetry lite" mode with even less data for
> privacy-sensitive users?

I don't see what it could give us; it depends on what "lite" means. I think the problem is that those who would not like to share the data are mostly concerned about "how many installations" they have (I am speaking mostly about our biggest stakeholders). It is perfectly understandable that they do not want to publish the "absolute numbers". Asking for that is like asking them to open their business model books, and any "public Matomo" reporting will necessarily expose that.

There are multiple ways the origin of reported data can be traced to "where Airflow is installed", and even if we do not do that ourselves, malicious actors who might break into the ASF's Matomo instances could get such information. So from a business point of view that is a huge risk, and I expect no legal department would ever allow it. I think the only thing we can ask for is that they publish some aggregated data, but not automatically and not directly from a running Airflow.

> CONTEXT FROM SUMMIT
>
> At the Airflow Summit, I presented on this topic and ran a live survey.
> While the sample was small (22-34 responses) and not representative, the
> feedback showed strong preference for opt-in (81%) and indicated security
> and privacy as equal top concerns. The data scope proposed in the AIP
> aligned well with what attendees said they'd be willing to share.
>
> NEXT STEPS
>
> I'd like to gather feedback on this thread for the next two weeks, then
> call for a vote if there's general consensus on the approach. If there are
> significant concerns, I'll revise the AIP accordingly (or please provide
> comments there).
>
> Looking forward to your thoughts.
>
> Best regards,
> Bolke
>
> --
> Bolke de Bruin
> [email protected]
