I am personally all for it. Josh, who raised the Scarf issue with me
before, repeated this at the Airflow Summit: the decision we made to ship
Scarf as opt-out, which he nearly missed, scared him and his management and
almost drove them to stop believing in Open Source. After we reacted
immediately and removed it as a result of that report, the management
became more convinced about Airflow, Open Source, and the Apache Software
Foundation process and oversight - and doubled down not only on Airflow but
on multiple other Apache Software Foundation software.

We should not break that trust again - which means a deliberate
introduction, discussion, and very clear and transparent communication with
our users about why we are doing it and how.


> KEY QUESTIONS FOR DISCUSSION
>
>    1. Is the proposed data collection scope appropriate? Too much? Too
>    little?
>

I like the scope as is

>    2. Should we have the public dashboard ready before or shortly after
>    implementation?
>

The ASF Matomo is public; we can't make it private. In case people are not
aware, https://analytics.apache.org/ has it all - all our website traffic
stats are available there.


>    3. Are there additional security or privacy concerns we should address?
>

I think what we discussed and you proposed in the docs is good enough.


>    4. How should we handle enterprise deployments that want to share
>    aggregate insights but can't enable telemetry due to compliance issues?
>

I think we should at most ask them whether they can publish some aggregate
data on their own - so that we can combine it or maybe double-check it
against the "public" stats. I am thinking mostly about aggregated operator
usage stats, which I find most interesting from a product point of view.
They could even do it by having their own internal Matomo instance - not
exposed to us - and export selected, aggregated data from it. That is an
ask we should make especially of our biggest stakeholders - Amazon, Google,
Astronomer, Microsoft, Cloudera, maybe others. We could provide
instructions to them on how to set up Matomo in their network. We have no
mechanism to push them, but we could ask nicely, publicise whether or not
we have some data from those deployments, and thank those who provide it.
That is, I think, the most we can do - we can't force anyone to provide us
the data.
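
To make the "aggregate, not absolute" idea a bit more concrete, here is a
minimal sketch of what such an export could look like. It is only an
illustration: the function name, the `min_share` threshold and the "other"
bucket are my assumptions, not anything from the AIP or from Matomo. It
turns a list of operator names into relative shares, so no installation or
task counts leave the company.

```python
from collections import Counter


def operator_usage_shares(operator_names, min_share=0.01):
    """Hypothetical helper: convert raw operator usage into relative shares.

    Absolute counts are dropped; only fractions of the total are returned.
    Operators whose share is below min_share are folded into "other" so
    that rare (possibly identifying) operators are not listed by name.
    """
    counts = Counter(operator_names)
    total = sum(counts.values())
    if total == 0:
        return {}

    shares = {}
    other = 0
    for name, count in counts.items():
        if count / total < min_share:
            other += count
        else:
            shares[name] = round(count / total, 4)
    if other:
        shares["other"] = round(other / total, 4)
    return shares
```

A stakeholder could run something like this over their own internal data
and share only the resulting dictionary of fractions.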


>    5. Does the 10-second installation prompt seem reasonable, or should we
>    handle first-time consent differently?
>

I am good with it. That was my idea.


>    6. Should we offer a "telemetry lite" mode with even less data for
>    privacy-sensitive users?
>

I don't see what it could give us, though it depends on what "lite" means.
I think the problem is that those who would not like to share the data are
mostly concerned about exposing "how many installations" they have (I am
speaking mostly about our biggest stakeholders). It is perfectly
understandable that they do not want to publish the "absolute numbers";
asking for that is like asking them to open the business model books. Any
"public Matomo" reporting will necessarily expose that. There are multiple
ways the origin of reported data can be traced to "where Airflow is
installed" - and even if we do not do that, malicious actors who
potentially break into the ASF Matomo instances could get such information.
So from a business point of view that is a huge risk, and I expect no legal
department would ever allow it. I think the only thing we can ask for is to
publish some aggregated data - but not automatically and not directly from
running Airflow.


>
> CONTEXT FROM SUMMIT
>
> At the Airflow Summit, I presented on this topic and ran a live survey.
> While the sample was small (22-34 responses) and not representative, the
> feedback showed strong preference for opt-in (81%) and indicated security
> and privacy as equal top concerns. The data scope proposed in the AIP
> aligned well with what attendees said they'd be willing to share.
>
> NEXT STEPS
>
> I'd like to gather feedback on this thread for the next two weeks, then
> call for a vote if there's general consensus on the approach. If there are
> significant concerns, I'll revise the AIP accordingly (or please provide
> comments there).
>
> Looking forward to your thoughts.
>
> Best regards,
> Bolke
>
> --
>
> --
> Bolke de Bruin
> [email protected]
>
