-0 But it has to:

1. Be opt-in by default, so a regular upgrade does not alarm security teams
with new, unplanned network activity.
2. Not affect anything when internet access is restricted, which is the
default in many companies.
3. Be transparent about what it will send, so it can be reviewed. A separate
log file is good enough.
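To make the three requirements concrete, here is a minimal sketch of what such a reporter could look like. Everything in it is hypothetical: the `AIRFLOW__USAGE_DATA__ENABLED` variable, the `usage_data.log` file, and the `send()` call are illustrative names, not actual Airflow or Scarf interfaces.

```python
import json
import os


def build_payload():
    """Collect only the coarse, non-identifying fields the proposal mentions."""
    import platform
    return {
        "python_version": platform.python_version(),
        # "airflow_version", "executor", "metadata_db" would be looked up here
    }


def send(payload):
    # Stand-in for the real reporting call; in a locked-down
    # network this is exactly the call that would fail.
    raise ConnectionError("no internet access")


def maybe_report(log_path="usage_data.log"):
    # 1. Opt-in by default: do nothing unless explicitly enabled.
    if os.environ.get("AIRFLOW__USAGE_DATA__ENABLED", "false").lower() != "true":
        return None
    payload = build_payload()
    # 3. Transparency: write exactly what would be sent to a separate log file,
    # so it can be reviewed before (and independently of) any network activity.
    with open(log_path, "a") as f:
        f.write(json.dumps(payload) + "\n")
    try:
        send(payload)
    except Exception:
        # 2. Restricted internet must not affect anything: failures are silent.
        pass
    return payload
```

The point of the sketch is the ordering: the enablement check comes first, the local log is written before any network call, and the network call can fail without any visible effect.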

--
,,,^..^,,,

On Sat, Mar 30, 2024 at 3:18 AM Kaxil Naik <kaxiln...@apache.org> wrote:

> Hi all,
>
> I want to propose gathering telemetry for Airflow installations. As the
> Airflow community, we have been relying heavily on the yearly Airflow
> Survey and anecdotes to answer a few key questions about Airflow usage.
> Questions like the following:
>
>
>    - Which versions of Airflow are people installing/using now (i.e.
>    whether people have primarily made the jump from version X to version Y)
>    - Which DB is used as the metadata DB, and which version (e.g. Postgres 14)?
>    - What Python version is being used?
>    - Which Executor is being used?
>    - Approximately how many people out there in the world are installing
>    Airflow?
>
>
> There is a solution that should help answer these questions: Scarf [1].
> Scarf is already approved by the ASF [2][3], as it follows GDPR and other
> regulations, and is already used by other ASF projects: Superset [4],
> DolphinScheduler [5], Dubbo Kubernetes, DevLake, and SkyWalking.
>
> Similar to Superset, we probably can use it as follows:
>
>
>    1. Install the `scarf js` npm package and bundle it in the Webserver.
>    When the package is downloaded & the Airflow webserver is opened,
>    metadata is recorded to the Scarf dashboard.
>    2. Utilize the Scarf Gateway [6], which we can put in front of Docker
>    containers. While it's possible for people to go around this gateway,
>    we can probably configure and encourage most traffic to go through it.
>
> While Scarf does not store any personally identifying information from SDK
> telemetry data, it does send various bits of IP-derived information as
> outlined here [7]. This data should be made as transparent as possible by
> granting dashboard access to the Airflow PMC and by sharing/surfacing it
> through any other relevant means (Town Hall, Slack, newsletter, etc.).
>
> The following case studies are worth reading:
>
>    1. https://about.scarf.sh/post/scarf-case-study-apache-superset (From
>    Maxime)
>    2. https://about.scarf.sh/post/haskell-org-bridging-the-gap-between-language-innovation-and-community-understanding
>
> Similar to them, this could help in various ways that come with using data
> for decision-making. With clear guidelines on "how to opt-out" [8][9][10] &
> "what data is being collected" on the Airflow website, this can be
> beneficial to the entire community as we would be making more informed
> decisions.
>
> Regards,
> Kaxil
>
>
> [1] https://about.scarf.sh/
> [2] https://privacy.apache.org/policies/privacy-policy-public.html
> [3] https://privacy.apache.org/faq/committers.html
> [4] https://github.com/apache/superset/issues/25639
> [5] https://github.com/search?q=repo%3Aapache%2Fdolphinscheduler%20scarf.sh&type=code
> [6] https://about.scarf.sh/scarf-gateway
> [7] https://about.scarf.sh/privacy-policy
> [8] https://superset.apache.org/docs/frequently-asked-questions/#does-superset-collect-any-telemetry-data
> [9] https://superset.apache.org/docs/installation/installing-superset-using-docker-compose
> [10] https://docs.scarf.sh/package-analytics/#as-a-user-of-a-package-using-scarf-js-how-can-i-opt-out-of-analytics
>
