I have this at the end of the email too, but in case folks don't read until
the end, quoting Maxime from the use-case blog [1]:

"I think people often ask ‘how do I contribute to open source?’, ‘I've got
to get into the code’, or ‘I’ve got to be an engineer.’ Actually, the very
simplest thing that you can do is just say, ‘my organization gets real
value from this piece of software.’ There are a bunch of ways to let the
people know about it – and now Scarf is there. If your organization is
getting a lot of value from a piece of open source software, make sure the
devs know about it."

What kind of edge cases are you thinking about? I don't think it makes
sense to have "opt-in" at all. Since the goal is to collect data from most
Airflow installations, except those that don't want to share it,
"opt-out" is the only way to maximize coverage. As long as we don't collect
any PII, this is compliant as well.

Imagine someone learning Airflow: if they had to opt in via a config option,
they wouldn't even know or care about it, so we would lose most of the data.
I understand why some orgs & individuals may want to opt out.
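To make the opt-in vs. opt-out difference concrete, here is a minimal Python
sketch of an opt-out check, where collection stays on unless someone
explicitly turns it off. The `usage_data_enabled` helper, the idea of a
`[usage_data_collection] enabled` config option, and the use of the
`DO_NOT_TRACK` / `SCARF_ANALYTICS` environment variables are all assumptions
for illustration, not an agreed Airflow interface.

```python
import os
from typing import Mapping, Optional

# Values treated as "turn it off" (assumption: case-insensitive match).
_FALSY = {"0", "false", "no", "off"}


def usage_data_enabled(
    config_value: Optional[str],
    env: Mapping[str, str] = os.environ,
) -> bool:
    """Return True unless the user has explicitly opted out.

    config_value: raw value of a hypothetical
    [usage_data_collection] enabled option, or None if it is unset.
    """
    # Assumed env-var conventions for "do not track" / "no analytics".
    if env.get("DO_NOT_TRACK", "").strip().lower() in {"1", "true", "yes"}:
        return False
    if env.get("SCARF_ANALYTICS", "").strip().lower() in _FALSY:
        return False
    # Opt-out semantics: an unset option means "enabled";
    # only an explicit falsy value disables collection.
    if config_value is None:
        return True
    return config_value.strip().lower() not in _FALSY
```

Under opt-in, the `config_value is None` branch would return `False`
instead, which is exactly where a new user who never touches the config
would drop out of the data.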

Scarf provides tracking pixels (essentially an HTML image tag) that you can
place on your website or in your product to track visits to that URL. If
there were any privacy concerns, the ASF wouldn't have approved it at all.

A few key details to note about the pixel:


   - No PII is tracked… Scarf does not capture/retain IP information… this
   information is discarded by the platform upon processing/aggregating.
   - Scarf pixels respect browsers' Do Not Track (DNT) setting: users who
   have DNT enabled are not tracked at all.
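For illustration, a minimal sketch of how the webserver could emit such a
pixel. The URL shape (`https://static.scarf.sh/a.png?x-pxid=<pixel id>`) is
modeled on Scarf's published pixel snippet and should be checked against
Scarf's docs; the pixel ID and the `render_pixel` helper are hypothetical.
Scarf already honours DNT on its side; skipping the tag entirely when the
browser sends `DNT: 1` is an extra, assumed safeguard on our side.

```python
from html import escape
from typing import Mapping
from urllib.parse import urlencode

# Hypothetical pixel ID; a real one would come from the Scarf dashboard.
PIXEL_ID = "00000000-0000-0000-0000-000000000000"
PIXEL_BASE = "https://static.scarf.sh/a.png"  # assumed Scarf pixel endpoint


def render_pixel(request_headers: Mapping[str, str], enabled: bool = True) -> str:
    """Return an <img> tag for the tracking pixel, or "" if we must not track."""
    if not enabled:
        return ""  # the deployment has opted out
    # Header names are case-insensitive, so normalise before looking up DNT.
    headers = {k.lower(): v for k, v in request_headers.items()}
    if headers.get("dnt", "").strip() == "1":
        return ""  # honour Do Not Track: no tag, so no request at all
    src = f"{PIXEL_BASE}?{urlencode({'x-pxid': PIXEL_ID})}"
    # Plain markup: only the user's browser ever fetches this image.
    return (
        f'<img referrerpolicy="no-referrer-when-downgrade" '
        f'src="{escape(src)}" width="1" height="1" alt="" />'
    )
```

Because the tag is plain markup, the only request to Scarf is made by the
user's browser when it loads the image.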


All the ASF projects I listed (whether they use the Scarf Gateway or the
Scarf pixel in the product) use opt-out.

> 1. Short opt-in period before opt-out. Test this feature with users who
> trust and if it works great - make it public. I think it's wise to handle
> edge cases and configure collected data more accurately.

It would be a pixel in the webserver, so it should not affect anything at
all, even in an air-gapped environment.

> 2. It should not affect anything if access to the internet is restricted
> which is default for many companies.
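One way to check the "affects nothing in an air-gapped environment" claim is
a test that blocks all sockets and renders a page anyway. A minimal sketch,
assuming a hypothetical `render_page` template function and a placeholder
pixel tag; the `no_network` helper simulates an air gap by making socket
creation raise `OSError`.

```python
import socket
from contextlib import contextmanager


@contextmanager
def no_network():
    """Simulate an air-gapped host: any attempt to create a socket raises OSError."""
    original = socket.socket

    def _blocked(*args, **kwargs):
        raise OSError("network access blocked (simulated air gap)")

    socket.socket = _blocked  # socket.create_connection() looks this name up too
    try:
        yield
    finally:
        socket.socket = original  # always restore the real socket class


def render_page(pixel_tag: str) -> str:
    """Hypothetical webserver template: the pixel is only markup for the browser."""
    return f"<html><body><h1>DAGs</h1>{pixel_tag}</body></html>"


# Placeholder tag with a hypothetical pixel ID.
PIXEL_TAG = '<img src="https://static.scarf.sh/a.png?x-pxid=HYPOTHETICAL-ID" alt="" />'

with no_network():
    page = render_page(PIXEL_TAG)  # works: rendering needs no outbound connection
    try:
        socket.socket()  # any real egress attempt fails here
        egress_blocked = False
    except OSError:
        egress_blocked = True
```

In a real air-gapped network the browser's request for the image just
fails, and the page renders without it.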



100% agreed on the below:

> I think we have a very good blueprint to follow including at least 5 other
> ASF projects that also passed the review of the privacy@asf. And while I
> understand (and concur) the urge for opt-in by default coming from consumer
> market (where it makes perfect sense) Airflow is not a consumer
> software and is used in "corporate environment" which has a little
> different expectations and broad assumption that the company can make
> decisions on such telemetry on behalf of the employees using it.


Couldn't agree more. Even though the data we collect shouldn't hamper
security (and we should aim to keep it that way), most security-conscious
folks don't just upgrade blindly; we can rely on them to read release notes
and announcements, and we can make the change very clear in our
announcements and in our installation guides too.

> We should assume that those who deploy and upgrade Airflow - actually read
> and take into account what is written in the release notes - especially if
> they have security guys breathing their necks, similarly as we have to
> assume they follow CVE announcements about security issues fixed. If we
> are very straightforward and out-going about the change, inform very
> clearly how to opt-out, I don't see a big problem with opt-out.



To be clear, the data we gather here should help all consumers without
violating any regulations. To quote Maxime again from the use-case doc [1]:

"*Another Form of Contributing*
“I think people often ask ‘how do I contribute to open source?’, ‘I've got
to get into the code’, or ‘I’ve got to be an engineer.’ Actually, the very
simplest thing that you can do is just say, ‘my organization gets real
value from this piece of software.’ There are a bunch of ways to let the
people know about it – and now Scarf is there. If your organization is
getting a lot of value from a piece of open source software, make sure the
devs know about it.”"


[1] https://about.scarf.sh/post/scarf-case-study-apache-superset
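As a sketch of what could answer the questions in the original proposal
(quoted below), here is a minimal payload builder covering only those facts:
Airflow version, metadata DB and its version, Python version, and executor.
The parameter names and the `build_usage_payload` helper are hypothetical;
in practice Airflow would supply the values itself. The "how many
installations" question would come from counting aggregated requests, not
from a field.

```python
import platform
from urllib.parse import urlencode


def build_usage_payload(
    airflow_version: str,
    db_dialect: str,
    db_version: str,
    executor: str,
    python_version: str = "",
) -> str:
    """Return a query string with coarse, non-identifying usage facts."""
    if not python_version:
        python_version = platform.python_version()
    # Report only major.minor (e.g. "3.11") so buckets stay coarse.
    py_major_minor = ".".join(python_version.split(".")[:2])
    params = {
        "version": airflow_version,        # which Airflow versions are in use
        "python_version": py_major_minor,  # which Python versions are in use
        "database": db_dialect,            # metadata DB, e.g. "postgresql"
        "db_version": db_version,          # e.g. "14"
        "executor": executor,              # e.g. "CeleryExecutor"
    }
    # Deliberately no hostnames, usernames, DAG ids or connection details.
    return urlencode(params)


print(build_usage_payload("2.9.0", "postgresql", "14", "CeleryExecutor", "3.11.7"))
# version=2.9.0&python_version=3.11&database=postgresql&db_version=14&executor=CeleryExecutor
```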

On Sat, 30 Mar 2024 at 14:02, Alexander Shorin <kxe...@apache.org> wrote:

> Hi Jarek!
>
> I understand the reasons for opt-out from a project view. I just suddenly
> imagined the situation when an upgrade happens and here comes the data to
> some third party service - that's a view from a user side of some big
> company.
>
> There could be good alternatives to handle this:
> 1. Short opt-in period before opt-out. Test this feature with users who
> trust and if it works great - make it public. I think it's wise to handle
> edge cases and configure collected data more accurately.
> 2. Explicitly somehow warn about this feature to make this feature not get
> unnoticed. Just to reduce possible frustration.
>
> Just a personal thoughts for discussion (:
>
> --
> ,,,^..^,,,
>
> On Sat, Mar 30, 2024 at 4:36 PM Jarek Potiuk <ja...@potiuk.com> wrote:
>
> > Hello everyone,
> >
> > > it has to be:
> > >
> > > 1. Opt-in by default to not trigger security guys about new unplanned
> > > activity after regular upgrade.
> > >
> >
> > That's a very good point about security triggering Alexander, but I am not
> > so sure it means that we "have to" do opt-in. There are other ways of
> > communicating with the "deployment managers" who install and upgrade
> > airflow - i.e. release notes. blogs, social media of ours, slack
> > announcements etc. We have plenty of channels we can use to communicate the
> > change.
> >
> > I think we have a very good blueprint to follow including at least 5 other
> > ASF projects that also passed the review of the privacy@asf. And while I
> > understand (and concur) the urge for opt-in by default coming from consumer
> > market (where it makes perfect sense) Airflow is not a consumer
> > software and is used in "corporate environment" which has a little
> > different expectations and broad assumption that the company can make
> > decisions on such telemetry on behalf of the employees using it.
> >
> > We should assume that those who deploy and upgrade Airflow - actually read
> > and take into account what is written in the release notes - especially if
> > they have security guys breathing their necks, similarly as we have to
> > assume they follow CVE announcements about security issues fixed. If we
> > are very straightforward and out-going about the change, inform very
> > clearly how to opt-out, I don't see a big problem with opt-out.
> >
> > We should of course check with privacy@a.o (but I'v spend a good deal of
> > time reading the Superset  and other use case and explanation in detail to
> > make a better informed decision) - and it looks like they also went opt-out
> > way and got cleared by privacy@a.o.  And if we cannot reach consensus, we
> > should - as usual - make a voting decision on it (because yes, it is an
> > important decision), but - after reading and understanding why others also
> > did it - for me personally, opt-out is a good path.
> >
> > Also because it will rather increase the amount of data to gather, and in
> > our case - counter intuitively - it will be even better for privacy and
> > corporate anonymity, because the more data we get, the more difficult it
> > will be to get any non-statistical/non-aggregated insight from it. Imagine
> > if only a few corporate users will enable it consciously - then we will be
> > able to draw much more conclusions if we find out who they are, than if
> > everyone has it enabled by default.
> >
> > That's my take on it - but again, it's up to us to vote, for me opt-in is
> > not "has to", and I am rather for opt-out.
> >
> > J.
> >
> > > Hi all,
> > >
> > >
> > > > I want to propose gathering telemetry for Airflow installations. As the
> > > > Airflow community, we have been relying heavily on the yearly Airflow
> > > > Survey and anecdotes to answer a few key questions about Airflow usage.
> > > > Questions like the following:
> > > >
> > > >
> > > >    - Which versions of Airflow are people installing/using now (i.e.
> > > >    whether people have primarily made the jump from version X to version
> > > >    Y)
> > > >    - Which DB is used as the Metadata DB and which version e.g Pg 14?
> > > >    - What Python version is being used?
> > > >    - Which Executor is being used?
> > > >    - Approximately how many people out there in the world are installing
> > > >    Airflow
> > > >
> > > >
> > > > There is a solution that should help answer these questions: Scarf [1].
> > > > The ASF already approves Scarf [2][3] and is already used by other ASF
> > > > projects: Superset [4], Dolphin Scheduler [5], Dubbo Kubernetes, DevLake,
> > > > Skywalking as it follows GDPR and other regulations.
> > > >
> > > > Similar to Superset, we probably can use it as follows:
> > > >
> > > >
> > > >    1. Install the `scarf js` npm package and bundle it in the Webserver.
> > > >    When the package is downloaded & Airflow webserver is opened, metadata
> > > >    is recorded to the Scarf dashboard.
> > > >    2. Utilize the Scarf Gateway [6], which we can use in front of docker
> > > >    containers. While it’s possible people go around this gateway, we can
> > > >    probably configure and encourage most traffic to go through these
> > > >    gateways.
> > > >
> > > > While Scarf does not store any personally identifying information from SDK
> > > > telemetry data, it does send various bits of IP-derived information as
> > > > outlined here [7]. This data should be made as transparent as possible by
> > > > granting dashboard access to the Airflow PMC and any other relevant means
> > > > of sharing/surfacing it that we encounter (Town Hall, Slack, Newsletter
> > > > etc).
> > > >
> > > > The following case studies are worth reading:
> > > >
> > > >    1. https://about.scarf.sh/post/scarf-case-study-apache-superset (From
> > > >    Maxime)
> > > >    2. https://about.scarf.sh/post/haskell-org-bridging-the-gap-between-language-innovation-and-community-understanding
> > > >
> > > > Similar to them, this could help in various ways that come with using data
> > > > for decision-making. With clear guidelines on "how to opt-out" [8][9][10] &
> > > > "what data is being collected" on the Airflow website, this can be
> > > > beneficial to the entire community as we would be making more informed
> > > > decisions.
> > > >
> > > > Regards,
> > > > Kaxil
> > > >
> > > >
> > > > [1] https://about.scarf.sh/
> > > > [2] https://privacy.apache.org/policies/privacy-policy-public.html
> > > > [3] https://privacy.apache.org/faq/committers.html
> > > > [4] https://github.com/apache/superset/issues/25639
> > > > [5] https://github.com/search?q=repo%3Aapache%2Fdolphinscheduler%20scarf.sh&type=code
> > > > [6] https://about.scarf.sh/scarf-gateway
> > > > [7] https://about.scarf.sh/privacy-policy
> > > > [8] https://superset.apache.org/docs/frequently-asked-questions/#does-superset-collect-any-telemetry-data
> > > > [9] https://superset.apache.org/docs/installation/installing-superset-using-docker-compose
> > > > [10] https://docs.scarf.sh/package-analytics/#as-a-user-of-a-package-using-scarf-js-how-can-i-opt-out-of-analytics
> > > >
> > >
> >
>
