On Mon, May 4, 2020 at 10:14 PM Matt Turner <matts...@gentoo.org> wrote:

> On Mon, May 4, 2020 at 5:48 PM Thomas Deutschmann <whi...@gentoo.org>
> wrote:
> >
> > On 2020-04-26 15:46, Kent Fredric wrote:
> > > On Sun, 26 Apr 2020 14:38:54 +0200
> > > Thomas Deutschmann <whi...@gentoo.org> wrote:
> > >
> > >> Let's assume we will get reports that app-misc/foo is only installed 20
> > >> times. If you are going to judge based on this data, "Obviously, nobody
> > >> is using that package, it's stuck on <whatever>... safe to remove" your
> > >> view is biased:
> > >
> > > I see this as more like what bloom filters get you, but in reverse:
> > >
> > > [...]
> > >
> > > - But now, instead of having "we don't know if anybody uses this", you
> > >   *can* have a "we know for sure somebody uses this".
> >
> > But how does that information really help us to decide anything in the
> > end?
> >
> > Case A, stats are showing 0 users:
> >
> > As I said, we can't know if this is true or if this package is only used
> > in setups where people don't report stats.
> >
> >
> > Case B, stats are showing x users:
> >
> > Now what? The package from case A could have a similar number of users --
> > we just don't know. Assume firefox has 1,000 users, chromium has 500 users
> > and vivaldi doesn't show up in stats. How does that help us? Would this
> > allow us to skip publishing GLSAs for vivaldi because we assume nobody in
> > Gentoo is using vivaldi? Does it allow the Python project to go forward
> > pushing a mask for removal in case vivaldi depends on a Python version the
> > Python project wants to get rid of? Would this allow Gentoo PR to make a
> > public statement like "Firefox is the most popular browser in Gentoo, with
> > twice as many users as chromium"?
>
> I hate the saying "the perfect is the enemy of the good" but I think
> it applies here.
>
> You're of course correct that we would not have perfect information.
> But the thing about statistics is that you can still know some things
> based on a sampling of that perfect information.
>
> I would personally like to have data on whether users of my packages
> have certain USE flags enabled. Knowing that would allow me to decide
> whether it's worth the maintenance burden of supporting features that I
> *think* are very rarely used. If instead the data showed me that 50%
> of users had IUSE=xyz enabled, I probably wouldn't consider removing
> it.
>
> I think your example of potential misuse of data is a bit overdramatic.

Let me present the same point another way.

Today we have no data, so we make an arbitrary decision. It might be right
or wrong, and we may not know until after we decide.
Traditionally this has been a "break them and they will come" type of
process: "Mask it; if they complain, I'll unmask it."

In the future, we could have this package data. It may influence decision
making. However, I'm not sure that, from a decision-making standpoint, it is
strictly worse than no data.
The danger (which I think is Whissi's concern) is that it could
artificially increase decision certainty.

For example, say I have to decide whether to keep a package, or a flag, or
whatever. I might make an arbitrary decision. I'm aware it's arbitrary, it
might be wrong, and so I'm not super attached to such a decision. I'm not
*certain* about it; but I have to decide one way or the other[0]. Then I
move to a world with package data. Now I'm no longer making an arbitrary
decision; I'm making a decision based on *data*. The *data* tells me my
decision is correct, resulting in a more *certain* decision outcome. I
think this is the fallacy we want to avoid. The data can be informative but
there are significant biases in it that should result in very *little*
certainty added to decision making.

Making decisions based on incomplete data is just life though, so I'm
fairly skeptical of a "we shouldn't collect any data" type of mindset. I'd
be curious to see if we can instill a *culture* component around the use of
data in our development workflows.


[0] There are a bunch of other cultural components here, like different
decision types (1 vs 2) and the ability to make a mistake in public and not
feel bad about it; so I'm aware reality does not reflect this trivial
example. But those are hallmarks of the culture I'd like to aim for in
Gentoo, so I would prefer to discuss a world where they exist ;)
