On Mon, May 4, 2020 at 5:48 PM Thomas Deutschmann <whi...@gentoo.org> wrote:
>
> On 2020-04-26 15:46, Kent Fredric wrote:
> > On Sun, 26 Apr 2020 14:38:54 +0200
> > Thomas Deutschmann <whi...@gentoo.org> wrote:
> >
> >> Let's assume we will get reports that app-misc/foo is only installed 20
> >> times. If you are going to judge based on this data, "Obviously, nobody
> >> is using that package, it's stuck on <whatever>... safe to remove" your
> >> view is biased:
> >
> > I see this as more like what bloom filters get you, but in reverse:
> >
> > [...]
> >
> > - But now, instead of having "we don't know if anybody uses this", you
> >   *can* have a "we know for sure somebody uses this".
>
> But how does that information really help us to decide anything in the end?
>
> Case A, stats are showing 0 users:
>
> Like said, we can't know if this is true or if this package is only used
> in setups where people don't report stats.
>
>
> Case B, stats are showing x users:
>
> Now what? Package from case A could have similar users -- we just don't
> know. Assume firefox has 1.000 users, chromium has 500 users and vivaldi
> doesn't show up in stats. How does that help us? Would this allow us to
> skip publishing GLSAs for vivalid because we assume nobody in Gentoo is
> using vivaldi? Does it allow Python project to go forward pushing a mask
> for removal in case vivaldi would depend on Python version, Python
> project want to get rid of? Would this allow Gentoo PR to make a public
> statement like "Firefox is the most popular browser in Gentoo, twice as
> users as chromium"?

I hate the saying "the perfect is the enemy of the good" but I think it applies here. You're of course correct that we would not have perfect information. But the thing about statistics is that you can still learn things from a sample of the complete information. I would personally like to have data on whether users of my packages have certain USE flags enabled. Knowing that would allow me to decide whether it's worth the maintenance burden of supporting features that I *think* are very rarely used. If instead the data showed me that 50% of users had USE=xyz enabled, I probably wouldn't consider removing it. I think your example of potential misuse of data is a bit overdramatic.
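As a rough illustration of what a sample can and can't tell us, here is a minimal Python sketch that puts an approximate 95% Wilson score interval around the share of reporting systems with a given USE flag enabled. The function name and the counts in the usage note are hypothetical. It also assumes that the reporting systems behave like a random sample of all installs. Opt-in stats almost certainly are not, which is exactly the bias concern raised above, so the interval only describes the systems that report.

```python
import math


def wilson_interval(enabled: int, reporters: int, z: float = 1.96) -> tuple[float, float]:
    """Approximate Wilson score interval (z=1.96 gives ~95%) for the share
    of *reporting* systems that have a flag enabled.

    Hypothetical helper: assumes reporters are a random sample of installs,
    which opt-in statistics do not guarantee.
    """
    if reporters <= 0:
        raise ValueError("need at least one report")
    if not 0 <= enabled <= reporters:
        raise ValueError("enabled must be between 0 and reporters")
    p = enabled / reporters
    z2 = z * z
    denom = 1 + z2 / reporters
    centre = (p + z2 / (2 * reporters)) / denom
    half = (z / denom) * math.sqrt(p * (1 - p) / reporters + z2 / (4 * reporters ** 2))
    # Clamp to [0, 1] to absorb floating-point noise at the edges.
    return max(0.0, centre - half), min(1.0, centre + half)
```

With made-up numbers, `wilson_interval(12, 400)` gives roughly 2% to 5%, which is enough to call a flag rarely used among reporters. `wilson_interval(0, 400)` still leaves an upper bound near 1%, so zero reports means "rare among reporters", not "unused".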