Since it is going to be opt-in and optional anyway, we seem to be fine with
having just partial data.

I assume we have logs of distfiles downloads from Gentoo infrastructure, and
can negotiate access to relevant logs of our mirrors. That constitutes partial
data correlated with users' installation activity, as good as it gets.

If we do have some such data, are we using it in any way for the discussed
purposes?

If we don't, but could get it, would we be able to use that data for these
purposes? If no, why?

If we can't get the data, why?

As an aside, I think the best known way to ensure the availability of important
things, from a user's perspective, is to pay for them. Of course, I see how
this wouldn't fit very well culturally here, and that we're not going to switch
to a commercial model just for this reason.

