Hi!

Timothy Sample <samp...@ngyro.com> writes:

> I also played around with it and came up with
>
>   https://ngyro.com/pog-reports/2022-01-16/missing-sources.json
>
> This is a “sources.json” that only lists the “missing” and “unknown”
> sources from the PoG report. It lists sources across all commits (since
> 1.0.0). This might be the easiest thing for SWH to handle, since it
> omits nearly 20k sources that they definitely already have. Since they
> don’t have the tarball hashes, they have no way to skip downloading and
> processing tarballs that they already have by hash. Hence, filtering it
> with the extra data we have through the PoG projects should be something
> that they welcome!
>
> If they want, they could point a loader task at
>
>   https://ngyro.com/pog-reports/latest/missing-sources.json
>
> and I could publish updates when I publish new PoG reports.

Sounds great! Could you ask them whether they could do that? It may be
that it’s going to be a one-time thing, if everything goes well.

> There’s one other thing to think about. Some of our sources are
> arguably unsuitable for SWH. For instance, our bootstrap binaries. I
> bet we have a bunch of other borderline things, too, like game assets.
> Of course, if they are indiscriminately ingesting Github, I’m sure
> they’ve loaded plenty of garbage. Mostly, I think about these things
> because I believe it’s important to maintain the Guix-SWH relationship.

Right, it would be nice to filter them out somehow, even though it’s a
drop in the ocean of binaries that SWH ingests routinely (for instance
that’s the reason why we find some tarballs, as is, via
‘lookup-content’).

Thanks,
Ludo’.