On Sun, 26 Apr 2020 10:52:27 +0200 Michał Górny <mgo...@gentoo.org> wrote:
> Do you have any other idea for spam protection then?

What is the realistic risk here for spamming?

If the record is well formed and pertains to known packages, the worst I can currently imagine is astroturfing: a single individual attempting to make a package seem more popular than it is.

Just generally, IME, spamming aims to make a buck somehow. But if there are no fields in the data set that can be used for this, and abuse of existing fields to fill them with spam prose gets filtered out because it doesn't correlate with any known possible values, then the entire record is simply invalid and can be removed on that basis.

Conceptually, you could have a report with "dev-foo/plz-sir-halp-me-I-have-money-and-an-a-nigerian-prince::nigeria-prince", but for anybody to see that they'd have to be querying data about the ::nigeria-prince overlay, and that's assuming we even show data about overlays we can't locate.

Trolling ::gentoo with packages that don't exist seems easy to eliminate.

I don't like that astroturfing could be a thing ... but like, I also don't really care about that happening.

For instance, crates.io has per-crate and per-crate-version download statistics. That's super easy to rig, and you get lots of spiky noise in infrequently used packages simply due to various automated services fetching things.

But at scale, the data still turns out to be quasi-useful, as it allows you to chart adoption and migration... because as soon as a new version gets shipped, if people are using it, you'll start to see an uptick in reports from the new version.

The "change" and "change response" information is very useful, and a very odd target for astroturfing.

I for one would be greatly interested in "new perl version shipped, explosion of results due to people upgrading", because then I can gauge roughly how many people managed to upgrade perl without having to join #gentoo and cry about it being broken.

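To make the "invalid record" idea concrete, here is a rough sketch of the kind of check I mean. Everything in it is an assumption for illustration: the names (KNOWN_PACKAGES, split_atom, is_valid_report), the shape of a report as a plain list of "category/name::repo" strings, and the strict policy of rejecting the whole report if any entry is unknown. It is not an existing tool or an agreed format.

```python
# Hypothetical sketch: the data shapes and names here are assumptions,
# not part of any existing Gentoo statistics tool.

# Packages we can actually locate, keyed by repository name.
KNOWN_PACKAGES = {
    "gentoo": {"dev-lang/perl", "dev-lang/python", "sys-apps/portage"},
}


def split_atom(entry):
    """Split 'category/name::repo' into (package, repo).

    Entries without an explicit '::repo' suffix are assumed to be ::gentoo.
    """
    package, sep, repo = entry.partition("::")
    return package, (repo if sep else "gentoo")


def is_valid_report(entries, known=KNOWN_PACKAGES):
    """Accept a report only if every entry names a known package in a known repo.

    An empty report is rejected as well, since it carries no usable data.
    """
    for entry in entries:
        package, repo = split_atom(entry)
        if package not in known.get(repo, set()):
            return False
    return bool(entries)
```

With a check like this, the nigeria-prince report above is dropped before it reaches any public view, and so is anything claiming a ::gentoo package that doesn't exist. A softer policy (keep the report, drop only the unknown entries) would work too.
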
(We could also designate a certain UUID flag for use by Gentoo infra, possibly even a UUID-per-host, the results of which would be invisible in the public data but still visible to people with approved perms. We really do value the ability to know which packages we have to be careful about causing problems in, and where infra is at with upgrading various things before we remove the versions infra is using. Currently, working out what infra is running requires lots of direct communication.)
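
A minimal sketch of that visibility rule, again purely illustrative: the reserved UUID value, the dict-shaped reports, and the viewer_is_approved flag are all made up here, and how permissions would really be checked is left open.

```python
# Hypothetical sketch: the UUID value and report layout are placeholders.

# UUIDs reserved for Gentoo infra hosts (one per host in this sketch).
INFRA_UUIDS = {"00000000-0000-0000-0000-000000000001"}


def visible_reports(reports, viewer_is_approved, infra_uuids=INFRA_UUIDS):
    """Return the reports a viewer may see.

    Approved viewers see everything; everyone else gets the public data,
    with reports submitted under infra UUIDs filtered out.
    """
    if viewer_is_approved:
        return list(reports)
    return [r for r in reports if r["uuid"] not in infra_uuids]
```

The infra reports stay in the same data set as everyone else's, so the same queries work on them; only the public view hides them.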