On Mon, Oct 21, 2024 at 08:25:23AM +0000, Peter Gutmann wrote:
> Matt Palmer <[email protected]> writes:
> >I have concerns around doing so, as the data set is very large, and
> >constantly updating.
>
> Ah, I didn't realise it was that big, I thought it'd be a small collection
> that could be turned into a bloom filter.

Rather, it's a *large* collection that could be turned into a bloom
filter, instead. :grin:  Last time I ran the numbers, from memory I
think I calculated I'd need a 32MB filter file to get the false-positive
rate down to 0.1%.

> If there's that many of them the
> data would be interesting, any chance of publishing stats, how many
> compromised keys, how many are X.509, how many are SSH, etc?

There's roughly 2M keys in the pwnedkeys dataset at present. Splitting
by type can *kinda* be done, insofar as I keep track of whether the
format of the key I found was PKCS1, PKCS8, OpenSSH, PuTTY, etc, but
that's not definitive, since OpenSSH reads other formats of key, and
they're all just big numbers anyway, at the end of the day.

Publishing live stats is doable, just yet another of those "round tuit"
things.

> >While I'm sure there are *some* things that can't make arbitrary requests,
> >I'm less confident about the "lots" part.
>
> I'm referring to embedded systems, which have no internet access but end up
> seeing keys from who-knows-where.

The trick there is two-fold: having the storage to hold the dataset, and
managing to somehow maintain a reasonably up-to-date dataset to query --
because new keys get added to the dataset all the time.

> That would be another reason to see what's present, although that could also
> be handled in the stats without having to publish actual keys/certs, what are
> the top identifiers used with non-private keys?  That could be applied like a
> top-ten bad passwords filter, if you can get people to stay away from the most
> commonly-used insecure keys it's at least some progress.

It'd be possible to identify keys that are published in a number of
different places (I keep track of where keys were found, so I could
group key metadata by the SPKI and count how many distinct URLs I find).
I'll add it to the round tuit bucket, too.

- Matt
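
For anyone wanting to redo the filter-size arithmetic mentioned above,
here is a minimal sketch using the textbook Bloom filter formulas:
m = -n ln(p) / (ln 2)^2 bits and k = (m / n) ln 2 hash functions. The
function name and the inputs (2M keys, 0.1% false positives) are
illustrative assumptions taken from the figures in this message; this is
not pwnedkeys code. With these inputs the formulas give a few megabytes,
so the 32MB figure recalled from memory presumably assumed different
inputs, such as headroom for growth.

```python
import math


def bloom_parameters(n_items: int, fp_rate: float) -> tuple[int, int]:
    """Return (bits, hash_count) for an optimally sized Bloom filter.

    Hypothetical helper: uses the standard sizing formulas and assumes
    ideal, independent hash functions.
    """
    if n_items <= 0:
        raise ValueError("n_items must be positive")
    if not 0.0 < fp_rate < 1.0:
        raise ValueError("fp_rate must be strictly between 0 and 1")

    ln2 = math.log(2)
    # m = -n * ln(p) / (ln 2)^2, rounded up to a whole number of bits
    bits = math.ceil(-n_items * math.log(fp_rate) / (ln2 ** 2))
    # k = (m / n) * ln 2, rounded to the nearest whole hash function
    hashes = max(1, round(bits / n_items * ln2))
    return bits, hashes


if __name__ == "__main__":
    # Assumed inputs: roughly 2M keys, 0.1% false-positive target.
    bits, hashes = bloom_parameters(2_000_000, 0.001)
    print(f"bits: {bits}")
    print(f"size: {bits / 8 / 1024 / 1024:.2f} MiB")
    print(f"hash functions: {hashes}")
```

Because the filter size scales linearly with the number of keys, an
offline consumer would also need to decide how much growth to provision
for before the filter has to be rebuilt and redistributed.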
