>> I'm a little nervous about encouraging wide use of OCR. You may recall at
>> least one CA was bit by an issue in which their OCR system misidentified
>> letters - https://bugzilla.mozilla.org/show_bug.cgi?id=1311713
>> That's why I was keen to suggest technical solutions which would verify and
>> cross-check. My main concern here would be, admittedly, to ensure the
>> serialNumber itself is reliably entered and detected. Extracting that from a
>> system, such as you could do via an extension when looking at, say, the
>> Handelsregister, is a possible path to reduce both human transcription and
>> machine-aided transcription issues.

Right – and the OCR there is just to make the initial assessment. The idea is to still require validation staff to select the appropriate fields.

I like the idea of cross-checking. Maybe what we can also do is tie into a non-primary source (like D&B or something) to confirm the jurisdiction information. We'll have to evaluate it, but I like the idea of cross-checking against a reliable source that has an API, even if we can't use that source as our primary source for the information. I'll need to investigate, but it should be possible for most of the EU and the US. Less so for the Middle East and Asia.

>> Of course, alternative ways of cross-checking and vetting that data may
>> exist. Alternatively, it may be that the solution would be to only allowlist
>> the use of validation sources that made their datasets machine readable -
>> this would/could address a host of issues in terms of quality. I'm
>> admittedly not sure the extent to which organizations still rely on legacy
>> paper trails, and I understand they're still unfortunately common in some
>> jurisdictions, particularly in the Asia/Pacific region, so it may not be as
>> viable.

Yeah – that means you basically can't issue in the Middle East and most of Asia. Japan would still work. China I'd have to look at. Like I said, there could be non-primary sources that could correlate. We'll spec that out as we get closer and see what we can do for cross-correlation. It may be that we have enough sources world-wide that you can always confirm registration with a secondary source.
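The cross-check described above can be sketched simply: pull the same record from a primary and a secondary source and diff the fields that matter. Everything here is hypothetical — the record layout, field names, and comparison rules are illustrative, not any particular registry's or aggregator's API.

```python
# Sketch of cross-checking validation data against a secondary source.
# The record shape and comparison rules are assumptions for illustration;
# a real integration would map whatever the secondary source's API returns.
from dataclasses import dataclass

@dataclass
class RegistrationRecord:
    serial_number: str   # registration number as recorded by validation staff
    jurisdiction: str    # e.g. "DE" for a Handelsregister entry
    org_name: str

def cross_check(primary: RegistrationRecord,
                secondary: RegistrationRecord) -> list[str]:
    """Return a list of discrepancies between the two records.
    An empty list means the sources agree and the entry corroborates."""
    problems = []
    if primary.serial_number.strip() != secondary.serial_number.strip():
        problems.append("serialNumber mismatch")
    if primary.jurisdiction != secondary.jurisdiction:
        problems.append("jurisdiction mismatch")
    if primary.org_name.casefold() != secondary.org_name.casefold():
        problems.append("organization name mismatch")
    return problems
```

The point of returning a list of discrepancies (rather than a boolean) is that a mismatch is a signal for manual review by validation staff, not necessarily an error in the primary source.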
The process right now is we write a script based on things we can think of that might be wrong (abbreviated states, the word "some" in the state field, etc.). We usually pull a sampling of a couple thousand certs and review those to see if we can find anything wrong that can help identify other patterns. We're in the middle of doing that for the JOI issues.

What would be WAY better is if we had rule sets for validation information (similar to cablint) that checked validation information and how it is stored in our system, and made these rule sets run on the complete data every time we change something in validation. Right now, we build quick and dirty checks that run one time when we have an incident. That's not great, as it's a lot of stuff we can't reuse. What we should do is build something (that, fingers crossed, we can open source and share) that will be a library of checks on validation information. Sure, it'll take a lot of configuration to work with how other CAs store data, but one thing we've seen problems with is that changes in one system lead to unexpected potential non-compliances in others. Having something that works cross-functionally throughout the system helps.

* Hugely, and this is exactly the kind of stuff I'm excited to see CAs discussing and potentially sharing. I think there are some opportunities for incremental improvements here that may be worth looking at, even before that final stage.

* I would argue a source of (some of) these problems is ambiguity that is left to the CA's discretion. For example, is the state abbreviated or not? Is the jurisdictional information clear? Who are the authorized registries for a jurisdiction that a CA can use?

I think that's definitely true. There's lots of ambiguities in the EV Guidelines. You and I were talking about Incorporating Agencies, which is not really defined as incorporating agencies.
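A library of reusable checks like the one described — cablint-style, but over stored validation data rather than certificates — could be as simple as a rule registry that gets re-run over the full data set on every change. The rule names and record layout below are illustrative, not anyone's actual schema:

```python
# Minimal sketch of a reusable "validation lint" rule library.
# Rules register themselves; lint() runs every rule over every record,
# so the whole suite can be re-run whenever the validation system changes.
from typing import Callable

Rule = Callable[[dict], list[str]]   # record -> list of findings
RULES: list[Rule] = []

def rule(fn: Rule) -> Rule:
    """Decorator: register a check in the shared rule set."""
    RULES.append(fn)
    return fn

@rule
def no_placeholder_state(record: dict) -> list[str]:
    # Catches the "Some-State"-style junk values mentioned above.
    bad = {"some", "any", "none", "some-state"}
    if record.get("state", "").strip().lower() in bad:
        return [f"placeholder state value: {record['state']!r}"]
    return []

@rule
def state_not_abbreviated(record: dict) -> list[str]:
    # Enforces the "spell out all states, no abbreviations" policy.
    state = record.get("state", "")
    if record.get("country") == "US" and len(state) == 2:
        return [f"abbreviated state: {state!r} (spell out the full name)"]
    return []

def lint(records: list[dict]) -> list[str]:
    return [finding for r in records for check in RULES for finding in check(r)]
```

Because the rules are data-driven and decoupled from storage, the configuration burden for other CAs reduces to mapping their own schema into the record dict — the check logic itself stays shared.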
Note that CAs can use Incorporating Agencies or Registration Agencies to confirm identity, which is very broad, but there is no indication in the certificate what that means.

> I can think of some incremental steps here:
> - Disclosing exact detailed procedures via CP/CPS

Maybe an addendum to the CPS. Or an RPS. I'll experiment and post something to see what the community thinks.

> - An emphasis should be on allowlisting. Anything not on the allowlist
> *should* be an exceptional thing.

This we actually have internally. Or are you saying across the industry? The allowlist internally is something pre-vetted by compliance and legal. We're currently (prompted by a certificate problem report) reviewing the entire allowlist to see what's there and taking off anything I don't like. Basically we're using your suggestion of https://www.gleif.org/en/about-lei/code-lists/gleif-registration-authorities-list plus a couple of lists for banking (like the FDIC).

> - For example, stating DigiCert will always use a State from ISO 3166-2
> makes it clear, and also makes it something verifiable (i.e. someone can
> implement an automated check)

Maybe what we'll do is keep a running list of the checks. We're finalizing on spelling out all states. No abbreviations. This is something we can specify in our RPS – how it looks for each field.

> - Similarly, enumerating the registries used makes it possible, in many
> cases, to automatically check the serialNumber for both format and accuracy

Checking the registration number for format and accuracy is something I proposed for the new project, but I wasn't sure how feasible it was considering the wide variation. You end up with a lot of different numbers. I wonder if you could get it down to a range of formats? That would certainly be doable while adding some layers of protection.

> - Modifying the CA/B Forum documents to formalize those processes, by
> explicitly removing the ambiguity or CA discretion.
> DigiCert's done well here
> in the past, removing validation methods like 3.2.2.4.1 / 3.2.2.4.5 due to
> their misuse and danger

One ballot I do want to pass is adding a field for the JOI entity information. This way everyone can see where the registration number originated. Short of a formalized CA/B Forum list of permitted entities (which is also on the table), this would make it very easy to have a conversation about what the registration number means. There are probably others, but that's a request that's been surfacing a few times.

> - Writing automated tooling to vet/validate

This is where we are going for sure.

* The nice part is that by formalizing the rules, you can benefit a lot from improved checking that the community may develop, and if it doesn't materialize, contribute your own to the benefit of the community.

A better example is Some-State. We scanned for values not listed as states and for cities that have "some", "any", "none", etc. That only finds a limited set of the problem, and obviously missed the JOI information (not part of the same data set). Going forward, I want a rule set that says: is this a state? If so, check this source to see if it's a real state. Then check to see if it also exists in the country specified. Then check whether the locality specified exists in the state. Then see if there is a red flag from a map that says the org doesn't exist. (The map check is coming – not there yet….) Instead of finding small one-off problems people report, find them on a global scale with a rule we run every time something in the CA/B Forum requirements, Mozilla policy, or our own system changes.

>> Yes, this is the expectation of all CAs.
>> As I understand it, following CAs' remediation of Some-State, etc., this is
>> exactly what members of the community went and did. This is not surprising,
>> since one of the clearly identified best practices from that discussion was
>> to look at ISO 3166-1/ISO 3166-2 for such information inconsistency.
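The layered rule set described above — is it a real state, is it in the stated country, does the locality exist in that state — can be sketched as a short-circuiting pipeline. The lookup tables here are tiny illustrative stand-ins; a real implementation would load full ISO 3166-1/3166-2 data and a geocoding source for the locality and map checks:

```python
# Sketch of cascading geography checks: each stage runs only if the
# previous one passes. Tables below are illustrative subsets, not real data.
ISO_3166_2 = {
    "US": {"Utah", "California"},
    "DE": {"Bayern", "Hessen"},
}
LOCALITIES = {
    ("US", "Utah"): {"Lehi", "Provo"},
}

def check_geography(country: str, state: str, locality: str) -> list[str]:
    if country not in ISO_3166_2:
        return [f"unknown country: {country!r}"]
    if state not in ISO_3166_2[country]:
        # Later checks are meaningless if the state itself is bad.
        return [f"{state!r} is not a real state/subdivision of {country}"]
    findings = []
    known = LOCALITIES.get((country, state))
    if known is not None and locality not in known:
        findings.append(f"{locality!r} not found in {state}, {country}")
    # A map/geocoding red-flag check on the org address would slot in here.
    return findings
```

Run as part of the full rule suite on every policy or system change, this catches "Some-State"-style values and contradictory country/state/locality combinations in one pass, rather than via one-off incident scripts.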
>> SecureTrust, one of the illustrative good reports, did exactly that, and
>> that's why it's such a perfect example. It's unfortunate that a number of
>> other CAs didn't, which is why on the incident reports, I've continued to
>> push them in terms of their evaluation and disclosure.
>> This is the exact goal of Incident Reports: identifying not just the
>> incidents, but the systemic issues, devising solutions that can work, and
>> making sure to holistically remediate the problem.

Right, and we did this for the location data on our Some-State issues (on all the data). But that was a one-time scan, and we reported the results to compliance for review. It was a little script we wrote. What I want the system to do is scan for this particular change every time the validation system changes, to make sure nothing contradicts this, and invalidate all validations that break a rule.

_______________________________________________
dev-security-policy mailing list
dev-security-policy@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-security-policy