Re: Jurisdiction of incorporation validation issue

Ryan Sleevi via dev-security-policy Fri, 23 Aug 2019 11:39:42 -0700

On Fri, Aug 23, 2019 at 2:00 PM Jeremy Rowley <[email protected]>
wrote:


>
>
>    - Could you highlight a bit more your proposal here? My understanding
>    is that, despite the Handelsregister ("Commercial Register") being
>    available at a country level, it's further subdivided into a list of
>    county or region - e.g. the Amtsgericht Herne ("Local Court Herne").
>
>
>
>    - It sounds like you're still preparing to allow for manual/human
>    input, and simply consistency checking. Is there a reason to not use an
>    allowlist-based approach, in which your Registration Agents may only select
>    from an approved list of County/Region/Locality managed by your Compliance
>    Team?
>
>
>
>    - That, of course, still allows for human error. Using the excellent
>    example of the Handelsregister, perhaps you could describe a bit more the
>    flow a Validation Specialist would go through. Are they expected to examine
>    a faxed hardcopy? Or do they go to handelsregister.de and look up via
>    the registration code?
>
>
>
>    - I ask, because it strikes me that this could be an example where a
>    CA could further improve automation. For example, it's not difficult to
>    imagine that a locally-developed extension could know the webpages used for
>    validation of the information, and extract the salient info, when that
>    information is not easily encoded in a URL. For those not familiar,
>    Handelsregister encodes the parameters via form POST, a fairly common
>    approach for these company registers, and thus makes it difficult to store
>    a canonical resource URL for, say, a server-to-server retrieval. This would
>    help you quickly and systematically identify the relevant jurisdiction and
>    court, and in a way that doesn't involve human error.
>
>
>
> I did not know that about Handelsregister. So that’s good info.  Right
> now, the validation staff selects Handelsregister as the source, the system
> retrieves the information, the staff then selects the jurisdiction
> information and enters the registration information. Germany is locked in
> as the country of verification (because Handelsregister is the source), but
> the staff enters the locality/state type information as the system doesn’t
> know which region is correct.
>
>
>
> The idea is that everywhere we can, the process should automatically fill
> in jurisdiction information for the validation staff so no typing is
> required. This is being done in three parts:
>
>    1. Immediate (aka Stop the Hurt): The first step is to put the GeoCode
>    check in place to ensure that no matter what there will be valid
>    non-misspelled information in the certificate. There will still be
>    user-typed information during the phase since this phase is Aug 18 2019.
>    The system will work exactly as it does now except that the JOI information
>    will run through the GeoCode system to verify that yes, this information
>    isn’t wrong. If wrong, the system won’t allow the cert to be approved.  At
>    this point, no new issues should occur, but I won’t be satisfied as it’s way
>    too manual – and the registration number is still a manual entry. That
>    needs to change.
>    2. Intermediate (aka Neuter the Staff): During this process we plan to
>    eliminate typing of sources. Instead, the sources will be picklists based
>    on jurisdiction. This means that if you select Germany and the company type
>    is an LLC, you get a list of available sources. Fool proof-ish. There’s
>    still a copy/paste or manual entry of the registration number. For those
>    sources that do provide an API, we can tie into the API, retrieve the
>    documentation, and populate the information.  We want to do that as well,
>    provided it doesn’t throw off phase 3. Since the intermediate solution is
>    also a stop-gap to the final solution, we want it to be a substantial
>    improvement but one that doesn’t impede our final destination.
>    3. The refactor (aka Documents r Us): This is still very much being
>    specc’ed but we’re currently thinking we want to evolve the system to a
>    document system. Right now the system works on checklists. For JOI, you
>    enter the JOI part, select a document (or two) that you’ll use to verify JOI
>    and then transfer information to the system from the document. The revamp
>    moves it to where you have the document and specify on the document which
>    parts of the document apply to the organization. For example, you specify
>    on the document that a number is a registration number or that a name is an
>    org name, highlighting the info.  With auto-detection of the fields (just
>    based on key words), you end up with a pretty dang automated system. The
>    validation staff is there to review for accuracy and highlight things that
>    might be missed. Hence, no typing or specifying any information. It’s all
>    directly from the source.
>
>
>
> Naming conventions also not approved yet. Since the engineers watch this
> forum, they’ll probably throw things at me when they see the code names.
>
>
>
>    - I'm curious how well that approach generalizes, and/or what
>    challenges may exist. I totally understand that for registries which solely
>    use hard copies, this is a far more difficult task than it needs to be, and
>    thus an element of human review. However, depending on how prevalent the
>    hardcopy vs online copy is, we might be able to pursue automation for more,
>    and thus increase the stringency for the exceptions that do involve
>    physical copies.
>
>
>
> Right now we get the hard copies and turn them into a PDF to store in the
> audit system for review during internal and external audits.  During
> validation, all documentation must be present and reviewed. By making
> better use of OCR, we can always at least copy and paste information
> instead of typing it.
>

I'm a little nervous about encouraging wide use of OCR. You may recall at
least one CA was bitten by an issue in which their OCR system misidentified
letters - https://bugzilla.mozilla.org/show_bug.cgi?id=1311713

That's why I was keen to suggest technical solutions which would verify and
cross-check. My main concern here would be, admittedly, to ensure the
serialNumber itself is reliably entered and detected. Extracting that from
a system, such as you could do via an extension when looking at, say, the
Handelsregister, is a possible path to reduce both human transcription and
machine-aided transcription issues.
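
To make the cross-check idea concrete, here is a minimal sketch in Python. It assumes the value extracted from the registry page and the value entered by the Validation Specialist are both available as strings; the function names and the normalization rule are hypothetical, not a description of any CA's system.

```python
import re


def normalize(value: str) -> str:
    """Collapse runs of whitespace and uppercase, so that pure
    formatting differences are not reported as mismatches."""
    return re.sub(r"\s+", " ", value).strip().upper()


def cross_check(extracted: str, entered: str) -> bool:
    """Return True only when the machine-extracted value and the
    human-entered value agree after normalization."""
    return normalize(extracted) == normalize(entered)
```

A mismatch would block approval rather than be silently corrected, so that neither the human nor the machine transcription is trusted on its own.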

Of course, alternative ways of cross-checking and vetting that data may
exist. Alternatively, it may be that the solution would be to only
allowlist the use of validation sources that made their datasets machine
readable - this would/could address a host of issues in terms of quality.
I'm admittedly not sure the extent to which organizations still rely on
legacy paper trails, and I understand they're still unfortunately common in
some jurisdictions, particularly in the Asia/Pacific region, so it may not
be as viable.


> The process right now is we write a script based on things we can think of
> that might be wrong (abbreviated states, the word “some” in the state
> field, etc). We usually pull a sampling of a couple thousand certs and
> review those to see if we can find anything wrong that can help identify
> other patterns. We’re in the middle of doing that for the JOI issues.  What
> would be WAY better is if we had rule sets for validation information
> (similar to cablint) that checked validation information and how it is
> stored in our system and made these rule sets run on the complete data
> every time we change something in validation. Right now, we build quick and
> dirty checks that run one time when we have an incident. That’s not great
> as it’s a lot of stuff we can’t reuse. What we should do is build something
> (that, fingers crossed, we can open source and share) that will be a
> library of checks on validation information. Sure, it’ll take a lot of
> configuration to work with how other CAs store data, but one thing we’ve
> seen problems with is that changes in one system lead to unexpected
> potential non-compliances in others. Having something that works
> cross-functionally throughout the system helps.
>

Hugely, and this is exactly the kind of stuff I'm excited to see CAs
discussing and potentially sharing. I think there are some opportunities
for incremental improvements here that may be worth looking at, even before
that final stage.

I would argue a source of (some of) these problems is ambiguity that is
left to the CA's discretion. For example, is the state abbreviated or not?
Is the jurisdictional information clear?  Who are the authorized registries
for a jurisdiction that a CA can use?

I can think of some incremental steps here:
- Disclosing exact detailed procedures via CP/CPS
  - An emphasis should be on allowlisting. Anything not on the allowlist
*should* be an exceptional thing.
  - For example, stating DigiCert will always use a State from ISO 3166-2
makes it clear, and also makes it something verifiable (i.e. someone can
implement an automated check)
  - Similarly, enumerating the registries used makes it possible, in many
cases, to automatically check the serialNumber for both format and accuracy
- Modifying the CA/B Forum documents to formalize those processes, by
explicitly removing the ambiguity or CA discretion. DigiCert's done well
here in the past, removing validation methods like 3.2.2.4.1 / 3.2.2.4.5
due to their misuse and danger
- Writing automated tooling to vet/validate

The nice part is that by formalizing the rules, you can benefit a lot from
improved checking that the community may develop, and if it doesn't
materialize, contribute your own to the benefit of the community.
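
As a rough illustration of what such a verifiable rule could look like, here is a minimal Python sketch. The state table is only a small illustrative subset of ISO 3166-2 entries for Germany, and the registry name and serialNumber pattern are simplifying assumptions for the example, not the registry's actual numbering rules.

```python
import re

# Illustrative subset of ISO 3166-2 subdivisions; a real allowlist
# would be complete and managed by the compliance team.
ALLOWED_STATES = {
    "DE": {
        "DE-BE": "Berlin",
        "DE-BY": "Bayern",
        "DE-NW": "Nordrhein-Westfalen",
    },
}

# Hypothetical per-registry serialNumber formats. The pattern below is
# an assumption made for this sketch.
REGISTRY_PATTERNS = {
    "handelsregister": re.compile(r"HR[AB] \d{1,6}"),
}


def check_state(country: str, state_name: str) -> bool:
    """True if the state name is on the allowlist for the country."""
    return state_name in ALLOWED_STATES.get(country, {}).values()


def check_serial(registry: str, serial: str) -> bool:
    """True if the registry is allowlisted and the serial fits its format."""
    pattern = REGISTRY_PATTERNS.get(registry)
    return pattern is not None and pattern.fullmatch(serial) is not None
```

Anything that fails either check is, by construction, the exceptional case that needs a human decision.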


> A better example is some-state. We scanned for values not listed as states
> and cities that have “some”, “any”, “none”, etc. That only finds a limited
> set of the problem, and obviously missed the JOI information (not part of
> the same data set). Going forward, I want a rule set that says, is this a
> state? If so, then check this source to see if it’s a real state. Then
> check this to see if it also exists in the country specified. Then check to
> see if the locality specified exists in the state. Then see if there is a
> red flag from a map that says the org doesn’t exist. (The map check is
> coming – not there yet….) Instead of finding small one off problems people
> report, find them on a global scale with a rule we run every time something
> in the CAB forum, Mozilla policy, or our own system changes.
>

Yes, this is the expectation of all CAs.

As I understand it, following CAs' remediation of Some-State, etc, this is
exactly what members of the community went and did. This is not surprising,
since one of the clearly identified best practices from that discussion was
to look at ISO 3166-1/ISO 3166-2 for such information inconsistency.
SecureTrust, one of the illustrative good reports, did exactly that, and
that's why it's such a perfect example. It's unfortunate that a number of
other CAs didn't, which is why on the incident reports, I've continued to
push them in terms of their evaluation and disclosure.

This is the exact goal of Incident Reports: identifying not just the
incidents, but the systemic issues, devising solutions that can work, and
making sure to holistically remediate the problem.
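
The chain of checks described in the quoted text (is it a real state, does it exist in the country, does the locality exist in the state) could be sketched roughly as follows in Python. The reference tables are tiny illustrative stand-ins for ISO 3166-style data, and the record layout and finding labels are hypothetical.

```python
from dataclasses import dataclass

# Placeholder words taken from the scan described in the quoted text.
PLACEHOLDERS = {"some", "any", "none"}

# Illustrative reference data only; a real check would load full lists.
STATES_BY_COUNTRY = {"DE": {"Nordrhein-Westfalen"}, "US": {"Utah"}}
LOCALITIES_BY_STATE = {"Nordrhein-Westfalen": {"Herne"}, "Utah": {"Lehi"}}


@dataclass
class Record:
    country: str
    state: str
    locality: str


def findings(record: Record) -> list[str]:
    """Run the checks in order and return a label for each problem found."""
    out = []
    words = set(record.state.lower().replace("-", " ").split())
    if words & PLACEHOLDERS:
        out.append("placeholder state")
    known_states = set().union(*STATES_BY_COUNTRY.values())
    if record.state not in known_states:
        out.append("unknown state")
    elif record.state not in STATES_BY_COUNTRY.get(record.country, set()):
        out.append("state not in country")
    elif record.locality not in LOCALITIES_BY_STATE.get(record.state, set()):
        out.append("locality not in state")
    return out
```

Run against the complete data set rather than a sample, a rule like this reports every affected record at once instead of only the ones someone happened to notice.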


>
>    - You describe it as "validation rule" changes - and I'm not sure if
>    you're talking about the BRs (i.e. "we validated this org at time X") or
>    something else. I'm not sure whether you're adding additional data, or
>    formalizing checks on existing data. More details here could definitely
>    help try and generalize it, and might be able to formalize it as a best
>    practice. Alternatively, even if we can't formalize it as a requirement, it
>    may be usable as the basis when evaluating potential impact or cost of
>    changes (to policy or the BRs) in the future. That is, "any CA that has
>    implemented (system you describe) should be able to provide quantifiable
>    data about the impact of (proposed change X). If CAs cannot do so (because
>    they did not implement the change), their feedback and concerns will not be
>    considered."
>
>
>
> Validation rule meaning our own system, the CAB forum, mozilla policy.
> Basically, anything that could call into question the integrity of some
> data piece within our system. The point is to catch all changes that may
> happen proactively, not just when someone pings me with a problem.  The
> requirement I think we’re trying to meet is “never have the same problem
> again, even if a rule changes” because the system will take that one
> problem, log it as a unit test, and run that unit test every time we change
> the internal rule set to detect all data that violates that rule as
> modified.  Illustrative example: Assume we decide we want all states
> abbreviated.  Note this would contradict the rule in the EV guidelines that
> requires JOI states to be written out. Right now, this contradiction could
> pass undetected by a lot of CA systems I think. However, if you have a rule
> set that can be enforced globally across the entire data set, you end up
> instantly detecting that no valid EV cert could ever issue. Danger! Anyway,
> the value of this is pretty huge internally IMO. And for compliance, it’ll
> make our job easier. No more 3% audits trying to catch mistakes.
>

Yes, this is the goal, and I'm glad to hear some CAs are recognizing this.
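
For what it's worth, the abbreviated-states example from the quoted text can be sketched in a few lines of Python. The record layout, the rule names, and the length-based heuristics are all assumptions made for illustration; real rules would consult a reference list rather than string length.

```python
def joi_state_written_out(record: dict) -> bool:
    """Existing rule from the example: the JOI state is spelled out.
    Crude heuristic for this sketch: longer than two characters."""
    return len(record["joi_state"]) > 2


def joi_state_abbreviated(record: dict) -> bool:
    """Hypothetical new internal rule: the state is a two-letter code."""
    return len(record["joi_state"]) == 2


def violations(records: list, rules: list) -> list:
    """Return (record index, rule name) for every rule a record fails."""
    return [
        (i, rule.__name__)
        for i, rec in enumerate(records)
        for rule in rules
        if not rule(rec)
    ]


def passing(records: list, rules: list) -> list:
    """Return the records that satisfy every rule in the set."""
    return [rec for rec in records if all(rule(rec) for rule in rules)]
```

Re-running `violations` over the full data set after each rule change is the regression test; an empty result from `passing` once both rules are active is the signal that the combined rule set can never be satisfied.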
_______________________________________________
dev-security-policy mailing list
[email protected]
https://lists.mozilla.org/listinfo/dev-security-policy
