I'm glad to know the answers to these questions! :-)

Jacob S Hoffman-Andrews writes:

> I've been tweaking trivial-validate.py and noticed that we also do
> validation using RELAX NG, an XML schema language that lets you specify
> the format of other XML docs. Is there a distinction between which
> validations we express in RELAX NG vs which ones we express in
> trivial-validate.py? For instance, many of the validations in
> trivial-validate.py can be expressed using RELAX NG's pattern
> facility. It would be a little bit nice to consolidate on one or the
> other to speed up the makexpi process.

trivial-validate.py long predates the use of RELAX NG.  The RELAX NG
idea was a proposal by a list member who did a proof of concept which
I think I then fleshed out a bit.  It's worked well for us.

The schema validation is more elegant and efficient, but the use of the
Python script allows us to express arbitrary constraints and arbitrary
relationships between and within attribute values (e.g. "if value A
satisfies this predicate, then value B elsewhere in the expression must
satisfy this predicate, otherwise we don't care").  I don't think that
XML schema validation is this powerful.  I'm not sure that it is, or is
meant to be, Turing-complete, and I don't believe it allows us to delve
very deeply into the structure of attributes' textual values.
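As a rough illustration of the kind of conditional constraint I mean, here is a sketch in the spirit of trivial-validate.py. The rule it enforces is hypothetical, and so are the names `check_rule` and `validate`: if a rule's `to` uses backreferences like `$1` (predicate A), then its `from` must compile as a regex with at least that many capture groups (predicate B); otherwise we don't check it. Answering B means actually compiling the attribute's text, which is the sort of digging into values I doubt a schema can do. Python's `re` dialect also isn't identical to the JavaScript regexes the extension runs, so treat this as an approximation.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical ruleset, used only to exercise the check below.
SAMPLE = r"""<ruleset name="Example">
  <target host="example.com" />
  <target host="www.example.com" />
  <rule from="^http://(www\.)?example\.com/" to="https://$1example.com/" />
  <rule from="^http://cdn\.example\.com/" to="https://$1cdn.example.com/" />
</ruleset>"""

# Matches backreferences such as $1 or $12 in a "to" attribute.
BACKREF = re.compile(r"\$(\d+)")


def check_rule(rule):
    """Apply one hypothetical conditional constraint to a <rule> element.

    If "to" contains backreferences (predicate A), then "from" must be a
    regex that compiles and has enough capture groups (predicate B).
    Rules without backreferences are not checked at all.
    """
    src = rule.get("from", "")
    dst = rule.get("to", "")
    refs = [int(n) for n in BACKREF.findall(dst)]
    if not refs:
        return []  # predicate A is false, so we don't care
    try:
        groups = re.compile(src).groups
    except re.error as exc:
        return [f"from={src!r} is not a valid regex: {exc}"]
    if max(refs) > groups:
        return [f"to={dst!r} uses ${max(refs)} but from={src!r} "
                f"has only {groups} group(s)"]
    return []


def validate(xml_text):
    """Run check_rule over every <rule> and collect the complaints."""
    root = ET.fromstring(xml_text)
    problems = []
    for rule in root.iter("rule"):
        problems.extend(check_rule(rule))
    return problems


if __name__ == "__main__":
    for problem in validate(SAMPLE):
        print(problem)
```

Here the second rule gets flagged, because its `to` refers to `$1` while its `from` has no capture group.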

There is definitely some overlap between what the two validation
mechanisms do.  If you can identify particular areas of redundancy, I
would suggest that we prefer the schema validator, because it's easier
to understand what it does, it probably runs faster, and it's perhaps
easier to be confident that it's correct.  I think we should keep both, but
try to move as much functionality as possible into the schema.

> Also, out of curiosity, why does the set of valid parameters for a
> target host include ä and ö, but not the full range of URL-valid
> Unicode characters?

We weren't sure that we wanted to allow Unicode characters at all,
particularly because of the risk of homoglyph attacks (against us as
ruleset maintainers by contributors, not only against end users).
But one person submitted an apparently correct and useful rule that
uses these two characters, so we added them: the rule is helpful, and
these visually distinctive characters don't seem to pose any risk of
homoglyph attacks.
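To make the whitelist approach concrete, here is a minimal sketch of a host-character check. The exact character set is an assumption, not copied from the real schema or trivial-validate.py: lowercase ASCII letters, digits, dots, hyphens, a `*` for wildcard targets, plus ä and ö. The name `host_is_allowed` is hypothetical.

```python
import re

# Hypothetical whitelist for <target host="..."> values: lowercase ASCII
# letters, digits, ".", "-", the "*" wildcard, and the two non-ASCII
# letters that were explicitly approved. Everything else is rejected,
# including Cyrillic or Greek letters that merely look like Latin ones.
ALLOWED_HOST = re.compile(r"[a-z0-9.*\-äö]+")


def host_is_allowed(host):
    """Return True only if every character of host is whitelisted."""
    return ALLOWED_HOST.fullmatch(host) is not None


if __name__ == "__main__":
    for host in ["www.example.com", "*.example.com", "bär.example",
                 "ex\u0430mple.com"]:  # the last uses Cyrillic U+0430
        print(host, host_is_allowed(host))
```

Because the class lists the precomposed characters, a decomposed ä (plain "a" followed by the combining diaeresis U+0308) is rejected too, which is arguably what you want when the goal is to keep the set of accepted code points small and reviewable.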

-- 
Seth Schoen  <[email protected]>
Senior Staff Technologist                       https://www.eff.org/
Electronic Frontier Foundation                  https://www.eff.org/join
815 Eddy Street, San Francisco, CA  94109       +1 415 436 9333 x107
_______________________________________________
HTTPS-Everywhere mailing list
[email protected]
https://lists.eff.org/mailman/listinfo/https-everywhere
