I'm glad to know the answers to these questions! :-)

Jacob S Hoffman-Andrews writes:

> I've been tweaking trivial-validate.py and noticed that we also do
> validation using RELAX NG, an XML schema that allows you to specify
> the format of other XML docs. Is there a distinction between which
> validations we express in RELAX NG vs which ones we express in
> trivial-validate.py? For instance, many of the validations in
> trivial-validate.py can be expressed using RELAX NG's pattern
> facility. It would be a little bit nice to consolidate on one or the
> other to speed up the makexpi process.

trivial-validate.py long predates the use of RELAX NG. The RELAX NG
idea was a proposal by a list member who did a proof of concept, which
I think I then fleshed out a bit. It's worked well for us.

The schema validation is more elegant and efficient, but the Python
script lets us express arbitrary constraints and arbitrary
relationships between and within attribute values (e.g. "if value A
satisfies this predicate, then value B elsewhere in the expression must
satisfy that predicate; otherwise we don't care"). I don't think that
XML schema validation is this powerful. I'm not sure that it is, or is
meant to be, Turing-complete, and I don't believe it allows us to delve
very deeply into the structure of attributes' textual values.

There is definitely some overlap between what the two validation
mechanisms do. If you can identify particular areas of redundancy, I
would suggest that we prefer the schema validator, because it's easier
to understand what it does, it probably runs faster, and it is perhaps
easier to be confident that it's correct. I think we should keep both,
but try to move as much functionality as possible into the schema.

> Also, out of curiosity, why does the set of valid parameters for a
> target host include ä and ö, but not the full range of URL-valid
> Unicode characters?

We weren't sure that we wanted to allow Unicode characters at all,
particularly because of the risk of homoglyph attacks (against us as
ruleset maintainers by contributors, not only against end users). But
one person submitted an apparently correct and useful rule that uses
these two characters, so we added them because the rule that uses them
is helpful and there seems to be no risk of homoglyph attacks from
these visually distinctive characters.

--
Seth Schoen  <[email protected]>
Senior Staff Technologist                       https://www.eff.org/
Electronic Frontier Foundation                  https://www.eff.org/join
815 Eddy Street, San Francisco, CA  94109       +1 415 436 9333 x107
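
For anyone who wants to see the target-host allowlist idea concretely, here is a minimal sketch in Python. The character class and the function name `is_allowed_host` are assumptions made up for illustration; they are not copied from the actual schema or from trivial-validate.py, whose real pattern may differ. The point is only that a narrow allowlist admits ä and ö while rejecting look-alike characters such as Cyrillic "а" (U+0430).

```python
import re

# Hypothetical allowlist: ASCII letters, digits, dot, hyphen, the "*"
# wildcard, plus the two explicitly admitted characters ä and ö.
# This is an illustrative guess, not the project's real pattern.
ALLOWED_HOST = re.compile(r"[a-z0-9.*äö-]+")

def is_allowed_host(host):
    """Return True if every character of host is in the allowlist."""
    return ALLOWED_HOST.fullmatch(host) is not None

if __name__ == "__main__":
    for host in ["möbel.example", "*.example.com", "ex\u0430mple.com"]:
        # The last host contains Cyrillic U+0430, a homoglyph of Latin "a".
        print(repr(host), is_allowed_host(host))
```

A check this simple is also the kind of thing a schema pattern can express, which fits the preference for keeping such rules in the schema where possible.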

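
To illustrate the "if value A satisfies this predicate, then value B must satisfy that predicate" kind of constraint mentioned earlier, here is a rough Python sketch. The specific constraint (a rule whose `to` attribute uses `$N` must have at least N capture groups in its `from` attribute) and the name `check_backreferences` are hypothetical examples, not a description of what trivial-validate.py actually checks. It also assumes the `from` patterns compile under Python's `re` module, which will not hold for every JavaScript regex.

```python
import re
import xml.etree.ElementTree as ET

def check_backreferences(ruleset_xml):
    """Return a list of problem descriptions for one ruleset document."""
    problems = []
    root = ET.fromstring(ruleset_xml)
    for rule in root.iter("rule"):
        pattern = rule.get("from", "")
        target = rule.get("to", "")
        # Predicate on value A: does 'to' use any $N backreference?
        refs = [int(n) for n in re.findall(r"\$(\d)", target)]
        if not refs:
            continue  # predicate not satisfied, so we don't care
        try:
            groups = re.compile(pattern).groups
        except re.error as exc:
            problems.append(f"rule {pattern!r}: cannot compile 'from': {exc}")
            continue
        # Predicate on value B: 'from' must define enough capture groups.
        if max(refs) > groups:
            problems.append(
                f"rule {pattern!r}: 'to' references ${max(refs)} "
                f"but 'from' has {groups} group(s)"
            )
    return problems

# Made-up sample ruleset: the second rule uses $1 without a capture group.
SAMPLE = r'''<ruleset name="Example">
  <target host="example.com"/>
  <rule from="^http://(www\.)?example\.com/" to="https://$1example.com/"/>
  <rule from="^http://example\.org/" to="https://$1example.org/"/>
</ruleset>'''

if __name__ == "__main__":
    for problem in check_backreferences(SAMPLE):
        print(problem)
```

The check has to compile one attribute's value and compare the result against another attribute's value, which is the sort of relationship that seems hard to express in a schema and easy to express in a script.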