On 03/21/2014 05:03 AM, Tim Ruehsen wrote:
> Maybe you could just open issues (or even better, fork the repo, make your
> changes and create pull requests).
I've just pushed some cleanup suggestions here:

  https://github.com/rockdaboot/libpsl/pull/1

I see you've pulled them already, thanks!

I've got three more conceptual issues which warrant discussion rather than a patch, though. If there's a better place to have this discussion than this mailing list, I'm happy to move to it; please let me know where.

psl_is_tld() semantics
----------------------

The way I see it, we know what it means for psl_is_tld() to return "true", but "false" could mean either:

 (A) "this zone is subordinate to a TLD" (as example.com is to com), or

 (B) "this zone is superior to a TLD" (as uk is to co.uk).

Note that "uk" is not a public suffix. libpsl in its current state appears to assume that psl_is_tld("uk") returns "true", even though "uk" is not a TLD, is not a public suffix, and does not meet Ángel's "one domain under which anyone* can register a subdomain" definition.

Perhaps if we invert the sense of the current test, it will match more cleanly. What about:

  psl_is_private(char* d)

so:

  psl_is_private("uk")                → false
  psl_is_private("example.com")       → true
  psl_is_private("www.example.com")   → true
  psl_is_private("a.b.c.example.com") → true
  psl_is_private(".")                 → false
  psl_is_private("com")               → false
  psl_is_private("co.ar")             → false

The other API that might be relevant would be something like psl_get_private_zone(char* d), which would return the shortest private zone that contains d, so:

  psl_get_private_zone("www.example.com")     → "example.com"
  psl_get_private_zone("example.co.uk")       → "example.co.uk"
  psl_get_private_zone("a.b.c.d.example.net") → "example.net"
  psl_get_private_zone("com")                 → ERROR
  psl_get_private_zone("uk")                  → ERROR

(This is the API supplied by regdom-libs, I think.)

I chose the term "private" in contrast with the "public" from "public suffix list"; if folks have a better word to use, I'm happy to swap something else in.
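To make the proposed semantics concrete, here is a minimal C sketch of psl_is_private() and psl_get_private_zone() as described. It is not libpsl code: the suffix table is a hypothetical toy stand-in for effective_tld_names.dat (exact-match rules only, no wildcard or exception rules, and deliberately no "uk" entry), the arguments are taken as const char* rather than char*, and input is assumed to be lowercase ASCII with no trailing dot.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Toy stand-in for effective_tld_names.dat (hypothetical): exact-match rules
 * only, no wildcard ("*.") or exception ("!") rules, and no "uk" entry. */
static const char *const SUFFIXES[] = { "com", "net", "ar", "co.ar", "co.uk" };

static bool is_public_suffix(const char *d)
{
    for (size_t i = 0; i < sizeof SUFFIXES / sizeof SUFFIXES[0]; i++)
        if (strcmp(d, SUFFIXES[i]) == 0) return true;
    return false;
}

/* Sketch of the proposed psl_get_private_zone(): the shortest zone containing d
 * that is not itself a public suffix, i.e. the longest public suffix of d plus
 * one more label. Returns a pointer into d, or NULL ("ERROR") if d is itself a
 * public suffix or lies under none. */
static const char *psl_get_private_zone(const char *d)
{
    assert(d != NULL);
    if (is_public_suffix(d)) return NULL;
    const char *label = d;                       /* start of the current label */
    for (const char *dot = strchr(d, '.'); dot != NULL; dot = strchr(dot + 1, '.')) {
        /* Suffixes are tried longest first, so the first hit is the longest. */
        if (is_public_suffix(dot + 1)) return label;
        label = dot + 1;
    }
    return NULL;
}

/* Sketch of the proposed psl_is_private(): true iff d lies in some private zone. */
static bool psl_is_private(const char *d)
{
    return psl_get_private_zone(d) != NULL;
}
```

With this toy table, the calls listed in the message give the listed results. Defining psl_is_private() in terms of psl_get_private_zone() is one way to keep the two answers consistent by construction.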
regdom-libs uses the term "registered", which I take to mean "placed in the public registry". That is intelligible to me, but maybe only because I've thought about this problem way more than anyone should have to; I don't know how much sense it would make to users of the library.

IDNA
----

I hate to bring this up, because it's a nightmare and I have no good answers, but what does this library expect to do about non-ASCII domain names? effective_tld_names.dat contains internationalized suffixes in their Unicode form, encoded as UTF-8, e.g.:

  // xn--mgba3a4f16a.ir (<iran>.ir, Persian YEH)
  ایران.ir

Should we assume that the input from the user is in a similar form? Do we care about locale issues? What about Unicode canonicalization? What if the incoming data is already in Punycode (the xn--* ASCII form)?

The GNU folks have done the ugly, ugly work for us if we're willing to link to LGPL'ed libraries:

  https://www.gnu.org/software/libidn/

Malformed inputs
----------------

What should the library do with malformed inputs? I'm thinking about super-long strings, strings starting with more than one dot or containing multiple adjacent dots, strings that don't match whatever encoding we're expecting users to send, etc.

--dkg
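For the IDNA question, a rough C sketch of the conversion step only, assuming labels arrive as UTF-8: a pure-ASCII label is just lowercased, and any other label becomes "xn--" plus its RFC 3492 Punycode encoding. label_to_ascii() and the other helpers are hypothetical names, not libidn's API. The sketch skips nameprep/NFC normalization and case folding of non-ASCII characters, which is exactly the part libidn would handle; it does reject malformed UTF-8 and labels whose ASCII form would exceed 63 characters.

```c
#include <assert.h>
#include <ctype.h>
#include <stdint.h>
#include <string.h>

/* Punycode parameters from RFC 3492; MAX_LABEL is the DNS label limit. */
enum { BASE = 36, TMIN = 1, TMAX = 26, SKEW = 38, DAMP = 700,
       INITIAL_BIAS = 72, INITIAL_N = 128, MAX_LABEL = 63 };

/* Append one char to out, always keeping room for the terminating NUL. */
#define PUT(ch) do { if (o + 1 >= outsz) return -1; out[o++] = (char)(ch); } while (0)

/* Decode n bytes of UTF-8 into at most max code points. Returns the count, or -1
 * on malformed UTF-8 (bad byte, overlong form, surrogate, > U+10FFFF). */
static int utf8_decode(const unsigned char *s, size_t n, uint32_t *cp, int max)
{
    int count = 0, len;
    for (size_t i = 0; i < n; i += (size_t)len) {
        unsigned char c = s[i];
        len = c < 0x80 ? 1 : (c >= 0xC2 && c <= 0xDF) ? 2
            : (c >= 0xE0 && c <= 0xEF) ? 3 : (c >= 0xF0 && c <= 0xF4) ? 4 : 0;
        if (len == 0 || i + (size_t)len > n || count >= max) return -1;
        uint32_t v = len == 1 ? c : (uint32_t)(c & (0x7F >> len));
        for (int k = 1; k < len; k++) {
            if ((s[i + k] & 0xC0) != 0x80) return -1;   /* not a continuation byte */
            v = (v << 6) | (s[i + k] & 0x3F);
        }
        if ((len == 3 && v < 0x800) || (len == 4 && (v < 0x10000 || v > 0x10FFFF)) ||
            (v >= 0xD800 && v <= 0xDFFF)) return -1;
        cp[count++] = v;
    }
    return count;
}

/* RFC 3492 helpers: bias adaptation (section 6.1) and digit value -> character. */
static uint32_t adapt(uint32_t delta, uint32_t numpoints, int first)
{
    uint32_t k = 0;
    delta = first ? delta / DAMP : delta / 2;
    delta += delta / numpoints;
    for (; delta > ((BASE - TMIN) * TMAX) / 2; k += BASE) delta /= BASE - TMIN;
    return k + (BASE - TMIN + 1) * delta / (delta + SKEW);
}
static char digit(uint32_t d) { return (char)(d < 26 ? 'a' + d : '0' + d - 26); }

/* Punycode-encode cp[0..n) into out, without the "xn--" prefix; returns 0 or -1. */
static int punycode_encode(const uint32_t *cp, int n, char *out, size_t outsz)
{
    size_t o = 0;
    uint32_t h, b = 0, cur = INITIAL_N, delta = 0, bias = INITIAL_BIAS;
    for (int j = 0; j < n; j++)                  /* basic (ASCII) code points first */
        if (cp[j] < 0x80) { PUT(cp[j]); b++; }
    if (b > 0) PUT('-');
    for (h = b; h < (uint32_t)n; delta++, cur++) {
        uint32_t m = UINT32_MAX;                 /* next code point to insert */
        for (int j = 0; j < n; j++) if (cp[j] >= cur && cp[j] < m) m = cp[j];
        delta += (m - cur) * (h + 1);
        cur = m;
        for (int j = 0; j < n; j++) {
            if (cp[j] < cur) delta++;
            if (cp[j] != cur) continue;
            uint32_t q = delta;                  /* delta as a variable-length integer */
            for (uint32_t k = BASE;; k += BASE) {
                uint32_t t = k <= bias ? TMIN : k >= bias + TMAX ? TMAX : k - bias;
                if (q < t) break;
                PUT(digit(t + (q - t) % (BASE - t)));
                q = (q - t) / (BASE - t);
            }
            PUT(digit(q));
            bias = adapt(delta, h + 1, h == b);
            delta = 0;
            h++;
        }
    }
    out[o] = '\0';
    return 0;
}

/* Hypothetical helper: one UTF-8 label -> its ASCII form. Pure-ASCII labels are
 * lowercased; others become "xn--" + Punycode. Returns 0, or -1 for an empty,
 * malformed or over-long label. No NFC or non-ASCII case folding is done. */
static int label_to_ascii(const char *label, char *out, size_t outsz)
{
    assert(label != NULL && out != NULL);
    uint32_t cp[MAX_LABEL];
    size_t o = 0, n = strlen(label);
    int ascii = 1, count = n ? utf8_decode((const unsigned char *)label, n, cp, MAX_LABEL) : -1;
    if (count < 0) return -1;
    for (int j = 0; j < count; j++) ascii &= cp[j] < 0x80;
    if (ascii) {
        for (int j = 0; j < count; j++) PUT(tolower((int)cp[j]));
        out[o] = '\0';
        return 0;
    }
    PUT('x'); PUT('n'); PUT('-'); PUT('-');
    if (punycode_encode(cp, count, out + o, outsz - o) != 0) return -1;
    return strlen(out) > MAX_LABEL ? -1 : 0;
}
```

Fed the UTF-8 bytes of ایران, label_to_ascii() produces "xn--mgba3a4f16a", matching the comment in effective_tld_names.dat, and "bücher" gives "xn--bcher-kva". A label that already arrives in xn-- form is plain ASCII and only gets lowercased, so converting everything to the ASCII form before lookup would make both spellings of a name compare equal.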
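For the malformed-input question, one possible policy, sketched as a hypothetical pre-check (domain_is_well_formed() is not an existing libpsl function). The choices here are assumptions: a single leading dot is tolerated (as in cookie Domain attributes), a trailing dot is rejected, and the RFC 1035 limits of 63 per label and 253 for the whole name are applied to the bytes as given, which for non-ASCII labels only approximates the limit that applies to their xn-- form. Checking that non-ASCII bytes are valid UTF-8 is left to a separate step.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Limits from RFC 1035: 63 octets per label, 253 characters for the textual name. */
enum { MAX_LABEL = 63, MAX_NAME = 253 };

/* Hypothetical pre-check: at most one leading dot, then 1..63-byte labels
 * separated by single dots, at most 253 bytes in total, and no spaces or
 * control characters. Bytes >= 0x80 are passed through unchecked. */
static bool domain_is_well_formed(const char *d)
{
    assert(d != NULL);
    if (*d == '.') d++;                          /* tolerate one leading dot */
    size_t total = strlen(d), label = 0;
    if (total == 0 || total > MAX_NAME) return false;
    for (const unsigned char *p = (const unsigned char *)d; *p; p++) {
        if (*p <= 0x20 || *p == 0x7F) return false;   /* space or control char */
        if (*p == '.') {
            if (label == 0) return false;        /* "..", or a second leading dot */
            label = 0;
        } else if (++label > MAX_LABEL) {
            return false;                        /* label too long */
        }
    }
    return label > 0;                            /* rejects a trailing dot */
}
```

Rejecting early like this keeps the lookup code simple; the alternative of silently normalizing (collapsing repeated dots, stripping a trailing dot) is friendlier but makes it harder for callers to notice bad data.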
