On Tue, Jan 15, 2013 at 07:30:07PM +0900, Alex Shinn wrote:
> Right, I'm familiar with the evil standards :) I'm also hoping that we can
> have some basic compatibility between Chicken's uri module and Chibi's
> (and whatever R7RS WG2 comes up with).
That would be nice indeed.

> It seems to me the sane thing to do is represent URIs unencoded
> internally, which can be generated directly with make-uri or decoded
> on parsing.

That cannot be done in general.  If you decode something like %2F, that
will wreak havoc with path-structured URIs.  The same will happen with
other types of "special" characters; you need to be able to distinguish
between the "special" character as-is and its encoded form.  These
special characters are called "reserved" in the BNF.  As you can see, the
question mark, equals sign and ampersand are in there.  For urlencoded
query strings, these *cannot* be decoded, because then you can't
distinguish between

  http://calc.example.com?bool-expr=x%26y%3D1
and
  http://calc.example.com?bool-expr=x&y=1

The former should be decoded in uri-common to the alist
((bool-expr . "x&y=1")) and the latter to ((bool-expr . "x") (y . "1")).

By fully decoding all reserved characters in uri-generic, you drop
important information.  uri-generic already fully decodes all unreserved
characters; the remaining decoding of reserved characters inside the
individual query components (like the ampersand above) is left to
uri-common, once the query string has been form-decoded into components.

> The decoding might be schema-specific, although
> really the only difference is the space-to-+ and query args encoding.

No, the conversion to a friendly alist is specific to uri-common.

> I was confused because the uri-generic change Ivan suggests
> seems to be putting encoded characters directly in the representation,
> whereas uri-common is encoding only on output.

I don't understand this either.  I'm at work, so maybe it's just due to
a lack of complete attention.

> [It also looks like the uri-common encoding is broken - why were bytes
> getting lost?]

Probably because it doesn't correctly deal with UTF-8 in the decoding
of URLencoded form data.  I'll need a proper test case and some time to
look into it.
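To make the point about reserved characters concrete, here is a minimal
sketch using Python's standard urllib.parse as a stand-in.  It is not the
uri-generic or uri-common API, and the helper names are made up for
illustration.  It shows that splitting on the reserved characters before
decoding keeps the two bool-expr URLs apart, while decoding first makes
them indistinguishable; the same holds for %2F in a path (the path used
here is a hypothetical example).

```python
from urllib.parse import parse_qsl, unquote, urlsplit


def query_alist(url):
    """Split the raw query on '&' and '=' first, then decode each piece."""
    return parse_qsl(urlsplit(url).query)


def query_alist_decoded_too_early(url):
    """Decode the whole query string first, then split: the lossy order."""
    return parse_qsl(unquote(urlsplit(url).query))


encoded = "http://calc.example.com?bool-expr=x%26y%3D1"
literal = "http://calc.example.com?bool-expr=x&y=1"

# Splitting before decoding preserves the distinction.
print(query_alist(encoded))  # [('bool-expr', 'x&y=1')]
print(query_alist(literal))  # [('bool-expr', 'x'), ('y', '1')]

# Decoding before splitting collapses both URLs to the same result.
print(query_alist_decoded_too_early(encoded))  # [('bool-expr', 'x'), ('y', '1')]

# The same problem with %2F in a path (hypothetical path).
raw_path = "/files/a%2Fb/c"
segments_ok = [unquote(s) for s in raw_path.split("/")]
segments_lossy = unquote(raw_path).split("/")
print(segments_ok)     # ['', 'files', 'a/b', 'c']
print(segments_lossy)  # ['', 'files', 'a', 'b', 'c']
```

The only design choice that matters is the order of operations: split on
the delimiters while they are still distinguishable from their
percent-encoded forms, and decode each component afterwards.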
> Finally, regarding parsing I still don't understand why %AB is decoded
> into the corresponding octet but %AB%CD is not?

Unsure.

Cheers,
Peter
-- 
http://sjamaan.ath.cx

_______________________________________________
Chicken-users mailing list
[email protected]
https://lists.nongnu.org/mailman/listinfo/chicken-users
