On Tue, Jan 15, 2013 at 2:23 PM, Ivan Raikov <[email protected]> wrote:
> Hi again,
>
> I have now extended the utf8 code in uri-generic, so that UTF-8
> sequences are percent-encoded as lists of the form '(% h1 h2 [% h3 h4
> ...]). The percent-decoding routine is not going to decode sequences of
> more than one byte, so that now percent encoding normalization will not
> interfere with encoded UTF-8 sequences. I have also renamed the iri->uri
> routine to utf8-string->uri. I think now its behavior is compliant with
> both RFC 3986 and 3987:
>
> (utf8-string->uri "http://example.com/삼계탕") =>
>
> #(URI scheme=http authority=#(URIAuth host="example.com" port=#f) path=(/
> "%EC%82%BC%EA%B3%84%ED%83%95") query=#f fragment=#f)
>

This result looks broken. As I noted in my previous mail, the URI
representation already handles non-ASCII characters and escapes them on
output:

$ csi -R uri-common
#;1> (make-uri scheme: "http" host: "127.0.0.1" path: '(/ "삼계탕"))
#<URI-common: scheme="http" port=#f host="127.0.0.1" path=(/ "삼계탕") query=#f fragment=#f>
#;2> (uri->string (make-uri scheme: "http" host: "127.0.0.1" path: '(/ "삼계탕")))
"http://127.0.0.1/82%BCB3%8483%95"

If you put percent escapes _inside_ the internal path representation,
you'll get double escaping, because the escapes are applied again on
output.

Parsing is a separate matter. utf8-string->uri should return the URI
object without error, but with unescaped values in the path and query,
the same as those produced by the make-uri call above.

Unrelated to the above, the escaped output itself looks buggy: some
bytes, such as the leading "%EC%", appear to be getting dropped.

-- 
Alex
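To illustrate the double-escaping concern, here is a minimal sketch in Python rather than Scheme. It does not use uri-generic or uri-common, and it assumes only that output escaping percent-encodes every UTF-8 byte of a path segment, including a literal "%". The variable names are hypothetical.

```python
from urllib.parse import quote, unquote

# Path segment as it would be stored internally: unescaped.
segment = "삼계탕"

# Escaping once, on output: each UTF-8 byte becomes one %XX triplet.
escaped_once = quote(segment, safe="")
print(escaped_once)   # %EC%82%BC%EA%B3%84%ED%83%95

# If the stored segment already contains escapes, escaping on output
# runs a second time and every "%" is itself encoded as "%25".
escaped_twice = quote(escaped_once, safe="")
print(escaped_twice)  # %25EC%2582%25BC%25EA%25B3%2584%25ED%2583%2595

# Decoding the singly escaped form restores the original segment.
print(unquote(escaped_once) == segment)  # True
```

The singly escaped form has nine triplets, three per character, which matches the path in the quoted utf8-string->uri result.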
_______________________________________________
Chicken-users mailing list
[email protected]
https://lists.nongnu.org/mailman/listinfo/chicken-users
