Re: [Chicken-users] [Q] uri-common has problem with UTF-8 uri.

Alex Shinn Mon, 14 Jan 2013 21:44:20 -0800

On Tue, Jan 15, 2013 at 2:23 PM, Ivan Raikov <[email protected]> wrote:


> Hi again,
>
>    I have now extended the utf8 code in uri-generic, so that UTF-8
> sequences are percent-encoded as lists of the form '(% h1 h2 [% h3 h4
> ...]). The percent-decoding routine is not going to decode sequences of
> more than one byte, so that now percent encoding normalization will not
> interfere with encoded UTF-8 sequences. I have also renamed the iri->uri
> routine to utf8-string->uri. I think now its behavior is compliant with
> both RFC 3986 and 3987:
>
> (utf8-string->uri "http://example.com/삼계탕") =>
>
> #(URI scheme=http authority=#(URIAuth host="example.com" port=#f) path=(/
> "%EC%82%BC%EA%B3%84%ED%83%95") query=#f fragment=#f)
>

This result looks broken.  As I noted in my previous mail, the URI
representation already handles non-ASCII characters and escapes them
on output:

$ csi -R uri-common
#;1> (make-uri scheme: "http" host: "127.0.0.1" path: '(/ "삼계탕"))
#<URI-common: scheme="http" port=#f host="127.0.0.1" path=(/ "삼계탕")
query=#f fragment=#f>
#;2> (uri->string (make-uri scheme: "http" host: "127.0.0.1" path: '(/ "삼계탕")))
"http://127.0.0.1/82%BCB3%8483%95"

If you put percent escapes _inside_ the internal path representation,
you'll get double escaping.
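
To make the double-escaping problem concrete, here is a minimal sketch
in plain Scheme. It is not the uri-common or uri-generic code; the
names percent-encode-bytes, ascii-bytes and so on are made up for
illustration, and the UTF-8 bytes of the path segment are written out
by hand so the example does not depend on how strings are represented.

```scheme
;; Hypothetical helper names; NOT the uri-common / uri-generic API.
(define hex-digits "0123456789ABCDEF")

;; One byte value (0-255) -> "%XX"
(define (percent-encode-byte b)
  (string #\%
          (string-ref hex-digits (quotient b 16))
          (string-ref hex-digits (remainder b 16))))

;; RFC 3986 unreserved set: ALPHA / DIGIT / "-" / "." / "_" / "~"
(define (unreserved? b)
  (or (and (>= b 65) (<= b 90))     ; A-Z
      (and (>= b 97) (<= b 122))    ; a-z
      (and (>= b 48) (<= b 57))     ; 0-9
      (memv b '(45 46 95 126))))    ; - . _ ~

;; List of byte values -> percent-encoded ASCII string
(define (percent-encode-bytes bytes)
  (apply string-append
         (map (lambda (b)
                (if (unreserved? b)
                    (string (integer->char b))
                    (percent-encode-byte b)))
              bytes)))

;; ASCII string -> list of byte values
(define (ascii-bytes s)
  (map char->integer (string->list s)))

;; UTF-8 bytes of "삼계탕", spelled out explicitly
(define segment-bytes '(#xEC #x82 #xBC #xEA #xB3 #x84 #xED #x83 #x95))

;; Escaping the raw bytes once, as should happen on output
(define escaped-once (percent-encode-bytes segment-bytes))
;; => "%EC%82%BC%EA%B3%84%ED%83%95"

;; Escaping an already-escaped string: every "%" becomes "%25"
(define escaped-twice (percent-encode-bytes (ascii-bytes escaped-once)))
;; => "%25EC%2582%25BC%25EA%25B3%2584%25ED%2583%2595"
```

The second result is what a serializer would emit if the stored path
already contained the escapes and was escaped again on output.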

Parsing is a separate matter: utf8-string->uri should return the URI
object without error, but with unescaped values in the path and query,
just like the result of the make-uri call above.
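
As a rough illustration of the parsing side, the sketch below decodes
percent escapes back into raw byte values, which is the form I would
expect to end up in the path. The names hex-value and
percent-decode->bytes are hypothetical, not part of any egg, and the
code assumes well-formed input (every "%" followed by two hex digits).

```scheme
;; Hypothetical helpers; NOT the uri-common / uri-generic API.

;; Hex digit character -> 0..15 (assumes a valid hex digit)
(define (hex-value c)
  (let ((n (char->integer (char-upcase c))))
    (if (<= n 57)
        (- n 48)      ; #\0 .. #\9
        (- n 55))))   ; #\A .. #\F

;; Percent-encoded ASCII string -> list of byte values
(define (percent-decode->bytes s)
  (let loop ((cs (string->list s)) (acc '()))
    (cond ((null? cs) (reverse acc))
          ;; "%XX": combine the two hex digits into one byte
          ((and (char=? (car cs) #\%)
                (pair? (cdr cs))
                (pair? (cddr cs)))
           (loop (cdddr cs)
                 (cons (+ (* 16 (hex-value (cadr cs)))
                          (hex-value (caddr cs)))
                       acc)))
          ;; Any other character is passed through as its own byte
          (else
           (loop (cdr cs) (cons (char->integer (car cs)) acc))))))

(percent-decode->bytes "%EC%82%BC")   ; => (236 130 188), the bytes of "삼"
```

Storing the decoded bytes (or the string they represent) and escaping
only in uri->string keeps a parse followed by a print from escaping
anything twice.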

Unrelated: the actual escaped output above looks buggy.  Some
characters, such as the leading "%EC%", are getting dropped; the
expected path is "%EC%82%BC%EA%B3%84%ED%83%95".

-- 
Alex

_______________________________________________
Chicken-users mailing list
[email protected]
https://lists.nongnu.org/mailman/listinfo/chicken-users
