Hi Alex,
I understand your point about make-uri, but I want to provide a URI
constructor that takes a UTF-8 input string and maps it in accordance with
RFC 3986 / 3987.
This means we still have to perform path and percent-encoding normalization
on the ASCII portions of the string. make-uri does not attempt any
normalization, so it does not strictly follow RFC 3986.
I interpreted Section 3.1 of RFC 3987 to mean that each non-ASCII character
is converted to UTF-8 and then each resulting octet is percent-encoded.
So for the string "пиле" the octets are D0 BF D0 B8 D0 BB D0 B5 and
(utf8-string->uri "http://example.com/пиле") produces
#(URI scheme=http authority=#(URIAuth host="example.com" port=#f) path=(/
"%D0%BF%D0%B8%D0%BB%D0%B5") query=#f fragment=#f)
For the string "삼계탕" the octets are EC 82 BC EA B3 84 ED 83 95 and
(utf8-string->uri "http://example.com/삼계탕") produces
#(URI scheme=http authority=#(URIAuth host="example.com" port=#f) path=(/
"%EC%82%BC%EA%B3%84%ED%83%95") query=#f fragment=#f)
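For reference, here is a minimal sketch of the mapping as I read Section
3.1, written in portable R7RS-small Scheme rather than against the egg's
code. The names iri-segment->uri-segment, pct-encode-char and
pct-encode-octet are hypothetical, and it assumes an implementation whose
strings hold Unicode characters. ASCII characters are copied unchanged;
every other character is converted to UTF-8 and each octet is
percent-encoded, which reproduces both paths above:

```scheme
(import (scheme base) (scheme char) (scheme write))

;; Format one octet as "%HH", with uppercase hex digits as RFC 3986
;; (sec. 2.1) recommends.
(define (pct-encode-octet n)
  (string-append "%"
                 (if (< n 16) "0" "")
                 (string-upcase (number->string n 16))))

;; Hypothetical helper: UTF-8 encode one character, then percent-encode
;; every resulting octet.
(define (pct-encode-char c)
  (let ((octets (string->utf8 (string c))))
    (let loop ((i 0) (acc ""))
      (if (= i (bytevector-length octets))
          acc
          (loop (+ i 1)
                (string-append acc
                               (pct-encode-octet (bytevector-u8-ref octets i))))))))

;; Hypothetical helper: map one IRI path segment to a URI path segment
;; following RFC 3987 sec. 3.1. ASCII characters are copied unchanged
;; (no normalization happens here); all other characters are encoded.
(define (iri-segment->uri-segment s)
  (let loop ((chars (string->list s)) (acc ""))
    (cond ((null? chars) acc)
          ((< (char->integer (car chars)) 128)
           (loop (cdr chars) (string-append acc (string (car chars)))))
          (else
           (loop (cdr chars) (string-append acc (pct-encode-char (car chars))))))))

(display (iri-segment->uri-segment "пиле"))   ; prints %D0%BF%D0%B8%D0%BB%D0%B5
(newline)
(display (iri-segment->uri-segment "삼계탕")) ; prints %EC%82%BC%EA%B3%84%ED%83%95
(newline)
```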
Can you elaborate on what is broken about this? Perhaps I do not understand
UTF-8 and need to apply a bitmask or something to the octets?
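On the bitmask question: as far as I understand UTF-8 (RFC 3629), the
masking happens when a code point is turned into octets, so octets that are
already UTF-8 can be percent-encoded as they are. A hand-rolled sketch of
that step, where code-point->utf8-octets is a hypothetical helper and,
since R7RS-small has no bitwise operators, quotient and remainder stand in
for the shifts and masks:

```scheme
(import (scheme base) (scheme char) (scheme write))

;; Hypothetical helper: encode a Unicode code point as a list of UTF-8
;; octets (RFC 3629 sec. 3). quotient/remainder by powers of two stand in
;; for the bit shifts and masks; #xC0, #xE0, #xF0 and #x80 are the lead
;; and continuation-byte prefixes. Surrogates and values above #x10FFFF
;; are not rejected here.
(define (code-point->utf8-octets cp)
  (cond ((< cp #x80) (list cp))
        ((< cp #x800)
         (list (+ #xC0 (quotient cp 64))
               (+ #x80 (remainder cp 64))))
        ((< cp #x10000)
         (list (+ #xE0 (quotient cp 4096))
               (+ #x80 (remainder (quotient cp 64) 64))
               (+ #x80 (remainder cp 64))))
        (else
         (list (+ #xF0 (quotient cp 262144))
               (+ #x80 (remainder (quotient cp 4096) 64))
               (+ #x80 (remainder (quotient cp 64) 64))
               (+ #x80 (remainder cp 64))))))

;; Print the octets of each character of "삼계탕" in hex.
(for-each
 (lambda (c)
   (display (map (lambda (n) (string-upcase (number->string n 16)))
                 (code-point->utf8-octets (char->integer c))))
   (newline))
 (string->list "삼계탕"))
;; prints:
;; (EC 82 BC)
;; (EA B3 84)
;; (ED 83 95)
```

The output matches the octets listed for 삼계탕 above, so no extra masking
of the octets is needed before percent-encoding them.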
Percent-encoded sequences of more than one octet are left untouched by
pct-decode in the current implementation, so you will not get double
escaping. A single percent-encoded octet is decoded if it falls in the
"unreserved" character set, as per RFC 3986.
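To make that decoding rule concrete, here is a sketch of percent-encoding
normalization as described in RFC 3986, Section 6.2.2. It is not the egg's
pct-decode; normalize-pct, unreserved-octet? and hex-digit? are
hypothetical. Every octet of a multi-byte UTF-8 sequence is at least #x80
and so is never unreserved, which means such sequences stay encoded (this
sketch only uppercases their hex digits):

```scheme
(import (scheme base) (scheme char) (scheme write))

;; RFC 3986 sec. 2.3: unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"
(define (unreserved-octet? n)
  (or (<= 65 n 90)                  ; A-Z
      (<= 97 n 122)                 ; a-z
      (<= 48 n 57)                  ; 0-9
      (memv n '(45 46 95 126))))    ; - . _ ~

(define (hex-digit? c)
  (or (char<=? #\0 c #\9)
      (char<=? #\a (char-downcase c) #\f)))

;; Hypothetical helper, not the egg's pct-decode: a %HH triplet is decoded
;; only when the octet is unreserved; any other triplet is kept, with its
;; hex digits uppercased (case normalization, RFC 3986 sec. 6.2.2.1).
(define (normalize-pct s)
  (let ((len (string-length s))
        (out (open-output-string)))
    (let loop ((i 0))
      (cond ((= i len) (get-output-string out))
            ((and (char=? (string-ref s i) #\%)
                  (<= (+ i 3) len)
                  (hex-digit? (string-ref s (+ i 1)))
                  (hex-digit? (string-ref s (+ i 2))))
             (let* ((hex (substring s (+ i 1) (+ i 3)))
                    (n (string->number hex 16)))
               (if (unreserved-octet? n)
                   (write-char (integer->char n) out)
                   (write-string (string-append "%" (string-upcase hex)) out))
               (loop (+ i 3))))
            (else
             (write-char (string-ref s i) out)
             (loop (+ i 1)))))))

(display (normalize-pct "/%7euser/%d0%bf%2f")) ; prints /~user/%D0%BF%2F
(newline)
```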
Ivan
> This result looks broken. As I noted in my previous mail, the URI
> representation already handles non-ASCII characters and escapes on output:
>
> $ csi -R uri-common
> #;1> (make-uri scheme: "http" host: "127.0.0.1" path: '(/ "삼계탕"))
> #<URI-common: scheme="http" port=#f host="127.0.0.1" path=(/ "삼계탕")
> query=#f fragment=#f>
> #;2> (uri->string (make-uri scheme: "http" host: "127.0.0.1" path: '(/
> "삼계탕")))
> "http://127.0.0.1/82%BCB3%8483%95"
>
> If you put percent escapes _inside_ the internal path representation,
> you'll get double escaping.
>
> Parsing is a separate matter, and utf8-string->uri should return
> the URI object without error, but with the unescaped values in
> the path and query as resulting from the make-uri above.
>
> Unrelated, the actual escaped output looks buggy - it looks like
> some characters like the leading "%EC%" are getting dropped.
>
> --
> Alex
>
> #(URI scheme=http authority=#(URIAuth host="example.com" port=#f) path=(/
> "%EC%82%BC%EA%B3%84%ED%83%95") query=#f fragment=#f)
_______________________________________________
Chicken-users mailing list
[email protected]
https://lists.nongnu.org/mailman/listinfo/chicken-users