Oops, the second example should have been:

For the string "삼계탕" the octets are EC 82 BC EA B3 84 ED 83 95, and (utf8-string->uri "http://example.com/삼계탕") produces
#(URI scheme=http authority=#(URIAuth host="example.com" port=#f) path=(/ "%EC%82%BC%EA%B3%84%ED%83%95") query=#f fragment=#f)

Sorry about the confusion.

Ivan

On Tue, Jan 15, 2013 at 3:03 PM, Ivan Raikov wrote:
>
> Hi Alex,
>
> I understand your point about make-uri, but I want to provide a URI
> constructor that takes a UTF-8 input string and maps it in accordance with
> RFC 3986 / 3987.
> So we still have to perform path and percent-encoding normalization steps
> for the ASCII portions of the string. make-uri makes no such attempt at
> normalization and so does not strictly follow RFC 3986.
> I interpreted Section 3.1 of RFC 3987 to mean that UTF-8 characters are
> encoded by taking each octet and applying percent-encoding to it.
>
> So for the string "пиле" the octets are D0 BF D0 B8 D0 BB D0 B5, and
> (utf8-string->uri "http://example.com/пиле") produces
>
> #(URI scheme=http authority=#(URIAuth host="example.com" port=#f) path=(/ "%D0%BF%D0%B8%D0%BB%D0%B5") query=#f fragment=#f)
>
> For the string "삼계탕" the octets are EC 82 BC EA B3 84 ED 83 95, and
> (utf8-string->uri "http://example.com/삼계탕") produces
>
> #(URI scheme=http authority=#(URIAuth host="example.com" port=#f) path=(/ "%D0%BF%D0%B8%D0%BB%D0%B5") query=#f fragment=#f)
>
> Can you elaborate on what is broken about this? Perhaps I do not understand
> UTF-8 and need to apply a bitmask or something to the octets?
>
> Percent-encoded sequences of more than one octet will not get touched by
> pct-decode in the current implementation, so you will not get double
> escaping. Percent-encoded sequences of one octet will get decoded if they
> fall in the "unstructured" char-set, as per RFC 3986.
>
> Ivan
>
>> This result looks broken.
>> As I noted in my previous mail, the URI representation already handles
>> non-ASCII characters and escapes them on output:
>>
>> $ csi -R uri-common
>> #;1> (make-uri scheme: "http" host: "127.0.0.1" path: '(/ "삼계탕"))
>> #<URI-common: scheme="http" port=#f host="127.0.0.1" path=(/ "삼계탕") query=#f fragment=#f>
>> #;2> (uri->string (make-uri scheme: "http" host: "127.0.0.1" path: '(/ "삼계탕")))
>> "http://127.0.0.1/82%BCB3%8483%95"
>>
>> If you put percent escapes _inside_ the internal path representation,
>> you'll get double escaping.
>>
>> Parsing is a separate matter: utf8-string->uri should return the URI
>> object without error, but with the unescaped values in the path and
>> query, exactly as produced by the make-uri call above.
>>
>> Unrelated: the actual escaped output looks buggy. It looks like some
>> characters, such as the leading "%EC%", are getting dropped.
>>
>> --
>> Alex
>>
>> #(URI scheme=http authority=#(URIAuth host="example.com" port=#f) path=(/ "%EC%82%BC%EA%B3%84%ED%83%95") query=#f fragment=#f)
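The octet-by-octet mapping Ivan reads into RFC 3987, Section 3.1 can be checked with a small sketch. It is written in Python so that it does not assume anything about the uri-generic or uri-common APIs; `pct_encode_segment` is a hypothetical helper, and treating only the RFC 3986 "unreserved" characters as safe is a simplifying assumption. It reproduces the path values from the corrected output and also illustrates Alex's double-escaping concern.

```python
# RFC 3986 "unreserved" characters: these may appear literally in a path
# segment. This simplified sketch percent-encodes every other octet,
# including reserved delimiters such as "/" and "%" itself.
UNRESERVED = frozenset(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~"
)


def pct_encode_segment(segment: str) -> str:
    """Encode a path segment as UTF-8, then percent-encode it octet by octet."""
    out = []
    for octet in segment.encode("utf-8"):
        if octet < 0x80 and chr(octet) in UNRESERVED:
            out.append(chr(octet))  # safe ASCII character, copied as-is
        else:
            out.append("%{:02X}".format(octet))  # one %XX triplet per octet
    return "".join(out)


if __name__ == "__main__":
    print(pct_encode_segment("пиле"))    # %D0%BF%D0%B8%D0%BB%D0%B5
    print(pct_encode_segment("삼계탕"))  # %EC%82%BC%EA%B3%84%ED%83%95
    # Escaping an already-escaped segment re-escapes each "%" as "%25":
    print(pct_encode_segment(pct_encode_segment("삼계탕")))
    # %25EC%2582%25BC%25EA%25B3%2584%25ED%2583%2595
```

The last line is the double escaping Alex warns about: if the internal path already holds "%EC%82...", an encoder that escapes on output turns every "%" into "%25", which is why he suggests keeping the decoded string "삼계탕" in the path list.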
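Ivan's description of pct-decode (single-octet escapes in a safe set are decoded, multi-octet sequences are left alone) resembles the percent-encoding normalization of RFC 3986, Section 6.2.2. The sketch below is not uri-generic's pct-decode; `normalize_pct` is a hypothetical helper, and it assumes the safe set is the RFC 3986 "unreserved" set. Because every unreserved character is ASCII, the octets of a multi-byte UTF-8 sequence (all 0x80 or above) are never decoded. Unlike the behaviour Ivan describes, this sketch does touch them in one way: it uppercases their hex digits, as Section 6.2.2.1 recommends.

```python
import re

# RFC 3986 "unreserved" set; all of its members are ASCII.
UNRESERVED = frozenset(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~"
)
PCT_TRIPLET = re.compile(r"%([0-9A-Fa-f]{2})")


def normalize_pct(segment: str) -> str:
    """Percent-encoding normalization along the lines of RFC 3986, Section 6.2.2."""

    def fix(match):
        octet = int(match.group(1), 16)
        if octet < 0x80 and chr(octet) in UNRESERVED:
            return chr(octet)  # e.g. "%7E" -> "~", "%41" -> "A"
        return "%" + match.group(1).upper()  # keep escaped, normalize hex case

    return PCT_TRIPLET.sub(fix, segment)


if __name__ == "__main__":
    print(normalize_pct("%7Euser"))    # ~user
    print(normalize_pct("a%2fb"))      # a%2Fb  ("/" is reserved, stays escaped)
    print(normalize_pct("%ec%82%bc"))  # %EC%82%BC  (UTF-8 octets stay escaped)
```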
_______________________________________________
Chicken-users mailing list
https://lists.nongnu.org/mailman/listinfo/chicken-users
