Oops, the second example should have been:

For the string "삼계탕" the octets are EC 82 BC EA B3 84 ED 83 95 and
(utf8-string->uri "http://example.com/삼계탕") produces

#(URI scheme=http authority=#(URIAuth host="example.com" port=#f) path=(/
"%EC%82%BC%EA%B3%84%ED%83%95") query=#f fragment=#f)
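The octets and the escaped path above can be cross-checked outside of CHICKEN. The following is a minimal Python sketch, not the uri-generic or uri-common implementation; the helper name percent_encode_utf8 is made up for illustration. It assumes the Section 3.1 mapping means "encode to UTF-8, then percent-encode every octet outside the RFC 3986 unreserved set".

```python
# Unreserved characters from RFC 3986, Section 2.3; these are left as-is.
UNRESERVED = (
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    b"abcdefghijklmnopqrstuvwxyz"
    b"0123456789-._~"
)


def percent_encode_utf8(segment: str) -> str:
    """Percent-encode each UTF-8 octet of a path segment (hypothetical helper)."""
    return "".join(
        chr(octet) if octet in UNRESERVED else "%{:02X}".format(octet)
        for octet in segment.encode("utf-8")
    )


if __name__ == "__main__":
    for text in ("пиле", "삼계탕"):
        octets = " ".join("{:02X}".format(b) for b in text.encode("utf-8"))
        print(octets)
        print(percent_encode_utf8(text))
    # prints:
    # D0 BF D0 B8 D0 BB D0 B5
    # %D0%BF%D0%B8%D0%BB%D0%B5
    # EC 82 BC EA B3 84 ED 83 95
    # %EC%82%BC%EA%B3%84%ED%83%95
```

No bitmask is needed at this stage: the UTF-8 encoder has already produced the final octets, so each one is escaped whole.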

Sorry about the confusion.

  Ivan




On Tue, Jan 15, 2013 at 3:03 PM, Ivan Raikov <[email protected]> wrote:

>
> Hi Alex,
>
>     I understand your point about make-uri, but I want to provide a uri
> constructor that takes a UTF-8 input string and maps it in accordance with
> RFC 3986 / 3987.
> So we still have to perform path and percent-encoding normalization steps
> for the ASCII portions of the string. make-uri makes no such attempts at
> normalization and so does not strictly follow RFC 3986.
> I interpreted Section 3.1 of RFC 3987 to mean that UTF-8 characters are
> encoded by taking each octet and applying percent-encoding to it.
>
> So for the string "пиле" the octets are D0 BF D0 B8 D0 BB D0 B5 and
> (utf8-string->uri "http://example.com/пиле") produces
>
> #(URI scheme=http authority=#(URIAuth host="example.com" port=#f) path=(/
> "%D0%BF%D0%B8%D0%BB%D0%B5") query=#f fragment=#f)
>
> For the string "삼계탕" the octets are EC 82 BC EA B3 84 ED 83 95 and
> (utf8-string->uri "http://example.com/삼계탕") produces
>
> #(URI scheme=http authority=#(URIAuth host="example.com" port=#f) path=(/
> "%D0%BF%D0%B8%D0%BB%D0%B5") query=#f fragment=#f)
>
>
> Can you elaborate what is broken about this? Perhaps I do not understand
> UTF-8 and need to apply a bitmask or something to the octets?
>
> Percent-encoded sequences of more than one octet will not get touched by
> pct-decode in the current implementation, so you will not get double
> escaping. Percent-encoded sequences of one octet will get decoded if they
> fall in the "unstructured" char-set, as per RFC 3986.
>
>   Ivan
>
>
>
>> This result looks broken.  As I noted in my previous mail, the URI
>> representation
>> already handles non-ASCII characters and escapes on output:
>>
>> $ csi -R uri-common
>> #;1> (make-uri scheme: "http" host: "127.0.0.1" path: '(/ "삼계탕"))
>> #<URI-common: scheme="http" port=#f host="127.0.0.1" path=(/ "삼계탕")
>> query=#f fragment=#f>
>> #;2> (uri->string (make-uri scheme: "http" host: "127.0.0.1" path: '(/
>> "삼계탕")))
>> "http://127.0.0.1/82%BCB3%8483%95"
>>
>> If you put percent escapes _inside_ the internal path representation,
>> you'll get double escaping.
>>
>> Parsing is a separate matter, and utf8-string->uri should return
>> the URI object without error, but with the unescaped values in
>> the path and query as resulting from the make-uri above.
>>
>> Unrelated, the actual escaped output looks buggy - it looks like
>> some characters like the leading "%EC%" are getting dropped.
>>
>> --
>> Alex
>>
>> #(URI scheme=http authority=#(URIAuth host="example.com" port=#f)
> path=(/ "%EC%82%BC%EA%B3%84%ED%83%95") query=#f fragment=#f)
>
>
_______________________________________________
Chicken-users mailing list
[email protected]
https://lists.nongnu.org/mailman/listinfo/chicken-users
