On Thu, Jan 17, 2013 at 4:51 AM, Peter Bex <peter....@xs4all.nl> wrote:

> On Tue, Jan 15, 2013 at 02:44:08PM +0900, Alex Shinn wrote:
> > This result looks broken.  As I noted in my previous mail, the URI
> > representation already handles non-ASCII characters and escapes on
> > output:
> >
> > $ csi -R uri-common
> > #;1> (make-uri scheme: "http" host: "127.0.0.1" path: '(/ "삼계탕"))
> > #<URI-common: scheme="http" port=#f host="127.0.0.1" path=(/ "삼계탕")
> > query=#f fragment=#f>
> > #;2> (uri->string (make-uri scheme: "http" host: "127.0.0.1" path: '(/
> > "삼계탕")))
> > "http://127.0.0.1/82%BCB3%8483%95"
> >
> > Unrelated, the actual escaped output looks buggy - it looks like
> > some characters like the leading "%EC%" are getting dropped.
>
> OK, I took some time to investigate and I pinpointed this problem.
> This appears to happen due to the use of core srfi-14 and srfi-13 in
> uri-generic; its char-set operations simply don't deal with anything
> beyond ASCII.
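For reference, here is a minimal sketch of what the escaped path segment should look like. Each code point is UTF-8 encoded, and every byte outside the RFC 3986 "unreserved" set is written as %XX. The helper names (utf8-bytes, unreserved-byte?, percent-byte, encode-segment) are hypothetical and are not uri-generic's API. The sketch takes a list of code points rather than a string because core CHICKEN strings are byte strings: string->list yields UTF-8 bytes there, but characters under the utf8 egg. Under this scheme "삼계탕" (U+C0BC U+ACC4 U+D0D5) comes out as %EC%82%BC%EA%B3%84%ED%83%95. The buggy output quoted above is therefore missing each %EC%, %EA% and %ED% prefix.

```scheme
;; Hypothetical helpers (not uri-generic's API): percent-encode one
;; path segment given as a list of Unicode code points.

(define hex-digits "0123456789ABCDEF")

;; UTF-8 encode a single code point into a list of byte values.
(define (utf8-bytes cp)
  (cond ((< cp #x80) (list cp))
        ((< cp #x800)
         (list (+ #xC0 (quotient cp 64))
               (+ #x80 (remainder cp 64))))
        ((< cp #x10000)
         (list (+ #xE0 (quotient cp 4096))
               (+ #x80 (remainder (quotient cp 64) 64))
               (+ #x80 (remainder cp 64))))
        (else
         (list (+ #xF0 (quotient cp 262144))
               (+ #x80 (remainder (quotient cp 4096) 64))
               (+ #x80 (remainder (quotient cp 64) 64))
               (+ #x80 (remainder cp 64))))))

;; RFC 3986 "unreserved" bytes: ALPHA / DIGIT / "-" / "." / "_" / "~"
(define (unreserved-byte? b)
  (or (<= 65 b 90) (<= 97 b 122) (<= 48 b 57)
      (memv b '(45 46 95 126))))

;; One byte as "%XX" with uppercase hex digits.
(define (percent-byte b)
  (string #\%
          (string-ref hex-digits (quotient b 16))
          (string-ref hex-digits (remainder b 16))))

;; Encode every code point to UTF-8, then escape each non-unreserved byte.
(define (encode-segment code-points)
  (apply string-append
         (map (lambda (b)
                (if (unreserved-byte? b)
                    (string (integer->char b))
                    (percent-byte b)))
              (apply append (map utf8-bytes code-points)))))
```

For example, (encode-segment (list #xC0BC #xACC4 #xD0D5)) returns "%EC%82%BC%EA%B3%84%ED%83%95".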


As an aside from the uri discussion, we really need to fix srfi-14.

The reference implementation is terrible.  Not only does it not
handle Unicode, it doesn't even fail gracefully:

#;1> (char-set-contains? char-set:full #\x100)
Error: (string-ref) out of range [...]

At a minimum we should avoid these errors, but really we
should be using a Unicode-aware implementation; unlike with
Unicode strings, there's no barrier to doing so.  We could
just move utf8-srfi-14 into the core, or I could patch up the
srfi-14 implementation to handle wide chars properly (if perhaps
slowly) without bringing in the iset dependency.
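To make the "patch up srfi-14" option a bit more concrete, here is a minimal sketch of a char-set represented as a sorted list of disjoint, inclusive code-point ranges. It needs no iset and no per-character table. It is not SRFI 14's API, not the reference implementation, and not utf8-srfi-14. Every name in it (insert-range, ranges->char-set, rcs-contains?, rcs-complement, rcs-union, rcs:full) is hypothetical. Membership is a linear scan over the ranges, which is the "maybe slowly" trade-off. A faster variant could binary-search a vector of ranges instead.

```scheme
;; Hypothetical sketch: a char-set as a sorted list of disjoint,
;; non-adjacent inclusive code-point ranges, e.g. ((48 . 57) (97 . 122)).
;; Not the SRFI 14 API; names are made up for illustration.

(define max-code-point #x10FFFF)

;; Insert the range [lo, hi] into RANGES, merging any ranges it
;; overlaps or touches so the result stays sorted and disjoint.
(define (insert-range ranges lo hi)
  (cond ((null? ranges) (list (cons lo hi)))
        ((< (+ hi 1) (caar ranges))      ; strictly before the first range
         (cons (cons lo hi) ranges))
        ((> lo (+ (cdar ranges) 1))      ; strictly after the first range
         (cons (car ranges) (insert-range (cdr ranges) lo hi)))
        (else                            ; overlaps/touches: merge, keep going
         (insert-range (cdr ranges)
                       (min lo (caar ranges))
                       (max hi (cdar ranges))))))

;; Build a char-set from (lo . hi) code-point pairs given in any order.
(define (ranges->char-set pairs)
  (let loop ((pairs pairs) (acc '()))
    (if (null? pairs)
        acc
        (loop (cdr pairs) (insert-range acc (caar pairs) (cdar pairs))))))

;; Linear scan: works for any code point and never indexes into a table.
(define (rcs-contains? cs ch)
  (let ((cp (char->integer ch)))
    (let loop ((cs cs))
      (cond ((null? cs) #f)
            ((< cp (caar cs)) #f)        ; falls in a gap between ranges
            ((<= cp (cdar cs)) #t)
            (else (loop (cdr cs)))))))

;; Complement over [0, max-code-point]: the gaps between the ranges.
(define (rcs-complement cs)
  (let loop ((cs cs) (start 0) (acc '()))
    (if (null? cs)
        (reverse (if (<= start max-code-point)
                     (cons (cons start max-code-point) acc)
                     acc))
        (loop (cdr cs)
              (+ (cdar cs) 1)
              (if (< start (caar cs))
                  (cons (cons start (- (caar cs) 1)) acc)
                  acc)))))

;; Union: concatenate and re-normalize.
(define (rcs-union a b)
  (ranges->char-set (append a b)))

;; All Unicode scalar values (assumes surrogates are not characters).
(define rcs:full (ranges->char-set '((0 . #xD7FF) (#xE000 . #x10FFFF))))
```

With this representation the full set is just two ranges, and (rcs-contains? rcs:full #\x100) returns #t instead of raising the string-ref error shown above.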

-- 
Alex
_______________________________________________
Chicken-users mailing list
Chicken-users@nongnu.org
https://lists.nongnu.org/mailman/listinfo/chicken-users
