On Thu, Jan 17, 2013 at 4:51 AM, Peter Bex <peter....@xs4all.nl> wrote:
> On Tue, Jan 15, 2013 at 02:44:08PM +0900, Alex Shinn wrote:
> > This result looks broken. As I noted in my previous mail, the URI
> > representation already handles non-ASCII characters and escapes on
> > output:
> >
> > $ csi -R uri-common
> > #;1> (make-uri scheme: "http" host: "127.0.0.1" path: '(/ "삼계탕"))
> > #<URI-common: scheme="http" port=#f host="127.0.0.1" path=(/ "삼계탕") query=#f fragment=#f>
> > #;2> (uri->string (make-uri scheme: "http" host: "127.0.0.1" path: '(/ "삼계탕")))
> > "http://127.0.0.1/82%BCB3%8483%95"
> >
> > Unrelated, the actual escaped output looks buggy - it looks like
> > some characters like the leading "%EC%" are getting dropped.
>
> OK, I took some time to investigate and I pinpointed this problem.
> This appears to happen due to the use of core srfi-14 and srfi-13 in
> uri-generic; its char-set operations simply don't deal with anything
> beyond ASCII.

As an aside from the uri discussion, we really need to fix srfi-14.
The reference implementation is terrible. Not only does it not handle
Unicode, but it doesn't not-handle it gracefully:

#;1> (char-set-contains? char-set:full #\x100)

Error: (string-ref) out of range
[...]

At a minimum we should avoid these errors, but really we should be
using a Unicode-aware implementation - there's no barrier to doing so
like there is for Unicode strings. We could just move utf8-srfi-14
into the core, or I could patch up the srfi-14 implementation to
handle wide chars properly (but maybe slowly) without bringing in the
iset dependency.

--
Alex
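For comparison, a correct escaper writes every UTF-8 byte of "삼계탕" (EC 82 BC, EA B3 84, ED 83 95) as %XX, giving "http://127.0.0.1/%EC%82%BC%EA%B3%84%ED%83%95". The buggy output quoted above is exactly that string with each "%EC%", "%EA%" and "%ED%" removed, i.e. the lead byte of every character is lost together with the percent signs on either side of it. A minimal sketch of the expected result follows; `percent-encode-bytes` and `byte->hex2` are hypothetical helpers, not uri-generic procedures, the byte list is written out by hand (an assumption, so the example does not depend on how a given Scheme stores strings), and unlike a real URI encoder it escapes every byte, including unreserved ASCII.

```scheme
;; Hypothetical helpers for illustration only; not part of uri-common
;; or uri-generic.

(define (byte->hex2 b)
  ;; Two uppercase, zero-padded hex digits for a byte value 0-255.
  (let ((digits "0123456789ABCDEF"))
    (string (string-ref digits (quotient b 16))
            (string-ref digits (remainder b 16)))))

(define (percent-encode-bytes bytes)
  ;; Escape every byte in BYTES as %XX.  A real URI encoder would leave
  ;; unreserved ASCII characters alone; this sketch escapes everything.
  (apply string-append
         (map (lambda (b) (string-append "%" (byte->hex2 b))) bytes)))

;; UTF-8 bytes of "삼계탕" (U+C0BC U+ACC4 U+D0D5), written out by hand.
(define samgyetang-utf8
  '(#xEC #x82 #xBC #xEA #xB3 #x84 #xED #x83 #x95))

(display (string-append "http://127.0.0.1/"
                        (percent-encode-bytes samgyetang-utf8)))
(newline)
;; prints http://127.0.0.1/%EC%82%BC%EA%B3%84%ED%83%95
```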
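A Unicode-aware char set does not need a table indexed by character. A minimal sketch, assuming a char set is just a sorted list of inclusive (low . high) code-point ranges: membership only compares `char->integer` against the range bounds, so `#\x100` yields #t or #f instead of an out-of-range error. The names `make-range-char-set`, `range-char-set-contains?`, `range-char-set:full` and `ascii-letters` are hypothetical, and the representation is chosen for illustration; it is not the utf8-srfi-14 or iset implementation, and it omits the rest of the SRFI 14 API (union, complement and so on).

```scheme
;; Hypothetical range-based char set; not the utf8-srfi-14 or iset API.

(define (make-range-char-set . ranges)
  ;; RANGES are (low . high) code-point pairs, assumed to be sorted
  ;; and non-overlapping for this sketch.
  ranges)

(define (range-char-set-contains? cs c)
  ;; Works for any character: only compares its code point with bounds.
  (let ((n (char->integer c)))
    (let loop ((rs cs))
      (cond ((null? rs) #f)
            ((< n (caar rs)) #f)   ; ranges are sorted, so stop early
            ((<= n (cdar rs)) #t)
            (else (loop (cdr rs)))))))

;; All code points from 0 through #x10FFFF.
(define range-char-set:full (make-range-char-set (cons 0 #x10FFFF)))

;; ASCII letters as two ranges: A-Z and a-z.
(define ascii-letters
  (make-range-char-set (cons 65 90) (cons 97 122)))
```

With these definitions, `(range-char-set-contains? range-char-set:full #\x100)` returns #t and `(range-char-set-contains? ascii-letters #\x100)` returns #f. Lookup here is linear in the number of ranges; a serious implementation would more likely use binary search or a tree, which is roughly the job a dedicated integer-set library does.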
_______________________________________________
Chicken-users mailing list
Chicken-users@nongnu.org
https://lists.nongnu.org/mailman/listinfo/chicken-users