On Tue, Jan 15, 2013 at 02:44:08PM +0900, Alex Shinn wrote:
> This result looks broken.  As I noted in my previous mail, the URI
> representation already handles non-ASCII characters and escapes on output:
> 
> $ csi -R uri-common
> #;1> (make-uri scheme: "http" host: "127.0.0.1" path: '(/ "삼계탕"))
> #<URI-common: scheme="http" port=#f host="127.0.0.1" path=(/ "삼계탕")
> query=#f fragment=#f>
> #;2> (uri->string (make-uri scheme: "http" host: "127.0.0.1" path: '(/
> "삼계탕")))
> "http://127.0.0.1/82%BCB3%8483%95"
> 
> Unrelated, the actual escaped output looks buggy - it looks like
> some characters like the leading "%EC%" are getting dropped.

OK, I took some time to investigate and pinpointed the problem.
It is caused by uri-generic's use of the core srfi-14 and srfi-13
libraries: their char-set operations simply don't handle anything
beyond ASCII.  Switching to the UTF-8-aware versions (utf8-srfi-14,
utf8-srfi-13 and unicode-char-sets) makes it work:

Without patch:
$ csi -R uri-generic -P '(uri-encode-string "삼계탕")'
"�%82%BC�%B3%84�%83%95"

With patch:
$ csi -R uri-generic -P '(uri-encode-string "삼계탕")'
"%EC%82%BC%EA%B3%84%ED%83%95"
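To make the difference concrete, here is a small byte-level sketch in Python. It is not uri-generic code, only an illustration. `pct_encode_utf8` percent-encodes every UTF-8 byte outside the RFC 3986 "unreserved" set; treating that set as what uri-encode-string leaves unescaped is an assumption, though it agrees with the test vectors in tests/run.scm. `latin1_letter_encode` is a hypothetical model of the bug. It assumes the core char-sets classify each byte as a Latin-1 character. Under that assumption the lead bytes 0xEC, 0xEA and 0xED ('ì', 'ê', 'í') count as letters and pass through raw, which would account for the three "�" in the unpatched output.

```python
# Illustrative sketch only; neither function is part of uri-generic.

# RFC 3986 unreserved characters (ALPHA / DIGIT / "-" / "." / "_" / "~"),
# stored as a set of byte values.
UNRESERVED = frozenset(
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~"
)


def pct_encode_utf8(s: str) -> str:
    """Percent-encode every UTF-8 byte of s that is not unreserved."""
    out = []
    for b in s.encode("utf-8"):
        out.append(chr(b) if b in UNRESERVED else "%%%02X" % b)
    return "".join(out)


def latin1_letter_encode(s: str) -> bytes:
    """Hypothetical model of the bug: each UTF-8 byte is classified as a
    Latin-1 character, so bytes like 0xEC ('ì') look like letters and
    are emitted unescaped, while the other bytes get escaped."""
    out = bytearray()
    for b in s.encode("utf-8"):
        c = chr(b)  # reinterpret the single byte as a Latin-1 code point
        if c.isalpha() or c in "0123456789-._~":
            out.append(b)           # "letter": copied through raw
        else:
            out += b"%%%02X" % b    # everything else: escaped
    return bytes(out)


print(pct_encode_utf8("삼계탕"))
# %EC%82%BC%EA%B3%84%ED%83%95
print(latin1_letter_encode("삼계탕").decode("utf-8", errors="replace"))
# �%82%BC�%B3%84�%83%95
```

The raw lead bytes are no longer valid UTF-8 once separated from their continuation bytes, which is why a terminal renders each of them as a replacement character.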

Ivan, what do you think about adding the UTF8 dependency, as per the
attached patch (against trunk)?

Cheers,
Peter
-- 
http://sjamaan.ath.cx
Index: uri-generic.scm
===================================================================
--- uri-generic.scm     (revision 28113)
+++ uri-generic.scm     (working copy)
@@ -57,13 +57,9 @@
 
 (import chicken scheme extras data-structures ports)
  
-(require-extension matchable defstruct srfi-1 srfi-4 srfi-13 srfi-14)
+(require-extension matchable defstruct srfi-1 srfi-4
+                   utf8-srfi-13 utf8-srfi-14 unicode-char-sets)
 
-;; What to do with these?
-#;(cond-expand
-   (utf8-strings (use utf8-srfi-13 utf8-srfi-14))
-   (else (use srfi-13 srfi-14)))
-
 (defstruct URI      scheme authority path query fragment)
 (defstruct URIAuth  username password host port)
 
Index: uri-generic.meta
===================================================================
--- uri-generic.meta    (revision 28113)
+++ uri-generic.meta    (working copy)
@@ -17,7 +17,7 @@
 
  ; A list of eggs uri-generic depends on.
 
- (needs matchable defstruct)
+ (needs matchable defstruct utf8)
  (test-depends test)
 
  (author "Ivan Raikov and Peter Bex")
Index: tests/run.scm
===================================================================
--- tests/run.scm       (revision 28113)
+++ tests/run.scm       (working copy)
@@ -201,7 +201,10 @@
   '(("foo?bar" "foo%3Fbar")
     ("foo&bar" "foo%26bar")
     ("foo%20bar" "foo%2520bar")
-    ("foo\x00bar\n" "foo%00bar%0A")))
+    ("foo\x00bar\n" "foo%00bar%0A")
+    ;; Non-ASCII (Unicode) characters should also be pct-encoded
+    ;; (reported by Sungjin Chun)
+    ("삼계탕" "%EC%82%BC%EA%B3%84%ED%83%95")))
 
 (test-group "uri-encode-string test"
   (for-each (lambda (p)
@@ -588,4 +591,4 @@
 
 (test-end "uri-generic")
 
-(unless (zero? (test-failure-count)) (exit 1))
\ No newline at end of file
+(unless (zero? (test-failure-count)) (exit 1))
_______________________________________________
Chicken-users mailing list
https://lists.nongnu.org/mailman/listinfo/chicken-users