On Tue, Jan 15, 2013 at 02:44:08PM +0900, Alex Shinn wrote:
> This result looks broken. As I noted in my previous mail, the URI
> representation already handles non-ASCII characters and escapes on output:
>
> $ csi -R uri-common
> #;1> (make-uri scheme: "http" host: "127.0.0.1" path: '(/ "삼계탕"))
> #<URI-common: scheme="http" port=#f host="127.0.0.1" path=(/ "삼계탕")
> query=#f fragment=#f>
> #;2> (uri->string (make-uri scheme: "http" host: "127.0.0.1" path: '(/
> "삼계탕")))
> "http://127.0.0.1/82%BCB3%8483%95"
>
> Unrelated, the actual escaped output looks buggy - it looks like
> some characters like the leading "%EC%" are getting dropped.

OK, I took some time to investigate and pinpointed the problem.
It is caused by uri-generic's use of the core srfi-13 and srfi-14
units: their char-set operations simply don't deal with anything
beyond ASCII. Only after switching to the UTF-8 aware versions
(utf8-srfi-13, utf8-srfi-14 and unicode-char-sets) does it work:
Without patch:
$ csi -R uri-generic -P '(uri-encode-string "삼계탕")'
"�%82%BC�%B3%84�%83%95"
With patch:
$ csi -R uri-generic -P '(uri-encode-string "삼계탕")'
"%EC%82%BC%EA%B3%84%ED%83%95"
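
To see where those expected bytes come from, here is a minimal
sketch of byte-wise percent-encoding. It is not uri-generic's code,
and it takes a different route from the attached patch: instead of
making the char-sets UTF-8 aware, it assumes CHICKEN 4 byte strings
with the utf8 egg *not* loaded, so the UTF-8 encoded "삼" is seen as
the three chars #\xEC #\x82 #\xBC. The names unreserved?,
pct-encode-byte and byte-pct-encode are hypothetical. Only RFC 3986
unreserved ASCII characters pass through; every other byte becomes
%XX.

```scheme
;; Hypothetical sketch, not uri-generic's implementation.
;; Assumes CHICKEN 4 byte strings with the utf8 egg NOT loaded, so each
;; char of a UTF-8 string is one byte (char->integer gives 0..255).

(define hex-digits "0123456789ABCDEF")

;; RFC 3986 "unreserved" characters: ALPHA / DIGIT / "-" / "." / "_" / "~"
(define (unreserved? c)
  (or (and (char>=? c #\a) (char<=? c #\z))
      (and (char>=? c #\A) (char<=? c #\Z))
      (and (char>=? c #\0) (char<=? c #\9))
      (memv c '(#\- #\. #\_ #\~))))

;; Turn one byte into "%XX" with uppercase hex digits.
(define (pct-encode-byte c)
  (let ((n (char->integer c)))
    (string #\%
            (string-ref hex-digits (quotient n 16))
            (string-ref hex-digits (remainder n 16)))))

;; Keep unreserved ASCII chars, percent-encode every other byte.
(define (byte-pct-encode str)
  (apply string-append
         (map (lambda (c)
                (if (unreserved? c) (string c) (pct-encode-byte c)))
              (string->list str))))
```

Under those assumptions, (byte-pct-encode "삼계탕") yields
"%EC%82%BC%EA%B3%84%ED%83%95", the same string as the patched
uri-encode-string above. If the utf8 egg were loaded, string->list
would return code points above 255 and this sketch would break, which
is why the patch fixes the char-sets instead.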

Ivan, what do you think about adding a dependency on the utf8 egg,
as in the attached patch (against trunk)?

Cheers,
Peter
--
http://sjamaan.ath.cx
Index: uri-generic.scm
===================================================================
--- uri-generic.scm (revision 28113)
+++ uri-generic.scm (working copy)
@@ -57,13 +57,9 @@
(import chicken scheme extras data-structures ports)
-(require-extension matchable defstruct srfi-1 srfi-4 srfi-13 srfi-14)
+(require-extension matchable defstruct srfi-1 srfi-4
+ utf8-srfi-13 utf8-srfi-14 unicode-char-sets)
-;; What to do with these?
-#;(cond-expand
- (utf8-strings (use utf8-srfi-13 utf8-srfi-14))
- (else (use srfi-13 srfi-14)))
-
(defstruct URI scheme authority path query fragment)
(defstruct URIAuth username password host port)
Index: uri-generic.meta
===================================================================
--- uri-generic.meta (revision 28113)
+++ uri-generic.meta (working copy)
@@ -17,7 +17,7 @@
; A list of eggs uri-generic depends on.
- (needs matchable defstruct)
+ (needs matchable defstruct utf8)
(test-depends test)
(author "Ivan Raikov and Peter Bex")
Index: tests/run.scm
===================================================================
--- tests/run.scm (revision 28113)
+++ tests/run.scm (working copy)
@@ -201,7 +201,10 @@
'(("foo?bar" "foo%3Fbar")
("foo&bar" "foo%26bar")
("foo%20bar" "foo%2520bar")
- ("foo\x00bar\n" "foo%00bar%0A")))
+ ("foo\x00bar\n" "foo%00bar%0A")
+ ;; Non-ASCII (Unicode) characters should also be pct-encoded
+ ;; (reported by Sungjin Chun)
+ ("삼계탕" "%EC%82%BC%EA%B3%84%ED%83%95")))
(test-group "uri-encode-string test"
(for-each (lambda (p)
@@ -588,4 +591,4 @@
(test-end "uri-generic")
-(unless (zero? (test-failure-count)) (exit 1))
\ No newline at end of file
+(unless (zero? (test-failure-count)) (exit 1))
_______________________________________________
Chicken-users mailing list
https://lists.nongnu.org/mailman/listinfo/chicken-users