On Wed, Jan 23, 2013 at 3:45 PM, Ivan Raikov <[email protected]> wrote:
> Yes, I ran into this when I was adding UTF-8 support to mbox... If you
> were to add wide char support in srfi-14, is there a way to quantify the
> performance penalty?
>

To add the bounds check so it doesn't error? Practically nothing.

To branch to a separate path for a wide-char table if the bounds check
fails? Same cost if the input is ASCII.

For efficient handling in the case of Unicode input... how small/fast do
you want it?

--
Alex

On Wed, Jan 23, 2013 at 3:42 PM, Alex Shinn <[email protected]> wrote:

> On Thu, Jan 17, 2013 at 4:51 AM, Peter Bex <[email protected]> wrote:
>>
>>> On Tue, Jan 15, 2013 at 02:44:08PM +0900, Alex Shinn wrote:
>>> > This result looks broken. As I noted in my previous mail, the URI
>>> > representation already handles non-ASCII characters and escapes on output:
>>> >
>>> > $ csi -R uri-common
>>> > #;1> (make-uri scheme: "http" host: "127.0.0.1" path: '(/ "삼계탕"))
>>> > #<URI-common: scheme="http" port=#f host="127.0.0.1" path=(/ "삼계탕") query=#f fragment=#f>
>>> > #;2> (uri->string (make-uri scheme: "http" host: "127.0.0.1" path: '(/ "삼계탕")))
>>> > "http://127.0.0.1/82%BCB3%8483%95"
>>> >
>>> > Unrelated, the actual escaped output looks buggy - it looks like
>>> > some characters like the leading "%EC%" are getting dropped.
>>>
>>> OK, I took some time to investigate and I pinpointed this problem.
>>> This appears to happen due to the use of core srfi-14 and srfi-13 in
>>> uri-generic; its char-set operations simply don't deal with anything
>>> beyond ASCII.
>>
>>
>> As an aside from the uri discussion, we really need to fix srfi-14.
>>
>> The reference implementation is terrible. Not only does it not
>> handle Unicode, but it doesn't not-handle it gracefully:
>>
>> #;1> (char-set-contains? char-set:full #\x100)
>> Error: (string-ref) out of range [...]
>>
>> At a minimum we should avoid these errors, but really we
>> should be using a Unicode-aware implementation - there's no
>> barrier to doing so like there is for Unicode strings. We could
>> just move utf8-srfi-14 into the core, or I could patch up the
>> srfi-14 implementation to handle wide chars properly (but maybe
>> slowly) without bringing in the iset dependency.
>>
>> --
>> Alex
>>
>>
>> _______________________________________________
>> Chicken-users mailing list
>> [email protected]
>> https://lists.nongnu.org/mailman/listinfo/chicken-users
>>
>>
>
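The two cheap options from the reply (a bounds check, then a branch to a
wide-char table when the check fails) could look roughly like the sketch
below. This is only an illustration, not the srfi-14 reference code or
utf8-srfi-14: the names make-cs, cs-contains? and in-ranges? are hypothetical,
and it assumes a set is stored as a 256-entry table plus a list of
(lo . hi) code-point ranges for everything outside that table.

```scheme
;; Hypothetical char-set layout (an assumption for illustration only):
;; a 256-entry vector of booleans for the fast path, paired with a list
;; of inclusive (lo . hi) code-point ranges for wide chars.
(define (make-cs table wide-ranges) (cons table wide-ranges))
(define (cs-table cs) (car cs))
(define (cs-wide cs) (cdr cs))

;; Linear scan over the wide ranges: simple, but slow for large sets.
(define (in-ranges? n ranges)
  (cond ((null? ranges) #f)
        ((and (>= n (caar ranges)) (<= n (cdar ranges))) #t)
        (else (in-ranges? n (cdr ranges)))))

(define (cs-contains? cs ch)
  (let ((n (char->integer ch)))
    (if (< n 256)
        (vector-ref (cs-table cs) n)    ; bounds check passed: table lookup
        (in-ranges? n (cs-wide cs)))))  ; bounds check failed: wide-char path

;; Example set: ASCII a-z plus the Hangul syllables block (U+AC00..U+D7A3).
(define letters
  (let ((v (make-vector 256 #f)))
    (do ((i 97 (+ i 1))) ((> i 122)) (vector-set! v i #t))
    (make-cs v '((#xAC00 . #xD7A3)))))

(cs-contains? letters #\a)                     ; => #t
(cs-contains? letters (integer->char #xC0BC))  ; => #t
(cs-contains? letters (integer->char #x100))   ; => #f, and no error
```

For ASCII input the only extra work over a bare table lookup is the single
comparison against 256. The linear scan on the wide path is the "maybe
slowly" part; a sorted vector with binary search, or an integer-set
structure, would be the natural next step if Unicode input needs to be fast.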
