On Wed, Jan 23, 2013 at 5:09 PM, Alex Shinn <alexsh...@gmail.com> wrote:
> On Wed, Jan 23, 2013 at 3:45 PM, Ivan Raikov <ivan.g.rai...@gmail.com> wrote:
>
>> Yes, I ran into this when I was adding UTF-8 support to mbox... If you
>> were to add wide char support in srfi-14, is there a way to quantify the
>> performance penalty?
>>
>
> To add the bounds check so it doesn't error?  Practically
> nothing.
>
> To branch to a separate path for a wide-char table if
> the bounds check fails?  Same cost if the input is ASCII.
>
> For efficient handling in the case of Unicode input...
> how small/fast do you want it?
>

I've never met such stony silence in response to an offer to do work...

I ran the following simple char-set-contains? benchmark
with a few variations:

  (time
   (do ((i 0 (+ i 1)))
       ((= i 10000))
     (do ((j 0 (+ j 1)))
         ((= j 256))
       (char-set-contains? char-set:letter (integer->char j)))))

This is what most people are concerned about for speed,
as the boolean and construction operations are less
common.  The results:

;; reference implementation
;; 0.312s CPU time, 1/2059 GCs (major/minor)

;; "fixed" reference implementation (no error but no support for non-latin-1)
;; 0.257s CPU time, 1/1706 GCs (major/minor)

;; utf8-srfi-14 with full Unicode char-set:letter
;; 0.243s CPU time, 0/1526 GCs (major/minor)

;; utf8-srfi-14 with ASCII-only char-set:letter
;; 0.242s CPU time, 0/1526 GCs (major/minor)

I was able to add the check and make the reference implementation
faster because I fixed the common case - it was optimized for
checking for 0 instead of 1.

Even with the enormous and complex definition of a Unicode "letter",
utf8-srfi-14 is faster than srfi-14.

As for what we want in Chicken, the answer depends on what
you're optimizing for.  utf8-srfi-14 will always win for space,
and generally for speed as well.  If the biggest concern is
code-size, then you might want to borrow the char-set definition
from irregex and use that as a "fallback" for non-latin-1 chars
in the srfi-14 reference impl.
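
As a rough illustration of that fallback idea, here is a minimal
sketch. It is not the srfi-14 reference code and not the irregex
definition: every name in it (make-sketch-set, sketch-set-contains?,
toy-greek-letter?) is hypothetical, and the "fallback" is a toy range
standing in for a real Unicode letter definition. It only shows the
shape of the lookup: a bounds check, a 256-entry table for latin-1,
and a separate path for anything larger.

```scheme
;; Hypothetical sketch: a 256-entry latin-1 table plus a fallback
;; predicate that is consulted only when the bounds check fails.

;; Build a table holding 1 for every code point below 256 that
;; satisfies pred, and 0 otherwise.
(define (make-latin-1-table pred)
  (let ((table (make-vector 256 0)))
    (do ((i 0 (+ i 1)))
        ((= i 256) table)
      (if (pred i) (vector-set! table i 1)))))

;; A "set" here is just a pair: (latin-1 table . fallback predicate).
(define (make-sketch-set pred fallback)
  (cons (make-latin-1-table pred) fallback))

(define (sketch-set-contains? cs ch)
  (let ((n (char->integer ch)))
    (if (< n 256)
        (= 1 (vector-ref (car cs) n))  ; common case: table lookup
        ((cdr cs) n))))                ; wide char: ask the fallback

;; ASCII letters only, for the table part.
(define (ascii-letter? n)
  (or (and (>= n 65) (<= n 90))
      (and (>= n 97) (<= n 122))))

;; Toy stand-in for a real Unicode definition: U+0391..U+03C9 only.
(define (toy-greek-letter? n)
  (and (>= n #x391) (<= n #x3C9)))

(define letters (make-sketch-set ascii-letter? toy-greek-letter?))
```

With these definitions, (sketch-set-contains? letters #\a) takes the
table path, while (sketch-set-contains? letters (integer->char #x3B1))
fails the bounds check and goes through the fallback, so ASCII and
latin-1 input never pays for the wide-char logic.
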
This would have the same perf as srfi-14 for latin-1, yet still
support full Unicode and not increase the size of the Chicken
distribution.

--
Alex