On Sun, Jan 27, 2013 at 10:43 AM, Ivan Raikov <ivan.g.rai...@gmail.com> wrote:

>
> Hi Alex,
>
>     Yes, I would have thought that more people would be interested in
> having UTF-8 support in core Chicken (or at least wide-char compatible
> srfi-14). I have changed the title of this thread to reflect the subject
> more accurately :-)
>
>   Personally, I think that adding UTF-8 in core is much better than the
> hacks I had to do in mbox, and is a no-brainer considering the benchmark
> results you have below.  But I am sure that opinions vary on this subject...
>
>    Can you post your bounds-check patches to srfi-14 on the mailing list,
> and/or create a ticket for it? Hopefully there will be more responses this
> time.
>

Well, I'm not necessarily proposing UTF-8 support in the core.
I understand that has pros and cons and opinions may differ.

I was just pointing out that we've already got 3 char-set
implementations, 2 of them in the core distribution, and
there are no real cons to simplifying this and replacing
srfi-14 with one of the Unicode-capable implementations.

The simplest change I made was replacing:

(define-inline (si=0? s i) (zero? (%char->latin1 (string-ref s i))))
(define-inline (si=1? s i) (not (si=0? s i)))

with:

(define-inline (si=0? s i)
  (if (>= i 256) #t (zero? (%char->latin1 (string-ref s i)))))
(define-inline (si=1? s i)
  (and (< i 256) (eq? 1 (%char->latin1 (string-ref s i)))))

which is actually faster and, while it doesn't support
wide char-sets, at least gives the correct answers (rather
than an error) when passed wide chars.
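
For illustration, the same bounds-check idea can be sketched in portable
Scheme, away from the srfi-14 internals. This is only a sketch under
stated assumptions: the table is a plain 256-slot vector of 0/1 flags
rather than the string the reference implementation uses, and the names
make-latin1-set, latin1-set-contains? and ascii-digits are hypothetical,
not part of srfi-14.

```scheme
;; Sketch only: a Latin-1 char-set modelled as a 256-slot vector of 0/1
;; flags.  All names here are hypothetical, not srfi-14 API.
(define (make-latin1-set pred)
  (let ((table (make-vector 256 0)))
    (do ((i 0 (+ i 1)))
        ((= i 256) table)
      ;; Mark slot i when the predicate accepts the char with code i.
      (if (pred (integer->char i))
          (vector-set! table i 1)))))

(define (latin1-set-contains? table ch)
  (let ((i (char->integer ch)))
    ;; Bounds check first: a code point outside the table is simply
    ;; "not a member", so vector-ref is never called out of range.
    (and (< i 256)
         (= 1 (vector-ref table i)))))

(define ascii-digits
  (make-latin1-set
   (lambda (ch) (and (char>=? ch #\0) (char<=? ch #\9)))))

(latin1-set-contains? ascii-digits #\5)                 ; => #t
(latin1-set-contains? ascii-digits #\a)                 ; => #f
(latin1-set-contains? ascii-digits (integer->char 955)) ; => #f (U+03BB)
```

Testing for 1 directly, behind the range check, means the membership
test itself never has to be negated, and a wide char falls out as #f
without touching the table.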

-- 
Alex


>     Ivan
>
> On Sat, Jan 26, 2013 at 1:42 PM, Alex Shinn <alexsh...@gmail.com> wrote:
>
>> On Wed, Jan 23, 2013 at 5:09 PM, Alex Shinn <alexsh...@gmail.com> wrote:
>>
>>> On Wed, Jan 23, 2013 at 3:45 PM, Ivan Raikov <ivan.g.rai...@gmail.com> wrote:
>>>
>>>> Yes, I ran into this when I was adding UTF-8 support to mbox... If you
>>>> were to add wide char support in srfi-14, is there a way to quantify the
>>>> performance penalty?
>>>>
>>>
>>> To add the bounds check so it doesn't error?  Practically
>>> nothing.
>>>
>>> To branch to a separate path for a wide-char table if
>>> the bounds check fails?  Same cost if the input is ASCII.
>>>
>>> For efficient handling in the case of Unicode input...
>>> how small/fast do you want it?
>>>
>>
>> I've never met such stony silence in response to an offer to do work...
>>
>> I ran the following simple char-set-contains? benchmark with
>> a few variations:
>>
>>   (time
>>    (do ((i 0 (+ i 1)))
>>        ((= i 10000))
>>        (do ((j 0 (+ j 1)))
>>            ((= j 256))
>>          (char-set-contains? char-set:letter (integer->char j)))))
>>
>> This is what most people are concerned about for speed, as
>> the boolean and construction operations are less common.
>>
>> The results:
>>
>> ;; reference implementation
>> ;; 0.312s CPU time, 1/2059 GCs (major/minor)
>>
>> ;; "fixed" reference implementation (no error but no support for
>> ;; non-latin-1)
>> ;; 0.257s CPU time, 1/1706 GCs (major/minor)
>>
>> ;; utf8-srfi-14 with full Unicode char-set:letter
>> ;; 0.243s CPU time, 0/1526 GCs (major/minor)
>>
>> ;; utf8-srfi-14 with ASCII-only char-set:letter
>> ;; 0.242s CPU time, 0/1526 GCs (major/minor)
>>
>> I was able to add the check and make the reference
>> implementation faster because I fixed the common case -
>> it was optimized for checking for 0 instead of 1.
>>
>> Even with the enormous and complex definition of a
>> Unicode "letter", utf8-srfi-14 is faster than srfi-14.
>>
>> As for what we want in Chicken, the answer depends
>> on what you're optimizing for.  utf8-srfi-14 will always
>> win for space, and generally for speed as well.
>>
>> If the biggest concern is code-size, then you might want
>> to borrow the char-set definition from irregex and use
>> that as a "fallback" for non-latin-1 chars in the srfi-14
>> reference impl.  This would have the same perf as
>> srfi-14 for latin-1, yet still support full Unicode and not
>> increase the size of the Chicken distribution.
>>
>> --
>> Alex
>>
>>
>
_______________________________________________
Chicken-users mailing list
Chicken-users@nongnu.org
https://lists.nongnu.org/mailman/listinfo/chicken-users
