So these are the numbers I'm getting on my laptop for you new distribution:
: hurricane; time ./test_div
c/t = 99360000/400000000
real 0m4.532s
user 0m4.516s
sys 0m0.012s
: hurricane; time ./test_obsd
c/t = 99360000/400000000
real 0m4.367s
user 0m4.360s
sys 0m0.005s
: hurricane; echo 99360000/400000000 | hoc
0.2484
: hurricane;
The upshot that I'm taking away from all of these numbers is that:
1. In the expected case of not overflowing, the OpenBSD optimization is a
real win over just doing the division.
2. When we have an ~25% chance of overflow, it's pretty much comparable to
just omitting the check.
3. When we have a 90% chance of overflow, it's slower, but not by much:
that's to be expected as we are simply executing more instructions.
4. In all cases, using uint128_t's is a clear win, but requires a
non-standard extension (which, admittedly, pretty much all the compilers we
could care about have). In the expected case, I can't honestly say it's
enough of a win that I think it's worth it.
So I think that the overhead of the branches is more or less insignificant
except in pathological cases, but a win on the expected case, which is
clearly what they were optimizing for. In the pathologically bad case,
here's what I get:
: hurricane; time ./test_obsd
c/t = 399996000/400000000
real 0m4.531s
user 0m4.523s
sys 0m0.005s
: hurricane; time ./test_div
c/t = 399996000/400000000
real 0m4.022s
user 0m4.014s
sys 0m0.005s
: hurricane; echo 399996000/400000000 | hoc
0.99999
: hurricane; echo '(4.531-4.022)/4.022' | hoc
0.12655395325708589
: hurricane;
It's slower, but not enough that I'd remove it because the expected case is
so much worse with it left out.
Truthfully, when I imported this code I didn't even bother to think to
change it; it was correct and I didn't give performance much of a thought
given the context it fit in. I could probably be persuaded to just got with
128-bit arithmetic if it was felt that this needed to be optimized (let's
assume that will be standardized in the next revision of the C standard
anyway), but it's not clear to me that it needs further optimization: the
check for overflow is insignificant compared to the cost of calling the
memory allocator. In other words, it didn't seem worth it to modify what
was already there, and what was there works really well in the case they
were optimizing for.
*shrug* This has been an interesting diversion, but I'm going to duck out
now and get back to fixing ACPI. :-) Thanks for the discussion, though!
- Dan C.
On Tue, Dec 1, 2015 at 6:42 PM, 'Davide Libenzi' via Akaros <
[email protected]> wrote:
> Whoopsie:
>
> - size_t v = rand() & ~UMMAX;
> + size_t v = rand() & (UMMAX - 1);
>
>
> dlibenzi@dlibenzi:/tmp$ time ./test_div
> c = 100396000
>
> real 0m3.372s
> user 0m3.368s
> sys 0m0.000s
>
> dlibenzi@dlibenzi:/tmp$ time ./test_obsd
> c = 100396000
>
> real 0m3.798s
> user 0m3.794s
> sys 0m0.000s
>
> dlibenzi@dlibenzi:/tmp$ time ./test_int128
> c = 100396000
>
> real 0m0.551s
> user 0m0.550s
> sys 0m0.000s
>
>
>
> On Tue, Dec 1, 2015 at 3:39 PM, Davide Libenzi <[email protected]>
> wrote:
>
>> Let's make it really random now:
>>
>> [main_test.c]
>> #include <stddef.h>
>> #include <stdio.h>
>> #include <stdlib.h>
>>
>> #define UMMAX (1UL << (sizeof(size_t) * 4))
>>
>> extern int testovf(size_t a, size_t b);
>> extern size_t base(void);
>>
>> size_t num_values = 100000;
>>
>> size_t qrand(void)
>> {
>> size_t v = rand() & ~UMMAX;
>>
>> if (rand() & 16)
>> v += UMMAX;
>>
>> return v;
>> }
>>
>> int main(void)
>> {
>> size_t i, j, c;
>> size_t *vals = calloc(num_values, sizeof(size_t));
>>
>> for (c = 0; c < num_values; c++) {
>> do {
>> vals[c] = qrand();
>> } while (!vals[c]);
>> }
>> c = 0;
>> for (i = 0; i < 4000; i++) {
>> for (j = base(); j < num_values; j++) {
>> c += testovf(vals[j - 1], vals[j]);
>> }
>> }
>> printf("c = %zd\n", c);
>>
>> return 0;
>> }
>>
>>
>> [test_null.c]
>> #include <stddef.h>
>> #include <stdio.h>
>> #include <stdlib.h>
>>
>> int testovf(size_t a, size_t b)
>> {
>> return 0;
>> }
>>
>> size_t base(void)
>> {
>> return 0;
>> }
>>
>>
>>
>> dlibenzi@dlibenzi:/tmp$ time ./test_div
>> c = 100396000
>>
>> real 0m3.325s
>> user 0m3.321s
>> sys 0m0.000s
>>
>> dlibenzi@dlibenzi:/tmp$ time ./test_obsd
>> c = 100396000
>>
>> real 0m3.822s
>> user 0m3.818s
>> sys 0m0.005s
>>
>> dlibenzi@dlibenzi:/tmp$ time ./test_int128
>> c = 100396000
>>
>> real 0m0.560s
>> user 0m0.559s
>> sys 0m0.000s
>>
>> dlibenzi@dlibenzi:/tmp$ time ./test_null
>> c = 0
>>
>> real 0m0.474s
>> user 0m0.469s
>> sys 0m0.004s
>>
>>
>>
>> On Tue, Dec 1, 2015 at 3:28 PM, Dan Cross <[email protected]> wrote:
>>
>>> On Tue, Dec 1, 2015 at 5:40 PM, 'Davide Libenzi' via Akaros <
>>> [email protected]> wrote:
>>>
>>>> Now try with something that emulates a bit more reality (cold branch),
>>>> in making that front branch be random, instead of scorching hot towards a
>>>> single hint.
>>>>
>>>
>>> Except the numbers you generated are not random; they're greater than
>>> OpenBSD's MUL_NO_OVERFLOW almost 90% of the time on my machine. Thus, you
>>> are overflowing most of the time: this is certainly not the expected case
>>> for which one is trying to optimize. If I replace your 'qrand()' with a
>>> call to 'arc4random()' I get results far closer to what I originally posted
>>> and again within a factor of two of the __int128 solution (I'm glad you
>>> made that unsigned; mine of course had the bug that SIZE_MAX * SIZE_MAX
>>> could cause signed overflow!!).
>>>
>>> But of course, using arc4random() will never overflow since it returns
>>> uint32_t.... If I change your 'qrand()' to: 'return (size_t)arc4random() +
>>> (arc4random() >> 2);' then I'm forcing overflows about 3% of the time
>>> (which would still be an absurdly high rate) and the optimization still
>>> looks pretty good:
>>>
>>> : tempest; time ./test_div
>>> c = 13876000
>>>
>>> real 0m3.306s
>>> user 0m3.301s
>>> sys 0m0.003s
>>> : tempest; time ./test_obsd
>>> c = 13448000
>>>
>>> real 0m1.966s
>>> user 0m1.961s
>>> sys 0m0.003s
>>> : tempest;
>>>
>>> This is interesting; I hope you're enjoying this discussion as much as I
>>> am. :-) I don't mean to come across as off-putting or dismissive, and I
>>> hope I'm not doing that.
>>>
>>> - Dan C.
>>>
>>> [main_test.c]
>>>> #include <stddef.h>
>>>> #include <stdio.h>
>>>> #include <stdlib.h>
>>>>
>>>> extern int testovf(size_t a, size_t b);
>>>> extern size_t base(void);
>>>>
>>>> size_t num_values = 100000;
>>>>
>>>> size_t qrand(void)
>>>> {
>>>> return ((size_t) rand() << 4) ^ rand();
>>>> }
>>>>
>>>> int main(void)
>>>> {
>>>> size_t i, j, c;
>>>> size_t *vals = calloc(num_values, sizeof(size_t));
>>>>
>>>> for (c = 0; c < num_values; c++) {
>>>> do {
>>>> vals[c] = qrand();
>>>> } while (!vals[c]);
>>>> }
>>>> c = 0;
>>>> for (i = 0; i < 4000; i++) {
>>>> for (j = base(); j < num_values; j++) {
>>>> c += testovf(vals[j - 1], vals[j]);
>>>> }
>>>> }
>>>> printf("c = %zd\n", c);
>>>>
>>>> return 0;
>>>> }
>>>>
>>>> [test_div.c]
>>>> #include <stddef.h>
>>>> #include <stdio.h>
>>>> #include <stdlib.h>
>>>>
>>>> #define SIZE_MAX (~(size_t) 0)
>>>>
>>>> int testovf(size_t a, size_t b)
>>>> {
>>>> if (a > 0 && SIZE_MAX / a < b)
>>>> return 0;
>>>>
>>>> return 1;
>>>> }
>>>>
>>>> size_t base(void)
>>>> {
>>>> return 0;
>>>> }
>>>>
>>>> real 0m3.438s
>>>> user 0m3.430s
>>>> sys 0m0.004s
>>>>
>>>>
>>>> [test_obsd.c]
>>>> #include <stddef.h>
>>>> #include <stdio.h>
>>>> #include <stdlib.h>
>>>>
>>>> #define SIZE_MAX (~(size_t) 0)
>>>> #define MUL_NO_OVERFLOW (1UL << (sizeof(size_t) * 4))
>>>>
>>>> int testovf(size_t a, size_t b)
>>>> {
>>>> if ((a >= MUL_NO_OVERFLOW || b >= MUL_NO_OVERFLOW) &&
>>>> a > 0 && SIZE_MAX / a < b) {
>>>> return 1;
>>>> }
>>>>
>>>> return 0;
>>>> }
>>>>
>>>> size_t base(void)
>>>> {
>>>> return 0;
>>>> }
>>>>
>>>> real 0m3.875s
>>>> user 0m3.871s
>>>> sys 0m0.000s
>>>>
>>>>
>>>> [test_int128.c]
>>>> #include <stddef.h>
>>>> #include <stdio.h>
>>>> #include <stdlib.h>
>>>>
>>>> #define SIZE_MAX (~(size_t) 0)
>>>>
>>>> int testovf(size_t a, size_t b)
>>>> {
>>>> unsigned __int128 p = (unsigned __int128) a * (unsigned __int128) b;
>>>>
>>>> if (__builtin_expect(p > SIZE_MAX, 0))
>>>> return 1;
>>>>
>>>> return 0;
>>>> }
>>>>
>>>> size_t base(void)
>>>> {
>>>> return 0;
>>>> }
>>>>
>>>> real 0m0.555s
>>>> user 0m0.550s
>>>> sys 0m0.004s
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> On Tue, Dec 1, 2015 at 11:18 AM, Dan Cross <[email protected]> wrote:
>>>>
>>>>> On Tue, Dec 1, 2015 at 11:54 AM, 'Davide Libenzi' via Akaros <
>>>>> [email protected]> wrote:
>>>>>
>>>>>> Eh? I didn't *defend* it, I just explained it. That code was a direct
>>>>>>> import from OpenBSD.... I didn't write it, and given what it's doing, I
>>>>>>> saw
>>>>>>> no reason to change it. :-)
>>>>>>>
>>>>>>
>>>>>> You are arguing for simplicity, and yet you want to leave complex and
>>>>>> useless optimizations in place? ☺
>>>>>>
>>>>>
>>>>> Oh, and also: https://www.youtube.com/watch?v=nXaxk27zwlk <-- Well
>>>>> worth the watch.
>>>>>
>>>>> --
>>>>> You received this message because you are subscribed to the Google
>>>>> Groups "Akaros" group.
>>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>>> an email to [email protected].
>>>>> To post to this group, send email to [email protected].
>>>>> For more options, visit https://groups.google.com/d/optout.
>>>>>
>>>>
>>>> --
>>>> You received this message because you are subscribed to the Google
>>>> Groups "Akaros" group.
>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>> an email to [email protected].
>>>> To post to this group, send email to [email protected].
>>>> For more options, visit https://groups.google.com/d/optout.
>>>>
>>>
>>> --
>>> You received this message because you are subscribed to the Google
>>> Groups "Akaros" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to [email protected].
>>> To post to this group, send email to [email protected].
>>> For more options, visit https://groups.google.com/d/optout.
>>>
>>
>>
> --
> You received this message because you are subscribed to the Google Groups
> "Akaros" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To post to this group, send email to [email protected].
> For more options, visit https://groups.google.com/d/optout.
>
--
You received this message because you are subscribed to the Google Groups
"Akaros" group.
To unsubscribe from this group and stop receiving emails from it, send an email
to [email protected].
To post to this group, send email to [email protected].
For more options, visit https://groups.google.com/d/optout.