Re: [akaros] perfmon read/write interface

'Davide Libenzi' via Akaros Wed, 02 Dec 2015 05:42:07 -0800

The purpose of randomizing the branch, is not because it is overflowing.
99.9% of the cases, it does not.
But, unless you run that code in a loop, given the scarcity of the branch
predictor tags, that branch (as well as many along the path) is going to be
cold.
So the expected run time, is not the hot one, but the cold one.
The conditional to fix that code is not "if (slower)", but "if
(buys-nothing-or-worse && is-more-complex)".
Like stated a few times, in that particular case there is no impact either
ways, but still remain a piece of complex code which buys nothing.
But, if a static inline helper is going to be created to check overflows,
than the __int128 solution, given us moving forwards, and the dinosaurs
being extinct ☺, looks like the best.
Not only because it's the fastest, but also because it's the simplest.
It's what you'd do if you did not care about overflowing: "if (a * b > MAX)
...".
If you look at the generated code, there is no branch at all (uses cmovX).




On Tue, Dec 1, 2015 at 7:49 PM, Dan Cross <[email protected]> wrote:

> So these are the numbers I'm getting on my laptop for you new distribution:
>
> : hurricane; time ./test_div
> c/t = 99360000/400000000
>
> real 0m4.532s
> user 0m4.516s
> sys 0m0.012s
> : hurricane; time ./test_obsd
> c/t = 99360000/400000000
>
> real 0m4.367s
> user 0m4.360s
> sys 0m0.005s
> : hurricane; echo 99360000/400000000 | hoc
> 0.2484
> : hurricane;
>
> The upshot that I'm taking away from all of these numbers is that:
>
> 1. In the expected case of not overflowing, the OpenBSD optimization is a
> real win over just doing the division.
> 2. When we have an ~25% chance of overflow, it's pretty much comparable to
> just omitting the check.
> 3. When we have a 90% chance of overflow, it's slower, but not by much:
> that's to be expected as we are simply executing more instructions.
> 4. In all cases, using uint128_t's is a clear win, but requires a
> non-standard extension (which, admittedly, pretty much all the compilers we
> could care about have). In the expected case, I can't honestly say it's
> enough of a win that I think it's worth it.
>
> So I think that the overhead of the branches is more or less insignificant
> except in pathological cases, but a win on the expected case, which is
> clearly what they were optimizing for. In the pathologically bad case,
> here's what I get:
>
> : hurricane; time ./test_obsd
> c/t = 399996000/400000000
>
> real 0m4.531s
> user 0m4.523s
> sys 0m0.005s
> : hurricane; time ./test_div
> c/t = 399996000/400000000
>
> real 0m4.022s
> user 0m4.014s
> sys 0m0.005s
> : hurricane; echo 399996000/400000000 | hoc
> 0.99999
> : hurricane; echo '(4.531-4.022)/4.022' | hoc
> 0.12655395325708589
> : hurricane;
>
> It's slower, but not enough that I'd remove it because the expected case
> is so much worse with it left out.
>
> Truthfully, when I imported this code I didn't even bother to think to
> change it; it was correct and I didn't give performance much of a thought
> given the context it fit in. I could probably be persuaded to just got with
> 128-bit arithmetic if it was felt that this needed to be optimized (let's
> assume that will be standardized in the next revision of the C standard
> anyway), but it's not clear to me that it needs further optimization: the
> check for overflow is insignificant compared to the cost of calling the
> memory allocator. In other words, it didn't seem worth it to modify what
> was already there, and what was there works really well in the case they
> were optimizing for.
>
> *shrug* This has been an interesting diversion, but I'm going to duck out
> now and get back to fixing ACPI. :-) Thanks for the discussion, though!
>
>         - Dan C.
>
>
> On Tue, Dec 1, 2015 at 6:42 PM, 'Davide Libenzi' via Akaros <
> [email protected]> wrote:
>
>> Whoopsie:
>>
>> - size_t v = rand() & ~UMMAX;
>> + size_t v = rand() & (UMMAX - 1);
>>
>>
>> dlibenzi@dlibenzi:/tmp$ time ./test_div
>> c = 100396000
>>
>> real 0m3.372s
>> user 0m3.368s
>> sys 0m0.000s
>>
>> dlibenzi@dlibenzi:/tmp$ time ./test_obsd
>> c = 100396000
>>
>> real 0m3.798s
>> user 0m3.794s
>> sys 0m0.000s
>>
>> dlibenzi@dlibenzi:/tmp$ time ./test_int128
>> c = 100396000
>>
>> real 0m0.551s
>> user 0m0.550s
>> sys 0m0.000s
>>
>>
>>
>> On Tue, Dec 1, 2015 at 3:39 PM, Davide Libenzi <[email protected]>
>> wrote:
>>
>>> Let's make it really random now:
>>>
>>> [main_test.c]
>>> #include <stddef.h>
>>> #include <stdio.h>
>>> #include <stdlib.h>
>>>
>>> #define UMMAX (1UL << (sizeof(size_t) * 4))
>>>
>>> extern int testovf(size_t a, size_t b);
>>> extern size_t base(void);
>>>
>>> size_t num_values = 100000;
>>>
>>> size_t qrand(void)
>>> {
>>>     size_t v = rand() & ~UMMAX;
>>>
>>>     if (rand() & 16)
>>>         v += UMMAX;
>>>
>>>     return v;
>>> }
>>>
>>> int main(void)
>>> {
>>>     size_t i, j, c;
>>>     size_t *vals = calloc(num_values, sizeof(size_t));
>>>
>>>     for (c = 0; c < num_values; c++) {
>>>         do {
>>>             vals[c] = qrand();
>>>         } while (!vals[c]);
>>>     }
>>>     c = 0;
>>>     for (i = 0; i < 4000; i++) {
>>>         for (j = base(); j < num_values; j++) {
>>>             c += testovf(vals[j - 1], vals[j]);
>>>         }
>>>     }
>>>     printf("c = %zd\n", c);
>>>
>>>     return 0;
>>> }
>>>
>>>
>>> [test_null.c]
>>> #include <stddef.h>
>>> #include <stdio.h>
>>> #include <stdlib.h>
>>>
>>> int testovf(size_t a, size_t b)
>>> {
>>>     return 0;
>>> }
>>>
>>> size_t base(void)
>>> {
>>>     return 0;
>>> }
>>>
>>>
>>>
>>> dlibenzi@dlibenzi:/tmp$ time ./test_div
>>> c = 100396000
>>>
>>> real 0m3.325s
>>> user 0m3.321s
>>> sys 0m0.000s
>>>
>>> dlibenzi@dlibenzi:/tmp$ time ./test_obsd
>>> c = 100396000
>>>
>>> real 0m3.822s
>>> user 0m3.818s
>>> sys 0m0.005s
>>>
>>> dlibenzi@dlibenzi:/tmp$ time ./test_int128
>>> c = 100396000
>>>
>>> real 0m0.560s
>>> user 0m0.559s
>>> sys 0m0.000s
>>>
>>> dlibenzi@dlibenzi:/tmp$ time ./test_null
>>> c = 0
>>>
>>> real 0m0.474s
>>> user 0m0.469s
>>> sys 0m0.004s
>>>
>>>
>>>
>>> On Tue, Dec 1, 2015 at 3:28 PM, Dan Cross <[email protected]> wrote:
>>>
>>>> On Tue, Dec 1, 2015 at 5:40 PM, 'Davide Libenzi' via Akaros <
>>>> [email protected]> wrote:
>>>>
>>>>> Now try with something that emulates a bit more reality (cold branch),
>>>>> in making that front branch be random, instead of scorching hot towards a
>>>>> single hint.
>>>>>
>>>>
>>>> Except the numbers you generated are not random; they're greater than
>>>> OpenBSD's MUL_NO_OVERFLOW almost 90% of the time on my machine. Thus, you
>>>> are overflowing most of the time: this is certainly not the expected case
>>>> for which one is trying to optimize. If I replace your 'qrand()' with a
>>>> call to 'arc4random()' I get results far closer to what I originally posted
>>>> and again within a factor of two of the __int128 solution (I'm glad you
>>>> made that unsigned; mine of course had the bug that SIZE_MAX * SIZE_MAX
>>>> could cause signed overflow!!).
>>>>
>>>> But of course, using arc4random() will never overflow since it returns
>>>> uint32_t.... If I change your 'qrand()' to: 'return (size_t)arc4random() +
>>>> (arc4random() >> 2);' then I'm forcing overflows about 3% of the time
>>>> (which would still be an absurdly high rate) and the optimization still
>>>> looks pretty good:
>>>>
>>>> : tempest; time ./test_div
>>>> c = 13876000
>>>>
>>>> real 0m3.306s
>>>> user 0m3.301s
>>>> sys 0m0.003s
>>>> : tempest; time ./test_obsd
>>>> c = 13448000
>>>>
>>>> real 0m1.966s
>>>> user 0m1.961s
>>>> sys 0m0.003s
>>>> : tempest;
>>>>
>>>> This is interesting; I hope you're enjoying this discussion as much as
>>>> I am. :-)  I don't mean to come across as off-putting or dismissive, and I
>>>> hope I'm not doing that.
>>>>
>>>>         - Dan C.
>>>>
>>>> [main_test.c]
>>>>> #include <stddef.h>
>>>>> #include <stdio.h>
>>>>> #include <stdlib.h>
>>>>>
>>>>> extern int testovf(size_t a, size_t b);
>>>>> extern size_t base(void);
>>>>>
>>>>> size_t num_values = 100000;
>>>>>
>>>>> size_t qrand(void)
>>>>> {
>>>>>     return ((size_t) rand() << 4) ^ rand();
>>>>> }
>>>>>
>>>>> int main(void)
>>>>> {
>>>>>     size_t i, j, c;
>>>>>     size_t *vals = calloc(num_values, sizeof(size_t));
>>>>>
>>>>>     for (c = 0; c < num_values; c++) {
>>>>>         do {
>>>>>             vals[c] = qrand();
>>>>>         } while (!vals[c]);
>>>>>     }
>>>>>     c = 0;
>>>>>     for (i = 0; i < 4000; i++) {
>>>>>         for (j = base(); j < num_values; j++) {
>>>>>             c += testovf(vals[j - 1], vals[j]);
>>>>>         }
>>>>>     }
>>>>>     printf("c = %zd\n", c);
>>>>>
>>>>>     return 0;
>>>>> }
>>>>>
>>>>> [test_div.c]
>>>>> #include <stddef.h>
>>>>> #include <stdio.h>
>>>>> #include <stdlib.h>
>>>>>
>>>>> #define SIZE_MAX (~(size_t) 0)
>>>>>
>>>>> int testovf(size_t a, size_t b)
>>>>> {
>>>>>     if (a > 0 && SIZE_MAX / a < b)
>>>>>         return 0;
>>>>>
>>>>>     return 1;
>>>>> }
>>>>>
>>>>> size_t base(void)
>>>>> {
>>>>>     return 0;
>>>>> }
>>>>>
>>>>> real 0m3.438s
>>>>> user 0m3.430s
>>>>> sys 0m0.004s
>>>>>
>>>>>
>>>>> [test_obsd.c]
>>>>> #include <stddef.h>
>>>>> #include <stdio.h>
>>>>> #include <stdlib.h>
>>>>>
>>>>> #define SIZE_MAX (~(size_t) 0)
>>>>> #define MUL_NO_OVERFLOW (1UL << (sizeof(size_t) * 4))
>>>>>
>>>>> int testovf(size_t a, size_t b)
>>>>> {
>>>>>     if ((a >= MUL_NO_OVERFLOW || b >= MUL_NO_OVERFLOW) &&
>>>>>         a > 0 && SIZE_MAX / a < b) {
>>>>>         return 1;
>>>>>     }
>>>>>
>>>>>     return 0;
>>>>> }
>>>>>
>>>>> size_t base(void)
>>>>> {
>>>>>     return 0;
>>>>> }
>>>>>
>>>>> real 0m3.875s
>>>>> user 0m3.871s
>>>>> sys 0m0.000s
>>>>>
>>>>>
>>>>> [test_int128.c]
>>>>> #include <stddef.h>
>>>>> #include <stdio.h>
>>>>> #include <stdlib.h>
>>>>>
>>>>> #define SIZE_MAX (~(size_t) 0)
>>>>>
>>>>> int testovf(size_t a, size_t b)
>>>>> {
>>>>>     unsigned __int128 p = (unsigned __int128) a * (unsigned __int128)
>>>>> b;
>>>>>
>>>>>     if (__builtin_expect(p > SIZE_MAX, 0))
>>>>>         return 1;
>>>>>
>>>>>     return 0;
>>>>> }
>>>>>
>>>>> size_t base(void)
>>>>> {
>>>>>     return 0;
>>>>> }
>>>>>
>>>>> real 0m0.555s
>>>>> user 0m0.550s
>>>>> sys 0m0.004s
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Tue, Dec 1, 2015 at 11:18 AM, Dan Cross <[email protected]> wrote:
>>>>>
>>>>>> On Tue, Dec 1, 2015 at 11:54 AM, 'Davide Libenzi' via Akaros <
>>>>>> [email protected]> wrote:
>>>>>>
>>>>>>> Eh? I didn't *defend* it, I just explained it. That code was a
>>>>>>>> direct import from OpenBSD.... I didn't write it, and given what it's
>>>>>>>> doing, I saw no reason to change it. :-)
>>>>>>>>
>>>>>>>
>>>>>>> You are arguing for simplicity, and yet you want to leave complex
>>>>>>> and useless optimizations in place? ☺
>>>>>>>
>>>>>>
>>>>>> Oh, and also: https://www.youtube.com/watch?v=nXaxk27zwlk <-- Well
>>>>>> worth the watch.
>>>>>>
>>>>>> --
>>>>>> You received this message because you are subscribed to the Google
>>>>>> Groups "Akaros" group.
>>>>>> To unsubscribe from this group and stop receiving emails from it,
>>>>>> send an email to [email protected].
>>>>>> To post to this group, send email to [email protected].
>>>>>> For more options, visit https://groups.google.com/d/optout.
>>>>>>
>>>>>
>>>>> --
>>>>> You received this message because you are subscribed to the Google
>>>>> Groups "Akaros" group.
>>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>>> an email to [email protected].
>>>>> To post to this group, send email to [email protected].
>>>>> For more options, visit https://groups.google.com/d/optout.
>>>>>
>>>>
>>>> --
>>>> You received this message because you are subscribed to the Google
>>>> Groups "Akaros" group.
>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>> an email to [email protected].
>>>> To post to this group, send email to [email protected].
>>>> For more options, visit https://groups.google.com/d/optout.
>>>>
>>>
>>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "Akaros" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to [email protected].
>> To post to this group, send email to [email protected].
>> For more options, visit https://groups.google.com/d/optout.
>>
>
> --
> You received this message because you are subscribed to the Google Groups
> "Akaros" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To post to this group, send email to [email protected].
> For more options, visit https://groups.google.com/d/optout.
>

-- 
You received this message because you are subscribed to the Google Groups 
"Akaros" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To post to this group, send email to [email protected].
For more options, visit https://groups.google.com/d/optout.

Re: [akaros] perfmon read/write interface

Reply via email to