Re: [akaros] perfmon read/write interface

Dan Cross Tue, 01 Dec 2015 11:14:21 -0800

On Tue, Dec 1, 2015 at 11:54 AM, 'Davide Libenzi' via Akaros <
[email protected]> wrote:


> On Tue, Dec 1, 2015 at 7:56 AM, Dan Cross <[email protected]> wrote:
>
>> ...but you were talking about the dates of his *blog* post. His *blog
>> post* doesn't mention those macros at all. Rather, he talks about a
>> technique and makes a statement of a general principle. Those macros were
>> written in the 1980s or early 1990s.
>>
>
> No, everything started from the macros, got steered away to a blog post
> which, in order to prove its own point, was making false assumptions about
> what other people, who are not him, would be doing, and I commented on
> those points.
>

>> Actually, that's not what happened. The benchmark itself is correct; it's
>> rather that I had screwed things up so that the only path ever executed was
>> the fast path. *Cough* *cough* my bad.
>>
>> But regardless, the "slow" version is only 4 times slower, and runs in
>> something like a little over a nanosecond on my machine. Is that enough
>> overhead to argue about? Maybe, but it's not immediately clear.
>>
>
> For reading/writing operands and result in an ioctl-like syscall, it does
> not matter.
> That was agreed about 15 posts ago 😉
> If you are writing a library function, where you do not know beforehand
> how and where it will be used, it does matter, especially when dealing with
> APIs of this kind, which could indeed be used in tight high-frequency loops.
>

Well, that's why one should profile. It was mentioned to me that the code
you advocated for creates an alias for a non-void pointer that is of a
different non-void type, which is technically undefined behavior. But that
would be silly.
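
For what it's worth, the usual way to sidestep both the aliasing question and
the alignment question is to copy the bytes out with memcpy rather than
dereference a cast pointer. A minimal sketch; `load_u32` is a hypothetical
name of mine, not something from the code under discussion:

```c
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: read a uint32_t, in host byte order, from an
 * arbitrary and possibly unaligned address. Going through memcpy means
 * no uint32_t pointer is ever dereferenced on storage of another type.
 */
static inline uint32_t
load_u32(const void *p)
{
    uint32_t v;

    memcpy(&v, p, sizeof v);
    return v;
}
```

Optimizing compilers typically reduce a fixed-size memcpy like this to a plain
load on targets where unaligned access is cheap, but that is worth confirming
in the generated code rather than assuming.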

>>> But that code *is* portable, provided the proper machine description
>>> definitions.
>>>
>>
>> Great. Run void *p = 0x110011; uint32_t d = *(uint32_t *)p; on an MC68k
>> and tell me what happens. Saying, "no one cares about 68k" doesn't count as
>> an answer. :-)
>>
>
> I think all saints day just passed, so we can stop bringing back the
> deceased CPUs ☺
>

But what about the next CPU that comes along where alignment once again
matters?

> Nobody was thinking about running that code as is, w/out ifdef guards.
>

That's my point: the code itself *must* be guarded with ifdefs since it's
inherently non-portable.

>> Or you could just have -I/$objtype/include and have an 'endian.h' in
>> /$objtype/include that has static-inline functions that do the right thing.
>>
>
> I note you have started to steer away from assembly (remember, this branch
> of the discussion was born from you posting an assembly solution stating it
> was a better deal), but OBJTYPE/endian.h ... is not that simple.
> Take Linux for example. You have ARCH (the whole 15 or so of them), and
> within each ARCH, you have many CPU models and revisions.
> Choices that are good for an Intel P4 might not be good for a Haswell
> (certainly the fast unaligned, but also things which depend on pipeline
> length). Let's not even go into the ARM world, where the head can literally
> explode.
> So you have like 15 ARCHs, each with an AVG of, say, 3 CPU revs., 45
> combos, instead of two variables: LE, FAST_UNALIGNED.
> Yes, you can use symlinks, and/or other makefile generated machinery, but
> you still have to deal with it.
>

Yes, but you do so in a much more controlled way.
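
To make that concrete, here is a rough sketch of the generic fallback such an
'endian.h' might carry. The names `le32get` and `le32put` are made up for
illustration; a per-architecture copy of the header could swap in something
faster behind the same interface. This version assembles the value a byte at a
time, so it depends on neither host byte order nor alignment:

```c
#include <stdint.h>

/* Hypothetical endian.h contents; the function names are illustrative. */

/* Read a little-endian 32-bit value from any address. */
static inline uint32_t
le32get(const void *p)
{
    const unsigned char *b = p;

    return (uint32_t)b[0] | (uint32_t)b[1] << 8 |
        (uint32_t)b[2] << 16 | (uint32_t)b[3] << 24;
}

/* Store a 32-bit value in little-endian order at any address. */
static inline void
le32put(void *p, uint32_t v)
{
    unsigned char *b = p;

    b[0] = v & 0xff;
    b[1] = (v >> 8) & 0xff;
    b[2] = (v >> 16) & 0xff;
    b[3] = (v >> 24) & 0xff;
}
```

Callers only ever see the two functions, so the ifdef question stays inside
one header per machine instead of leaking into every call site.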

>> Eh? I didn't *defend* it, I just explained it. That code was a direct
>> import from OpenBSD.... I didn't write it, and given what it's doing, I saw
>> no reason to change it. :-)
>>
>
> You are arguing for simplicity, and yet you want to leave complex and
> useless optimizations in place? ☺
>

: tempest; cat benchdrv.c
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

extern int testovf(size_t a, size_t b);

volatile size_t aa;

int
main()
{
    size_t c;

    aa = 10000201;
    c = 0;
    for (int i = 0; i < 1000000000; i++) {
        c += testovf(aa, i);
        asm volatile("" ::: "memory");
    }
    printf("c = %zu\n", c);

    return 0;
}
: tempest; gcc-mp-4.9 -Ofast -std=c11 -fasm -c benchdrv.c
: tempest; cat fast.c
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

int
testovf(size_t a, size_t b)
{
    if (a > 0 && SIZE_MAX / a < b) {
        return 1;
    }

    return 0;
}
: tempest; gcc-mp-4.9 -Ofast -std=c11 -fasm -c fast.c
: tempest; gcc-mp-4.9 -Ofast -std=c11 -fasm -o fast fast.o benchdrv.o
: tempest; time ./fast
c = 0

real 0m8.122s
user 0m8.104s
sys 0m0.006s
: tempest; cat slow.c
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

#define MUL_NO_OVERFLOW (1UL << (sizeof(size_t) * 4))

int
testovf(size_t a, size_t b)
{
    if ((a >= MUL_NO_OVERFLOW || b >= MUL_NO_OVERFLOW) &&
        a > 0 && SIZE_MAX / a < b) {
        return 1;
    }

    return 0;
}
: tempest; gcc-mp-4.9 -Ofast -std=c11 -fasm -c slow.c
: tempest; gcc-mp-4.9 -Ofast -std=c11 -fasm -o slow slow.o benchdrv.o
: tempest; time ./slow
c = 0

real 0m2.638s
user 0m2.624s
sys 0m0.002s
: tempest;

Looks like they're not so useless after all. I note that 128-bit arithmetic
isn't defined in C11, so it is non-portable. But for giggles, I used the gcc
type to test it. It's faster, but not so much so that I could justify to
myself using a non-standard extension to the language for it:

: tempest; cat fastest.c
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

#define MUL_NO_OVERFLOW (1UL << (sizeof(size_t) * 4))

int
testovf(size_t a, size_t b)
{
    __int128_t p = (__int128_t)a * (__int128_t)b;

    if (p > SIZE_MAX) {
        return 1;
    }

    return 0;
}
: tempest; gcc-mp-4.9 -Ofast -std=c11 -fasm -c fastest.c
: tempest; gcc-mp-4.9 -Ofast -std=c11 -fasm -o fastest fastest.o benchdrv.o
: tempest; time ./fastest
c = 0

real 0m1.856s
user 0m1.840s
sys 0m0.003s
: tempest;
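
One caveat about fastest.c: the signed __int128_t product can itself overflow
once it reaches 2^127 (a = b = SIZE_MAX on a 64-bit machine, for instance), so
an unsigned 128-bit type is the safer choice. Below is a sketch of that
variant, plus one using `__builtin_mul_overflow`, which as far as I know only
exists in GCC 5 and later and in Clang, so not in the gcc 4.9 used above. Both
are compiler extensions, the first assumes a 64-bit size_t, and neither has
been timed here; the function names are mine:

```c
#include <stddef.h>
#include <stdint.h>

/* Unsigned 128-bit product (GCC/Clang extension, 64-bit targets). */
int
testovf_u128(size_t a, size_t b)
{
    unsigned __int128 p = (unsigned __int128)a * b;

    return p > SIZE_MAX;
}

/* Checked-multiply builtin (GCC 5+ and Clang); the product is discarded. */
int
testovf_builtin(size_t a, size_t b)
{
    size_t r;

    return __builtin_mul_overflow(a, b, &r);
}
```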

Static branch-prediction hints don't really help either (not that one would
expect them to):

: tempest; cat faster.c
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

int
testovf(size_t a, size_t b)
{
    if (__builtin_expect((a > 0 && SIZE_MAX / a < b), 0)) {
        return 1;
    }

    return 0;
}
: tempest; gcc-mp-4.9 -Ofast -std=c11 -fasm -c faster.c
: tempest; gcc-mp-4.9 -Ofast -std=c11 -fasm -o faster faster.o benchdrv.o
: tempest; time ./faster
c = 0

real 0m7.651s
user 0m7.632s
sys 0m0.005s
: tempest;
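
As for why the guard in slow.c is safe to add: if both operands are below
2^(bits/2), their product is below 2^bits and cannot overflow, so the division
can be skipped without changing the answer. A small sketch that checks the two
forms agree on some boundary values; the helper names are hypothetical, and
I've written the macro with a size_t cast rather than 1UL so that the shift
stays in range where unsigned long is narrower than size_t:

```c
#include <stddef.h>
#include <stdint.h>

#define MUL_NO_OVERFLOW ((size_t)1 << (sizeof(size_t) * 4))

/* Division-only check, as in fast.c. */
static int
ovf_div(size_t a, size_t b)
{
    return a > 0 && SIZE_MAX / a < b;
}

/* Guarded check, as in slow.c. */
static int
ovf_guarded(size_t a, size_t b)
{
    return (a >= MUL_NO_OVERFLOW || b >= MUL_NO_OVERFLOW) &&
        a > 0 && SIZE_MAX / a < b;
}

/* Returns 1 if both checks agree on every pair of boundary values. */
int
checks_agree(void)
{
    const size_t v[] = { 0, 1, 2, MUL_NO_OVERFLOW - 1, MUL_NO_OVERFLOW,
        MUL_NO_OVERFLOW + 1, SIZE_MAX / 2, SIZE_MAX - 1, SIZE_MAX };
    const size_t n = sizeof v / sizeof v[0];

    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++)
            if (ovf_div(v[i], v[j]) != ovf_guarded(v[i], v[j]))
                return 0;
    return 1;
}
```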
