On Tue, Dec 1, 2015 at 11:54 AM, 'Davide Libenzi' via Akaros <[email protected]> wrote:
> On Tue, Dec 1, 2015 at 7:56 AM, Dan Cross <[email protected]> wrote:
>
>> ...but you were talking about the dates of his *blog* post. His *blog
>> post* doesn't mention those macros at all. Rather, he talks about a
>> technique and makes a statement of a general principle. Those macros were
>> written in the 1980s or early 1990s.
>>
>
> No, everything started from the macros, got steered away to a blog post,
> which in order to prove its own point, was making false assumptions about
> what other people, who are not him, would be doing, and I commented on
> those points.
>

>> Actually, that's not what happened. The benchmark itself is correct; it's
>> rather that I had screwed things up so that the only path ever executed was
>> the fast path. *Cough* *cough* my bad.
>>
>> But regardless, the "slow" version is only 4 times slower, and runs in
>> something like a little over a nanosecond on my machine. Is that enough
>> overhead to argue about? Maybe, but it's not immediately clear.
>>
>
> For reading/writing operands and result in an ioctl-like syscall, it does
> not matter.
>

That was agreed about 15 posts ago 😉

> If you are writing a library function, which you do not know beforehand
> how and where it will be used, it does matter, especially when dealing with
> APIs of this kind, which could indeed be used in tight high frequency loops.
>

Well, that's why one should profile. It was mentioned to me that the code
you advocated for creates an alias for a non-void pointer that is of a
different non-void type, which is technically undefined behavior. But that
would be silly.

>>> But that code *is* portable, provided the proper machine description
>>> definitions.
>>>
>>
>> Great. Run void *p = 0x110011; uint32_t d = *(uint32_t *)p; on an MC68k
>> and tell me what happens. Saying, "no one cares about 68k" doesn't count as
>> an answer.
>> :-)
>>
>
> I think all saints day just passed, so we can stop bringing back the
> deceased CPUs ☺
>

But what about the next CPU that comes along where alignment once again
matters?

> Nobody was thinking about running that code as is, w/out ifdef guards.
>

That's my point: the code itself *must* be guarded with ifdefs since it's
inherently non-portable.

>> Or you could just have -I/$objtype/include and have an 'endian.h' in
>> /$objtype/include that has static-inline functions that do the right thing.
>>
>
> I note you have started to steer away from assembly (remember, this branch
> of the discussion was born from you posting an assembly solution stating it
> was a better deal), but OBJTYPE/endian.h ... is not that simple.
> Take Linux for example. You have ARCH (the whole 15 or so of them), and
> within each ARCH, you have many CPU models and revisions.
> Choices that are good for an Intel P4 might not be good with a Haswell
> (certainly the fast unaligned, but also things which depend on pipeline
> length). Let's not even go into the ARM world, where the head can literally
> explode.
> So you have like 15 ARCHs, each with an AVG of, say, 3 CPU revs., 45
> combos, instead of two variables: LE, FAST_UNALIGNED.
> Yes, you can use symlinks, and/or other makefile generated machinery, but
> you still have to deal with it.
>

Yes, but you do so in a much more controlled way.

>> Eh? I didn't *defend* it, I just explained it. That code was a direct
>> import from OpenBSD.... I didn't write it, and given what it's doing, I saw
>> no reason to change it. :-)
>>
>
> You are arguing for simplicity, and yet you want to leave complex and
> useless optimizations in place?
> ☺
>

: tempest; cat benchdrv.c
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

extern int testovf(size_t a, size_t b);

volatile size_t aa;

int
main()
{
	size_t c;

	aa = 10000201;
	c = 0;
	for (int i = 0; i < 1000000000; i++) {
		c += testovf(aa, i);
		asm volatile("" ::: "memory");
	}
	printf("c = %zd\n", c);

	return 0;
}
: tempest; cat fast.c
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

int
testovf(size_t a, size_t b)
{
	if (a > 0 && SIZE_MAX / a < b) {
		return 1;
	}
	return 0;
}
: tempest; gcc-mp-4.9 -Ofast -std=c11 -fasm -c benchdrv.c
: tempest; cat fast.c
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

int
testovf(size_t a, size_t b)
{
	if (a > 0 && SIZE_MAX / a < b) {
		return 1;
	}
	return 0;
}
: tempest; gcc-mp-4.9 -Ofast -std=c11 -fasm -c fast.c
: tempest; gcc-mp-4.9 -Ofast -std=c11 -fasm -o fast fast.o benchdrv.o
: tempest; time ./fast
c = 0

real	0m8.122s
user	0m8.104s
sys	0m0.006s
: tempest; cat slow.c
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

#define MUL_NO_OVERFLOW (1UL << (sizeof(size_t) * 4))

int
testovf(size_t a, size_t b)
{
	if ((a >= MUL_NO_OVERFLOW || b >= MUL_NO_OVERFLOW) &&
	    a > 0 && SIZE_MAX / a < b) {
		return 1;
	}
	return 0;
}
: tempest; gcc-mp-4.9 -Ofast -std=c11 -fasm -c slow.c
: tempest; gcc-mp-4.9 -Ofast -std=c11 -fasm -o slow slow.o benchdrv.o
: tempest; time ./slow
c = 0

real	0m2.638s
user	0m2.624s
sys	0m0.002s
: tempest;

Looks like they're not so useless after all.

I note that 128-bit arithmetic isn't defined in C11, so is non-portable.
But for giggles, I used the gcc type to test it.
It's faster, but not so much so that I could justify to myself using a
non-standard extension to the language for it:

: tempest; cat fastest.c
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

#define MUL_NO_OVERFLOW (1UL << (sizeof(size_t) * 4))

int
testovf(size_t a, size_t b)
{
	__int128_t p = (__int128_t)a * (__int128_t)b;
	if (p > SIZE_MAX) {
		return 1;
	}
	return 0;
}
: tempest; gcc-mp-4.9 -Ofast -std=c11 -fasm -c fastest.c
: tempest; gcc-mp-4.9 -Ofast -std=c11 -fasm -o fastest fastest.o benchdrv.o
: tempest; time ./fastest
c = 0

real	0m1.856s
user	0m1.840s
sys	0m0.003s
: tempest;

Branch-prediction hints don't really help either (not that one would
expect them to):

: tempest; cat faster.c
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

int
testovf(size_t a, size_t b)
{
	if (__builtin_expect((a > 0 && SIZE_MAX / a < b), 0)) {
		return 1;
	}
	return 0;
}
: tempest; gcc-mp-4.9 -Ofast -std=c11 -fasm -c faster.c
: tempest; gcc-mp-4.9 -Ofast -std=c11 -fasm -o faster faster.o benchdrv.o
: tempest; time ./faster
c = 0

real	0m7.651s
user	0m7.632s
sys	0m0.005s
: tempest;
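
As a sanity check on the claim that the range pre-check in slow.c only skips the division and never changes the answer, here is a minimal sketch that compares the two predicates on boundary values. It is not part of the session above. The names HALF_RANGE, ovf_div, ovf_guarded, count_mismatches and self_check are made up for illustration, and the sketch assumes nothing beyond standard C (it uses CHAR_BIT rather than hard-coding 8-bit bytes, and pulls SIZE_MAX from <stdint.h>):

```c
#include <assert.h>
#include <limits.h>	/* CHAR_BIT */
#include <stddef.h>
#include <stdint.h>	/* SIZE_MAX */

/* 2^(bits/2): if both operands are below this, the product cannot wrap.
 * Hypothetical name, same idea as MUL_NO_OVERFLOW. */
#define HALF_RANGE ((size_t)1 << (sizeof(size_t) * CHAR_BIT / 2))

/* Plain division test, as in fast.c. */
static int
ovf_div(size_t a, size_t b)
{
	return a > 0 && SIZE_MAX / a < b;
}

/* Same test behind the cheap range pre-check, as in slow.c. */
static int
ovf_guarded(size_t a, size_t b)
{
	return (a >= HALF_RANGE || b >= HALF_RANGE) &&
	    a > 0 && SIZE_MAX / a < b;
}

/* Run both predicates over all pairs of boundary values and
 * return how many pairs they disagree on. */
static int
count_mismatches(void)
{
	const size_t v[] = {
		0, 1, 2, HALF_RANGE - 1, HALF_RANGE, HALF_RANGE + 1,
		SIZE_MAX / 2, SIZE_MAX - 1, SIZE_MAX
	};
	const size_t n = sizeof(v) / sizeof(v[0]);
	int bad = 0;

	for (size_t i = 0; i < n; i++)
		for (size_t j = 0; j < n; j++)
			if (ovf_div(v[i], v[j]) != ovf_guarded(v[i], v[j]))
				bad++;
	return bad;
}

/* Aborts if the two predicates ever disagree on the boundary set. */
static void
self_check(void)
{
	assert(count_mismatches() == 0);
}
```

Calling self_check() from a test driver should return silently if the two forms agree on these inputs. It only exercises boundary values, so it is a spot check rather than a proof, and it says nothing about speed.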
