On Tue, Dec 1, 2015 at 9:15 AM, 'Davide Libenzi' via Akaros <
[email protected]> wrote:

> On Tue, Dec 1, 2015 at 4:46 AM, Dan Cross <[email protected]> wrote:
>
>>>> So in fairness, that code was originally written well before 'static
>>>> inline' was a thing.
>>>>
>>>
>>> They were certainly here in 2012 😉
>>> TUESDAY, APRIL 03, 2012
>>>
>>
>> ...but Rob didn't write any macros in that blog post....
>>
>
> But everything here started from arguing about the PBITx/GBITx macros in the
> first place, which were coming from plan9.
>

...but you were talking about the dates of his *blog* post. His *blog post*
doesn't mention those macros at all. Rather, he talks about a technique and
makes a statement of a general principle. Those macros were written in the
1980s or early 1990s.

>> Let's run it through gcc-4.9.2 (see attached: gcc-mp-4.9 -std=c11 -fasm -Os
>> -S bench.c). Yeah, the assembly is not as pretty, but did you actually
>> measure the elapsed runtime? They appear to be about the same to me:
>>
>> : hurricane; time ./bench abcd fast
>> t 1684234849000000000
>>
>> real 0m0.339s
>> user 0m0.333s
>> sys 0m0.003s
>> : hurricane; time ./bench abcd slow
>> t 1684234849000000000
>>
>> real 0m0.334s
>> user 0m0.328s
>> sys 0m0.003s
>> : hurricane;
>>
>> Further, you're arguing for a technique based on hardware that didn't
>> make this fast until pretty recently (I can't remember when unaligned
>> access became fast in x86). Sure, the object code is a bit bigger (4 words
>> instead of 2 bytes) so it takes up more space in icache, but for something
>> this small, I don't think it matters. Moral: measure, but only when it can
>> be shown that it's important.
>>
>
> Amended.
> You need to check assembly code when benching, as GCC can strip out entire
> code sections if it feels they have no used outcome.
>

Actually, that's not what happened. The benchmark itself is correct; it's
rather that I had screwed things up so that the only path ever executed was
the fast path. *Cough* *cough* my bad.

But regardless, the "slow" version is only 4 times slower, and runs in
something like a little over a nanosecond on my machine. Is that enough
overhead to argue about? Maybe, but it's not immediately clear.

>>>>> Even in ARM, ARM64 that is (the only thing that matters, eventually, for
>>>>> Akaros), a single load/store is faster than open coding.
>>>>> Unaligned faulting (or sucking) junk is a thing of the past. Processors
>>>>> doing that are either dead, or turning around with new silicon versions.
>>>>>
>>>>
>>>> That's a dangerous assumption, and as there's clearly no harm in
>>>> writing it the portable way since I get the same output anyway, I don't see
>>>> a point in making the assumption.
>>>>
>>>
>>> Note that nobody was trying to push anything which wasn't portable. You
>>> came up with the assembly thing.
>>>
>>
>> '*(uint32_t *)p;' isn't portable because of alignment issues (unless you
>> can guarantee that p always points to properly aligned data). Sure, you can
>> wrap that up in an 'ifdef' so that you don't compile it on a system where
>> alignment is important, but the code itself is still inherently unportable.
>> ifdef'ing it out or handwaving away platforms where it matters doesn't
>> really change that. I'd rather just write one version of the code that's
>> portable.
>>
>
> But that code *is* portable, provided the proper machine description
> definitions.
>

Great. Run 'void *p = (void *)0x110011; uint32_t d = *(uint32_t *)p;' on an MC68k and
tell me what happens. Saying, "no one cares about 68k" doesn't count as an
answer. :-)

But I think we're talking past each other here: you're saying, essentially,
that that code is portable if it's wrapped up in ifdef's so that it's not
compiled into a platform that doesn't support fast unaligned access. I'm
saying that the *code itself* is therefore not portable because it depends
on something that's specific to the system it's compiled for. To put it
differently, you seem to be suggesting that the environment the code is
compiled in matters, and I'm trying to examine the code independent of the
environment.

> Much more portable than assembly, provided two CPU-level configurables are
> in place.
>

You're talking about assembler again. I addressed that specifically in my
last email: use C if you like. But the choice of how to write efficient
code is orthogonal to good style vis-a-vis abuse of ifdef.

> Narrowing to what we are dealing with here. One we already have (endian), and
> one (fast unaligned) which can be defaulted to 0, so at the worst you fall
> back to the sucky behavior.
>

...or you have a single platform-dependent source file that just does
what's appropriate on that platform.

> Both of them could even be auto-generated by autoconf snippets (endian is
> already for sure), for non-OS software.
>

Or you could just have -I/$objtype/include and have an 'endian.h' in
/$objtype/include that has static-inline functions that do the right thing.

As for autoconf: https://queue.acm.org/detail.cfm?id=2349257 (...but see my
comments to Kamp in there!)

>>>>>> I think the overarching point of Rob's post was that if a programmer feels
>>>>>> like s/he needs to write something to deal with endianness of the machine
>>>>>> one is on, one's almost certainly going to be wrong.
>>>>>>
>>>>>
>>>>> Really? And who's this guy? Anyone I can recognize here?😀
>>>>>
>>>>
>>>> Rob Pike? No, he's not one of the scientists in that picture (cool
>>>> picture by the way). But he is this guy:
>>>> https://en.wikipedia.org/wiki/The_Unix_Programming_Environment
>>>> https://en.wikipedia.org/wiki/The_Practice_of_Programming
>>>>
>>>
>>> I will always take hard shots at the guys who assume that either
>>> "other people will get it wrong", or, along the same lines, "other people
>>> will fail because they failed".
>>>
>>
>> ...but he didn't fail at anything. His point is absolutely correct.
>>
>
> We seem to have different kinds of heroes. Mine are the ones who don't talk
> down to people telling them they will fail.
>

Rob Pike has a very peculiar way of talking that makes it appear he's
talking down to people. However, when one meets him and interacts with him
in real life, one finds that he isn't doing that at all (yet he has many of
the same mannerisms). I agree he can come across as off-putting, but I
assure you that's not actually his intention.

> He did fail at least in something, though. He failed to conceive an OS used
> on more than 10 computers around the globe.
>

I know you're being tongue-in-cheek here, but bear in mind that a) that
wasn't his intent (they were doing research, not trying to build a system
for the masses) and b) Plan 9 was wildly influential, even on Linux. All of
the namespace stuff that they are just now trying to figure out how to
incorporate, /proc, and even 'clone' were directly inspired by Plan 9. So,
given (b) I'd say he was actually wildly successful at (a), which was what
he was trying to do.

> Sorry, but you were asking for it 😀
>

Fair enough. :-)

>>> Yes, like, in linux, 8 of them ☺
>
>>
>> Check out glibc.
>>
>> : chandra; find glibc-2.19 -name '*.[Ss]' | wc -l
>>     2061
>> : chandra;
>>
>
> Yes, GLIBC. It has entire floating point emulation libraries written in
> assembly (A LOT of single function .S files - one per FP insn).
> In Linux (and many other OSs I had my eyes on), assembly is used in boot
> related code and maybe a *few* hot or particular places where writing
> inline assembly would not be practical.
>

But we're talking more generally about the use of assembler for hot
functions, right? In that case, the library actually matters a lot. My
point isn't to favor assembler over C, but rather to show that it *is* used
in lots of places. If you *don't* want to use assembler, then by all means
program in C: that doesn't mean I need lots of #ifdef's to do the job,
though, which was the more general point I was trying to make. Assembler is
not the point: the point is that one doesn't need ifdef.

> Certainly you do not see anyone doing the kind of stuff we are talking
> about here with makefile machinery orchestrating assembly files.
>

I just don't think it has to be complicated at all. The "Makefile
machinery" can be trivial: just refer to the right directory for the
architecture specific stuff.

>>> If a system has 100 valid combinations, you have to handle those 100
>>> combinations.
>>>
>>
>> Ah, but ifdef's don't just cover the *valid* combinations and that's part
>> of the problem with them. Ifdefs allow you to introduce a tweak-able knob
>> that introduces a decision space much bigger than what's actually needed.
>> If I restrict myself only to boolean expression predicated on the existence
>> or lack thereof of a preprocessor symbol, then I have a number of
>> combinations that's exponential in the number of terms; for anything
>> non-trivial, that gets big fast. But probably only a handful of
>> combinations are actually meaningful. So the set I actually use is much
>> smaller than the decision space I've created. A classic problem with
>> preprocessor magic is what happens when I tweak the knobs to force a
>> decision that isn't handled in the code. This makes things fragile, and
>> really brittle to change.
>>
>
> ...
>
>
>>
>> On the other hand, if I use separate compilation units then I can provide
>> exactly what I support and nothing more.
>>
>>> Either you do it with Makefile magic (makefiles, which are driven by
>>> configs themselves - they are just called $(FOO)), or you do it with C
>>> pre-processing magic.
>>>
>>
>> Err, if by makefile magic you mean a directory name in a variable, then I
>> guess so.... I think history has shown again and again that that's much
>> cleaner than using the preprocessor. Plan 9 ran on a dozen architectures
>> without a single #ifdef related to portability.
>>
>
> And yet, most of the software you are using today (certainly Unix based
> ones), is based on auto/manual generated HAVE_FEATURE_X macros, and ifdef
> machinery at C/C++ level.
>

That doesn't mean that it's good practice.

I claim these things have become popular because the software that makes
use of those things is so common, not because the techniques are good.
Similarly, the software is common because it's useful (like Linux), but
that doesn't mean that it's good. My 1985 AMC Eagle was really useful when
I was in high school, but man was it a shitty car. :-D

> Software that drops C in exchange for a bunch of assembly files covering the
> different branches of the conditionals, looks pretty rare to me.
>

As I've said before, it doesn't need to be assembly. One can apply the same
technique with C. Please stop fixating on the assembly thing.

> It could be that almost everyone else is wrong though. But in this
> particular case, I am with almost everyone else ☺
>

That's fine: what *we're* doing is more engineering, less art. But I think
you are discarding the argument Rob makes for the wrong reasons.

> I just would like to understand what you are arguing for.
>
> In one email, you are fighting over potential over-optimization, in another
> you defend code with silly ones (see the array alloc).
>

Eh? I didn't *defend* it, I just explained it. That code was a direct
import from OpenBSD.... I didn't write it, and given what it's doing, I saw
no reason to change it. :-)

> In one email macros are fine (PBIT/GBIT for example - places, like
> function-like patterns, where macros should not be used),
>

Eh? It's not that they're fine, it's rather that they predate the
alternatives. I think that if one were doing it over again today, a static
inline function in a header would *clearly* be the way to go. But that
stuff was written in the last century.

> in another email macros are evil (in places, like variable declaration
> expansion - where you can't do w/out using CPP features like # and ##).
>

Eh? Dude, I'm a Lisp programmer, remember? I live on macros. :-)

But *C* macros should be minimized. Sure, in some places you absolutely
need them: # and ## in macros are fine. I never said macros have *no*
place. But not all features of the preprocessor were created equal, and I'm
against #ifdef hell. Sorry, I've been there and it's not a nice place to
live. :-)

And I think that, in general, macros should be as simple as possible to
avoid their more unpleasant side effects.

Trying to understand what I'm arguing for; that's fair. Let me just give a
general statement of principles to try and clarify:

   1. Code should be written more for readability and less for the
   convenience of the person writing it (it will be read many more times than
   it will be written). Readability means clarity, simplicity and brevity.
   2. One should say what one means in code, as simply and directly as
   possible. Corollary: lots of indirection or overuse of macros to hide
   details is neither simple nor direct. That's an optimization for the
   writer, not the reader; see (1).
   3. One should not optimize code unless it can be shown that it's
   actually a performance bottleneck in a measurable way. Favor simplicity
   over performance until one has numbers that show that performance along
   some dimension is actually an issue.
   4. Prefer a single, simple solution that's portable over many solutions
   that are optimally performant. The compiler will probably catch up
   eventually anyway.
   5. Where one must optimize, measure twice before doing so and make sure
   one understands the results. Optimize only the slow part of the program.
   Mark optimized parts clearly; they're often an area for cleanup as
   compilers catch up or the environment otherwise changes.
   6. Look for algorithmic changes or different data structures to improve
   performance before twiddling bits.
   7. A simpler program is often easier to optimize at a high level (e.g.,
   using a more appropriate data structure instead of optimizing
   instructions). Corollary: premature optimization is often a pessimization.
   8. Cache locality, alignment, and other such things can have outsized
   effects. Beware just reading code: one really needs an appropriate
   benchmark that shows the code *executing* in context to understand
   performance.
   9. Avoid undefined behavior. It *will* come back to bite you:
   https://pdos.csail.mit.edu/papers/ub:apsys12.pdf and
   https://pdos.csail.mit.edu/papers/stack:sosp13.pdf
   10. Strive for elegant simplicity; things should be as simple as
   possible, but not simplistic.

        - Dan C.
