Re: [capnproto] relaxing alignment requirements in capnproto-rust: am I missing anything?

'Kenton Varda' via Cap'n Proto Fri, 10 Jan 2020 07:25:51 -0800

First, make sure you add the -O2 compiler option in godbolt, so that the
functions are actually optimized. With that flag, `direct()` becomes two
instructions (on both architectures), while `indirect()` on ARM is still 9
instructions.


It's true that on x86_64, this change will have no negative impact, as you
observed. But that's specifically because x86_64 supports unaligned reads
and writes, and so on this platform you don't actually need to change
anything to support unaligned buffers.
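
For readers without the godbolt link at hand, here is a minimal sketch of the
two access patterns being compared. The function bodies are an assumption
about what `direct()` and `indirect()` look like, not a copy of the linked
snippet; only the names come from this thread.

```cpp
#include <cstdint>
#include <cstring>

// Direct access: dereference the pointer as a uint32_t.
// Assumes p is suitably aligned for uint32_t; with an unaligned pointer
// this is undefined behavior in C++ and may fault on strict-alignment CPUs.
inline uint32_t direct(const uint32_t* p) {
  return *p;
}

// Indirect access: copy the bytes into a local variable with memcpy().
// This is well-defined for any address. Whether the call is folded into a
// single load depends on the compiler and on the target architecture.
inline uint32_t indirect(const void* p) {
  uint32_t value;
  std::memcpy(&value, p, sizeof(value));
  return value;
}
```

Both functions return the same value for an aligned pointer; only
`indirect()` may safely be given an address that is not 4-byte aligned.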

On ARM, your example is generating an out-of-line function call to memcpy.
I could be wrong, but I think this will be heavier than you are imagining.
There are three issues:

- The function call itself takes several instructions.
- An out-of-line function call will force the compiler to be more
conservative about optimizations around it. When a getter is inlined into a
larger function body, this could lead to a lot more overhead than is
visible in the godbolt example. For example, caller-saved registers used by
that outer function would need to be saved and restored around each call.
- The glibc implementation of memcpy() has to handle copies of any size,
and it is optimized for larger, variable-sized copies, since small
fixed-size copies would normally be inlined. Even a small copy will have to
go through several branches.

Here's the code:
https://github.com/lattera/glibc/blob/master/string/memcpy.c
And macros it depends on:
https://github.com/lattera/glibc/blob/master/sysdeps/generic/memcopy.h

It's hard to say how much effect all this would really have, but it would
make me uncomfortable.

But it might not be too hard to convince the compiler to generate a fixed
sequence of byte copies, rather than a memcpy call. That could be a lot
better. I'm kind of surprised that GCC doesn't optimize it this way
automatically, TBH.
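
One way to spell out such a fixed sequence by hand is to assemble the value
from individual byte loads. This is only a sketch: the name
`load_u32_bytewise` is hypothetical, and it assumes the value is stored
little-endian, as in the Cap'n Proto wire format. Whether a given compiler
emits four byte loads for it or merges them into one wider load is not
guaranteed and would need to be checked per target.

```cpp
#include <cstdint>

// Reads a little-endian 32-bit value one byte at a time.
// Each access is a single-byte load, which has no alignment requirement,
// so no out-of-line memcpy() call is involved.
inline uint32_t load_u32_bytewise(const unsigned char* p) {
  return  static_cast<uint32_t>(p[0])
       | (static_cast<uint32_t>(p[1]) << 8)
       | (static_cast<uint32_t>(p[2]) << 16)
       | (static_cast<uint32_t>(p[3]) << 24);
}
```

Because the byte order is written into the shifts, the result does not depend
on the endianness of the host.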

BTW it looks like arm64 gets optimized to an unaligned load just like
x86_64. So the future seems to be one where we don't need to worry about
alignment anymore. Maybe that's a good argument for going ahead with this
approach now.

-Kenton

On Thu, Jan 9, 2020 at 10:03 PM David Renshaw <[email protected]> wrote:

> I want to make it easy and safe for users of capnproto-rust to read
> messages from unaligned buffers without copying.  (See this github issue
> <https://github.com/capnproto/capnproto-rust/issues/101>.)
>
> Currently, a user must pass their unaligned buffer through unsafe fn
> bytes_to_words()
> <https://github.com/capnproto/capnproto-rust/blob/d1988731887b2bbb0ccb35c68b9292d98f317a48/capnp/src/lib.rs#L82-L88>,
> asserting that they believe their hardware to be okay with unaligned reads.
> In other words, we require that the user understand some tricky low-level
> processor details, and that the user preclude their software from running
> on many platforms.
>
> (With libraries like sqlite, zmq, redis, and many others, there simply is
> no way to request that a buffer be aligned -- you are just given an array
> of bytes. You can copy the bytes into an aligned buffer, but that has a
> performance cost and a complexity cost (who owns the new buffer?).)
>
> I believe that it would be better for capnproto-rust to work natively on
> unaligned buffers. In fact, I have a work-in-progress branch that achieves
> this, essentially by changing a bunch of direct memory accesses into tiny
> memcpy() calls. This C++ godbolt snippet <https://godbolt.org/z/Wki7uy>
> captures the main idea, and shows that, on x86_64 at least, the extra
> indirection gets optimized away completely. Indeed, my performance
> measurements so far support the hypothesis that there will be no
> performance cost in the x86_64 case. For processors that don't support
> unaligned access, the extra copy will still be there (e.g.
> https://godbolt.org/z/qgsGMT), but I hypothesize that it will be fast.
>
> All in all, this change seems to me like a big usability win. So I'm
> wondering: have I missed anything in the above analysis? Are there good
> reasons I shouldn't make the change?
>
> - David
>
> --
> You received this message because you are subscribed to the Google Groups
> "Cap'n Proto" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/capnproto/CABR6rW-JpiJntc0i7O4cVywzfvd2YnVp89BgYeJp_Gwzoc_Edg%40mail.gmail.com
> <https://groups.google.com/d/msgid/capnproto/CABR6rW-JpiJntc0i7O4cVywzfvd2YnVp89BgYeJp_Gwzoc_Edg%40mail.gmail.com?utm_medium=email&utm_source=footer>
> .
>

