I'm generally supportive of this, but it's also worth considering, in case the change doesn't land: as an alternative to the current unsafe bytes_to_words, you could provide a version that returns a Result, which is Err unless the argument is 8-byte aligned or the CPU architecture is known to be able to handle unaligned access.
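A minimal sketch of what that could look like (hypothetical API: the `Word` type, `UnalignedError`, and the per-target constant are all made up here, not capnp's actual definitions). Note that the unaligned-but-tolerant path still constructs a misaligned `&[Word]`, which is technically undefined behavior at the Rust language level even where the hardware is fine with it; that caveat is part of the motivation for the approach discussed below.

```rust
// Hypothetical sketch of a checked alternative to `unsafe fn bytes_to_words`.
// `Word` stands in for capnp's 8-byte word type; `UnalignedError` is made up.

#[repr(C, align(8))]
#[derive(Clone, Copy)]
pub struct Word(pub [u8; 8]);

#[derive(Debug)]
pub struct UnalignedError;

// Compile-time approximation of "known to tolerate unaligned access".
#[cfg(any(target_arch = "x86", target_arch = "x86_64", target_arch = "aarch64"))]
const UNALIGNED_ACCESS_OK: bool = true;
#[cfg(not(any(target_arch = "x86", target_arch = "x86_64", target_arch = "aarch64")))]
const UNALIGNED_ACCESS_OK: bool = false;

pub fn try_bytes_to_words(bytes: &[u8]) -> Result<&[Word], UnalignedError> {
    let aligned = (bytes.as_ptr() as usize) % core::mem::align_of::<Word>() == 0;
    if aligned || UNALIGNED_ACCESS_OK {
        // Mirrors what the existing unsafe cast does: reinterpret the bytes
        // as whole words, rounding the length down. On the unaligned path
        // this is still language-level UB; it merely won't fault on these
        // architectures.
        let words = unsafe {
            core::slice::from_raw_parts(bytes.as_ptr() as *const Word, bytes.len() / 8)
        };
        Ok(words)
    } else {
        Err(UnalignedError)
    }
}
```

The caller then handles `Err` by copying into an aligned buffer (or failing), instead of asserting alignment up front via `unsafe`.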
-Ian

Quoting David Renshaw (2020-01-11 11:11:54):
> Thanks for the feedback!
>
> I figured out how to get rustc to emit assembly for a variety of
> targets. Results are in this blog post:
> https://dwrensha.github.io/capnproto-rust/2020/01/11/unaligned-memory-access.html
>
> I don't think there's any case in which the extra copy will actually be
> an out-of-line memcpy function call.
>
> - David
>
> On Fri, Jan 10, 2020 at 10:25 AM Kenton Varda <[email protected]> wrote:
> > First, make sure you add the -O2 compiler option in godbolt, so that
> > these are actually optimized. If you do that, `direct()` becomes two
> > instructions (on both architectures), while `indirect()` on ARM is
> > still 9 instructions.
> >
> > It's true that on x86_64, this change will have no negative impact, as
> > you observed. But that's specifically because x86_64 supports unaligned
> > reads and writes, and so on this platform you don't actually need to
> > change anything to support unaligned buffers.
> >
> > On ARM, your example is generating an out-of-line function call to
> > memcpy. I could be wrong, but I think this will be heavier than you are
> > imagining. There are three issues:
> >
> > - The function call itself takes several instructions.
> > - An out-of-line function call will force the compiler to be more
> >   conservative about optimizations around it. When a getter is inlined
> >   into a larger function body, this could lead to a lot more overhead
> >   than is visible in the godbolt example. For example, caller-saved
> >   registers used by that outer function would need to be saved and
> >   restored around each call.
> > - The glibc implementation of memcpy() itself needs to be designed to
> >   handle any size of memcpy, and is optimized for larger, variable-sized
> >   copies, since small fixed copies would normally be inlined. Several
> >   branches will be needed even for a small copy.
> > Here's the code:
> > https://github.com/lattera/glibc/blob/master/string/memcpy.c
> > And macros it depends on:
> > https://github.com/lattera/glibc/blob/master/sysdeps/generic/memcopy.h
> >
> > It's hard to say how much effect all this would really have, but it
> > would make me uncomfortable.
> >
> > But it might not be too hard to convince the compiler to generate a
> > fixed sequence of byte copies, rather than a memcpy call. That could be
> > a lot better. I'm kind of surprised that GCC doesn't optimize it this
> > way automatically, TBH.
> >
> > BTW it looks like arm64 gets optimized to an unaligned load just like
> > x86_64. So the future seems to be one where we don't need to worry
> > about alignment anymore. Maybe that's a good argument for going ahead
> > with this approach now.
> >
> > -Kenton
> >
> > On Thu, Jan 9, 2020 at 10:03 PM David Renshaw <[email protected]> wrote:
> > > I want to make it easy and safe for users of capnproto-rust to read
> > > messages from unaligned buffers without copying. (See this github
> > > issue: https://github.com/capnproto/capnproto-rust/issues/101)
> > >
> > > Currently, a user must pass their unaligned buffer through
> > > `unsafe fn bytes_to_words()`
> > > (https://github.com/capnproto/capnproto-rust/blob/d1988731887b2bbb0ccb35c68b9292d98f317a48/capnp/src/lib.rs#L82-L88),
> > > asserting that they believe their hardware to be okay with unaligned
> > > reads. In other words, we require that the user understand some tricky
> > > low-level processor details, and that the user preclude their software
> > > from running on many platforms.
> > >
> > > (With libraries like sqlite, zmq, redis, and many others, there simply
> > > is no way to request that a buffer be aligned -- you are just given an
> > > array of bytes. You can copy the bytes into an aligned buffer, but that
> > > has a performance cost and a complexity cost (who owns the new
> > > buffer?).)
> > >
> > > I believe that it would be better for capnproto-rust to work natively
> > > on unaligned buffers. In fact, I have a work-in-progress branch that
> > > achieves this, essentially by changing a bunch of direct memory
> > > accesses into tiny memcpy() calls.
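In Rust, the "tiny memcpy" pattern typically looks like the sketch below (illustrative only, not the actual code on the work-in-progress branch; the function name is made up). Instead of casting to `&u64` and dereferencing, which requires alignment, it copies the bytes into a local array and reassembles the value; on targets that support unaligned loads, the compiler collapses the copy into a single load.

```rust
// Illustrative "tiny memcpy" read (not the actual branch's code): copy the
// 8 bytes into a local buffer, then reassemble a little-endian u64. The
// `copy_from_slice` compiles to an unaligned load where the target allows it.
fn read_u64_le(bytes: &[u8], offset: usize) -> u64 {
    let mut buf = [0u8; 8];
    buf.copy_from_slice(&bytes[offset..offset + 8]); // the "tiny memcpy"
    u64::from_le_bytes(buf)
}
```

The same idea can be expressed at the pointer level with `core::ptr::read_unaligned`, which is likewise specified as a byte-wise copy.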
> > > This c++ godbolt snippet captures the main idea:
> > > https://godbolt.org/z/Wki7uy
> > > It shows that, on x86_64 at least, the extra indirection gets optimized
> > > away completely. Indeed, my performance measurements so far support the
> > > hypothesis that there will be no performance cost in the x86_64 case.
> > > For processors that don't support unaligned access, the extra copy will
> > > still be there (e.g. https://godbolt.org/z/qgsGMT), but I hypothesize
> > > that it will be fast.
> > >
> > > All in all, this change seems to me like a big usability win. So I'm
> > > wondering: have I missed anything in the above analysis? Are there good
> > > reasons I shouldn't make the change?
> > >
> > > - David
> > >
> > > --
> > > You received this message because you are subscribed to the Google
> > > Groups "Cap'n Proto" group.
> > > To unsubscribe from this group and stop receiving emails from it, send
> > > an email to [email protected].
> > > To view this discussion on the web visit
> > > https://groups.google.com/d/msgid/capnproto/CABR6rW-JpiJntc0i7O4cVywzfvd2YnVp89BgYeJp_Gwzoc_Edg%40mail.gmail.com.
mailto:[email protected] > 11. > https://groups.google.com/d/msgid/capnproto/CABR6rW-JpiJntc0i7O4cVywzfvd2YnVp89BgYeJp_Gwzoc_Edg%40mail.gmail.com?utm_medium=email&utm_source=footer > 12. mailto:[email protected] > 13. > https://groups.google.com/d/msgid/capnproto/CABR6rW8Xw5eveWtJGpv3_FEx_wKesHc0EDHEtdw-q0Fow%3DK6eA%40mail.gmail.com?utm_medium=email&utm_source=footer -- You received this message because you are subscribed to the Google Groups "Cap'n Proto" group. To unsubscribe from this group and stop receiving emails from it, send an email to [email protected]. To view this discussion on the web visit https://groups.google.com/d/msgid/capnproto/157876085900.74264.10639491434134744676%40localhost.localdomain.
