I'm generally supportive of this, but it's also worth considering, in case the change doesn't land: as an alternative to the current unsafe bytes_to_words, you could provide a version that returns a Result, which is Err unless the argument is 8-byte aligned or the CPU architecture is known to be able to handle unaligned access.
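A minimal sketch of what that could look like (hypothetical API: the `Word` type, `UnalignedError`, and the per-target constant are all made up here, not capnp's actual definitions). Note that the unaligned-but-tolerant path still constructs a misaligned `&[Word]`, which is technically undefined behavior at the Rust language level even where the hardware is fine with it; that caveat is part of the motivation for the approach discussed below.

```rust
// Hypothetical sketch of a checked alternative to `unsafe fn bytes_to_words`.
// `Word` stands in for capnp's 8-byte word type; `UnalignedError` is made up.

#[repr(C, align(8))]
#[derive(Clone, Copy)]
pub struct Word(pub [u8; 8]);

#[derive(Debug)]
pub struct UnalignedError;

// Compile-time approximation of "known to tolerate unaligned access".
#[cfg(any(target_arch = "x86", target_arch = "x86_64", target_arch = "aarch64"))]
const UNALIGNED_ACCESS_OK: bool = true;
#[cfg(not(any(target_arch = "x86", target_arch = "x86_64", target_arch = "aarch64")))]
const UNALIGNED_ACCESS_OK: bool = false;

pub fn try_bytes_to_words(bytes: &[u8]) -> Result<&[Word], UnalignedError> {
    let aligned = (bytes.as_ptr() as usize) % core::mem::align_of::<Word>() == 0;
    if aligned || UNALIGNED_ACCESS_OK {
        // Mirrors what the existing unsafe cast does: reinterpret the bytes
        // as whole words, rounding the length down. On the unaligned path
        // this is still language-level UB; it merely won't fault on these
        // architectures.
        let words = unsafe {
            core::slice::from_raw_parts(bytes.as_ptr() as *const Word, bytes.len() / 8)
        };
        Ok(words)
    } else {
        Err(UnalignedError)
    }
}
```

The caller then handles `Err` by copying into an aligned buffer (or failing), instead of asserting alignment up front via `unsafe`.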
-Ian

Quoting David Renshaw (2020-01-11 11:11:54):
> Thanks for the feedback!
>
> I figured out how to get rustc to emit assembly for a variety of
> targets. Results are in this blog post:
> https://dwrensha.github.io/capnproto-rust/2020/01/11/unaligned-memory-access.html
>
> I don't think there's any case in which the extra copy will actually be
> an out-of-line memcpy function call.
>
> - David
>
> On Fri, Jan 10, 2020 at 10:25 AM Kenton Varda <[email protected]> wrote:
> > First, make sure you add the -O2 compiler option in godbolt, so that
> > these are actually optimized. If you do that, `direct()` becomes two
> > instructions (on both architectures), while `indirect()` on ARM is
> > still 9 instructions.
> >
> > It's true that on x86_64, this change will have no negative impact, as
> > you observed. But that's specifically because x86_64 supports unaligned
> > reads and writes, and so on this platform you don't actually need to
> > change anything to support unaligned buffers.
> >
> > On ARM, your example is generating an out-of-line function call to
> > memcpy. I could be wrong, but I think this will be heavier than you are
> > imagining. There are three issues:
> >
> > - The function call itself takes several instructions.
> > - An out-of-line function call will force the compiler to be more
> >   conservative about optimizations around it. When a getter is inlined
> >   into a larger function body, this could lead to a lot more overhead
> >   than is visible in the godbolt example. For example, caller-saved
> >   registers used by that outer function would need to be saved and
> >   restored around each call.
> > - The glibc implementation of memcpy() itself needs to be designed to
> >   handle any size of memcpy, and is optimized for larger, variable-sized
> >   copies, since small fixed copies would normally be inlined. Several
> >   branches will be needed even for a small copy.
> > Here's the code:
> > https://github.com/lattera/glibc/blob/master/string/memcpy.c
> > And macros it depends on:
> > https://github.com/lattera/glibc/blob/master/sysdeps/generic/memcopy.h
> >
> > It's hard to say how much effect all this would really have, but it
> > would make me uncomfortable.
> >
> > But it might not be too hard to convince the compiler to generate a
> > fixed sequence of byte copies, rather than a memcpy call. That could be
> > a lot better. I'm kind of surprised that GCC doesn't optimize it this
> > way automatically, TBH.
> >
> > BTW it looks like arm64 gets optimized to an unaligned load just like
> > x86_64. So the future seems to be one where we don't need to worry
> > about alignment anymore. Maybe that's a good argument for going ahead
> > with this approach now.
> >
> > -Kenton
> >
> > On Thu, Jan 9, 2020 at 10:03 PM David Renshaw <[email protected]> wrote:
> > > I want to make it easy and safe for users of capnproto-rust to read
> > > messages from unaligned buffers without copying. (See this github
> > > issue: https://github.com/capnproto/capnproto-rust/issues/101)
> > >
> > > Currently, a user must pass their unaligned buffer through
> > > `unsafe fn bytes_to_words()`
> > > (https://github.com/capnproto/capnproto-rust/blob/d1988731887b2bbb0ccb35c68b9292d98f317a48/capnp/src/lib.rs#L82-L88),
> > > asserting that they believe their hardware to be okay with unaligned
> > > reads. In other words, we require that the user understand some tricky
> > > low-level processor details, and that the user preclude their software
> > > from running on many platforms.
> > >
> > > (With libraries like sqlite, zmq, redis, and many others, there simply
> > > is no way to request that a buffer be aligned -- you are just given an
> > > array of bytes. You can copy the bytes into an aligned buffer, but that
> > > has a performance cost and a complexity cost (who owns the new
> > > buffer?).)
> > >
> > > I believe that it would be better for capnproto-rust to work natively
> > > on unaligned buffers. In fact, I have a work-in-progress branch that
> > > achieves this, essentially by changing a bunch of direct memory
> > > accesses into tiny memcpy() calls.
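In Rust, the "tiny memcpy" pattern typically looks like the sketch below (illustrative only, not the actual code on the work-in-progress branch; the function name is made up). Instead of casting to `&u64` and dereferencing, which requires alignment, it copies the bytes into a local array and reassembles the value; on targets that support unaligned loads, the compiler collapses the copy into a single load.

```rust
// Illustrative "tiny memcpy" read (not the actual branch's code): copy the
// 8 bytes into a local buffer, then reassemble a little-endian u64. The
// `copy_from_slice` compiles to an unaligned load where the target allows it.
fn read_u64_le(bytes: &[u8], offset: usize) -> u64 {
    let mut buf = [0u8; 8];
    buf.copy_from_slice(&bytes[offset..offset + 8]); // the "tiny memcpy"
    u64::from_le_bytes(buf)
}
```

The same idea can be expressed at the pointer level with `core::ptr::read_unaligned`, which is likewise specified as a byte-wise copy.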
> > > This c++ godbolt snippet captures the main idea:
> > > https://godbolt.org/z/Wki7uy
> > > It shows that, on x86_64 at least, the extra indirection gets optimized
> > > away completely. Indeed, my performance measurements so far support the
> > > hypothesis that there will be no performance cost in the x86_64 case.
> > > For processors that don't support unaligned access, the extra copy will
> > > still be there (e.g. https://godbolt.org/z/qgsGMT), but I hypothesize
> > > that it will be fast.
> > >
> > > All in all, this change seems to me like a big usability win. So I'm
> > > wondering: have I missed anything in the above analysis? Are there good
> > > reasons I shouldn't make the change?
> > >
> > > - David
> > >
> > > --
> > > You received this message because you are subscribed to the Google
> > > Groups "Cap'n Proto" group.
> > > To unsubscribe from this group and stop receiving emails from it, send
> > > an email to [email protected].
> > > To view this discussion on the web visit
> > > https://groups.google.com/d/msgid/capnproto/CABR6rW-JpiJntc0i7O4cVywzfvd2YnVp89BgYeJp_Gwzoc_Edg%40mail.gmail.com.
mailto:[email protected] > 11. > https://groups.google.com/d/msgid/capnproto/CABR6rW-JpiJntc0i7O4cVywzfvd2YnVp89BgYeJp_Gwzoc_Edg%40mail.gmail.com?utm_medium=email&utm_source=footer > 12. mailto:[email protected] > 13. > https://groups.google.com/d/msgid/capnproto/CABR6rW8Xw5eveWtJGpv3_FEx_wKesHc0EDHEtdw-q0Fow%3DK6eA%40mail.gmail.com?utm_medium=email&utm_source=footer -- You received this message because you are subscribed to the Google Groups "Cap'n Proto" group. To unsubscribe from this group and stop receiving emails from it, send an email to [email protected]. To view this discussion on the web visit https://groups.google.com/d/msgid/capnproto/157876085900.74264.10639491434134744676%40localhost.localdomain.
