> On 16.04.2022 at 01:26, J. Gareth Moreton via fpc-devel
> <fpc-devel@lists.freepascal.org> wrote:
>
> Hi everyone,
>
> This is something that sprang to mind when thinking about code speed and the
> like, and one thing that cropped up is the initialisation of large variables
> such as arrays or records. A common means of doing this is, say:
>
>   FillChar(MyVar, SizeOf(MyVar), 0);
>
> To keep things as general-purpose as possible, this usually results in a
> function call that decides the best course of action, and for very large
> blocks of data whose size may not be deterministic (e.g. a file buffer), this
> is the best approach - the overhead is relatively small and it quickly uses
> fast block-move instructions.
>
> However, for small-to-mid-sized variables of known size, this can lead to
> some inefficiencies, first by not taking into account that the size of the
> variable is known, but also because the initialisation value is zero, more
> often than not, and the variable is probably aligned on the stack (so the
> checks to make sure a pointer is aligned are unnecessary).
>
> I did a proof of concept on x86_64-win64 with the following record:
>
>   type
>     TTestRecord = record
>       Field1: Byte;
>       Field2, Field3, Field4: Integer;
>     end;
>
> SizeOf(TTestRecord) is 16 and all the fields are on 4-byte boundaries.
> Nothing particularly special.
>
> I then declared a variable of this type and filled the fields with random
> values, and then ran two different methods to clear its memory. To get a
> good speed average, I ran each method 1,000,000,000 times in a for-loop.
> The first method was:
>
>   FillChar(TestRecord, SizeOf(TestRecord), 0);
>
> The second method was inline assembly language (which I've called 'the
> intrinsic'):
>
>   asm
>     PXOR   XMM0, XMM0
>     MOVDQU [RIP+TestRecord], XMM0
>   end;
>
> It's not perfect because the presence of inline assembly prevents the use of
> register variables (although TestRecord is always on the stack regardless),
> but the performance hit is barely noticeable in this case, and if the
> assembly language were inserted by the compiler, the register variable
> problem wouldn't arise.
>
> These are my results:
>
>   FillChar time: 2.398 ns
>
>   Field1 = 0
>   Field2 = 0
>   Field3 = 0
>   Field4 = 0
>
>   Intrinsic time: 1.336 ns
>
>   Field1 = 0
>   Field2 = 0
>   Field3 = 0
>   Field4 = 0
>
> Sure, it's on the order of nanoseconds, but the intrinsic is almost twice as
> fast.
>
> In terms of size - FillChar call = 20 bytes:
>
>   488d0d22080200    lea    0x20822(%rip),%rcx    # 0x100022010
>   4531c0            xor    %r8d,%r8d
>   ba10000000        mov    $0x10,%edx
>   e8150a0000        callq  0x100002210 <SYSTEM_$$_FILLCHAR$formal$INT64$BYTE>
>
> The intrinsic = 12 bytes:
>
>   660fefc0            pxor    %xmm0,%xmm0
>   f30f7f05bd050200    movdqu  %xmm0,0x205bd(%rip)    # 0x100022010
>
> For a 32-byte record instead, an extra 8-byte MOVDQU instruction would be
> required, so the two would be equal in size, but with the bonus that the
> intrinsic doesn't have a function call and will probably help optimisation in
> the rest of the procedure by freeing up the registers used to pass parameters
> (%rcx, %rdx and %r8 in this case; although the intrinsic will require an XMM
> register in this x86_64 example, they tend to not be used as often). Also,
> the peephole optimizer can remove redundant PXOR XMM0, XMM0 instructions,
> which will help as well if there are multiple FillChar calls.
>
> I'm not proposing a total rewrite, and I would say that in the default case,
> it should just fall back to the in-built System functions, but the relevant
> compiler nodes could be overridden on specific platforms to generate smaller,
> more optimised code when the sizes and values are known at compile time.
>
> Now, in this example, it is still faster to simply set the fields manually
> one-by-one (clocks in at around 1.2 ns), possibly due to the unaligned write
> (MOVDQU) and internal SSE state switching adding some overhead, but there's
> nothing to stop the compiler from inserting code in place of the FillChar
> call to do just that if it thinks it's the fastest method. Then again, one
> has to be a little bit careful because FillChar and the intrinsic will also
> set the filler bytes between Field1 and Field2 to 0, whereas manually
> assigning 0 to the fields won't (so they aren't strictly equivalent and might
> only be allowed if there are no filler bytes or when compiling under -O4, but
> the latter may still be dangerous where typecasting is concerned), and extra
> care would have to be taken where unions are concerned (sorry, 'union' is
> a C term - what's the official Pascal term again?).
>
> Actual Pascal calls to FillChar would not change in any way and so
> theoretically it won't break existing code. The only drawback is that the
> intrinsic and the internal System functions would have to be named the same
> so constructs such as "FuncPtr := @FillChar;" as well as calling FillChar
> from assembler routines still work, and the compiler would have to know how
> to differentiate between the two.
>
> Just on the surface, what are your thoughts?

Inlining FillChar is certainly useful (the same goes for Move). The FillChar
in the system unit could stay; the compiler could just replace a call to
System.FillChar with compiler-generated assembler that does the fill.
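
The filler-byte caveat in the quoted message can be made concrete with a
minimal sketch. It assumes the TTestRecord layout given there (SizeOf = 16,
fields on 4-byte boundaries) and {$mode objfpc}, so that Integer is 32 bits
wide. NonZeroBytes is a hypothetical helper introduced only for this
illustration; it is not part of the RTL or of the proposal.

```pascal
program FillerBytes;

{$mode objfpc}

type
  TTestRecord = record
    Field1: Byte;
    Field2, Field3, Field4: Integer;
  end;

{ Hypothetical helper: counts the non-zero bytes in the raw storage of Rec,
  filler bytes included. }
function NonZeroBytes(const Rec: TTestRecord): Integer;
var
  I: Integer;
begin
  Result := 0;
  for I := 0 to SizeOf(TTestRecord) - 1 do
    if PByte(@Rec)[I] <> 0 then
      Inc(Result);
end;

var
  TestRecord: TTestRecord;
begin
  { Poison every byte so that untouched filler bytes stay visible. }
  FillChar(TestRecord, SizeOf(TestRecord), $FF);

  { Clearing field by field writes only the bytes that belong to fields. }
  TestRecord.Field1 := 0;
  TestRecord.Field2 := 0;
  TestRecord.Field3 := 0;
  TestRecord.Field4 := 0;
  WriteLn('Non-zero bytes after field assignment: ', NonZeroBytes(TestRecord));

  { FillChar clears the whole block, filler included. }
  FillChar(TestRecord, SizeOf(TestRecord), 0);
  WriteLn('Non-zero bytes after FillChar: ', NonZeroBytes(TestRecord));
end.
```

With the layout described in the message, the filler bytes after Field1 would
be expected to survive the field-by-field assignment, while the count after
FillChar should be zero. The exact number of surviving bytes depends on the
record alignment in effect, so a compiler that replaces FillChar with
per-field stores would have to account for that difference.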
_______________________________________________
fpc-devel maillist - fpc-devel@lists.freepascal.org
https://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel
Re: [fpc-devel] Thoughts: Make FillChar etc. an intrinsic for specialised performance potential
Florian Klämpfl via fpc-devel Sat, 16 Apr 2022 12:00:56 -0700