Re: [fpc-devel] The "magic div" algorithm

2021-11-16 Thread J. Gareth Moreton via fpc-devel
If the code breaks down to "x div 10" (with 10 as an explicit constant and x being a 64-bit integer), then yes it will.  If the divisor is a variable though, then it won't.  However, the "magic div" algorithm only works for numerators up to a given bit size; if the number

Re: [fpc-devel] The "magic div" algorithm

2021-11-16 Thread Benito van der Zander via fpc-devel
Hi, in my big decimal unit, I need to divide by 10 all the time, e.g. https://github.com/benibela/bigdecimalmath/blob/master/bigdecimalmath.pas#L1324-L1325 would the magic div help there much? Bye, Benito On 09.11.21 22:12, J. Gareth Moreton via fpc-devel wrote: This one for

Re: [fpc-devel] The "magic div" algorithm

2021-11-09 Thread J. Gareth Moreton via fpc-devel
This one for Marģers specifically, You'll be pleased to know that your insight has been partially implemented! https://gitlab.com/freepascal.org/fpc/source/-/merge_requests/51 This only expands 32-bit divisions to 64-bit, since the smaller sizes requires more work at the node level, but is

Re: [fpc-devel] The "magic div" algorithm

2021-09-15 Thread J. Gareth Moreton via fpc-devel
not detect word and byte size on division by constant? Squeezing in 32 bit register it would make byte-code shorter, not necessarily faster. - Reply to message - *Subject: *Re: [fpc-devel] The "magic div" algorithm *From: * J. Gareth Moreton via fpc-devel <mailto:fpc-dev

Re: [fpc-devel] The "magic div" algorithm

2021-09-15 Thread Marģers . via fpc-devel
Hi, Thank you for implementation. Is that true, that you can not detect word and byte size on division by constant? Squeezing in 32 bit register it would make byte-code shorter, not necessarily faster.   - Reply to message - Subject: Re: [fpc-devel] The "magic div" algorith

Re: [fpc-devel] The "magic div" algorithm

2021-09-10 Thread J. Gareth Moreton via fpc-devel
This one is for Marģers especially: https://gitlab.com/freepascal.org/fpc/source/-/issues/39355 Gareth aka. Kit -- This email has been checked for viruses by Avast antivirus software. https://www.avast.com/antivirus ___ fpc-devel maillist -

Re: [fpc-devel] The "magic div" algorithm

2021-09-03 Thread J. Gareth Moreton via fpc-devel
Fixed the problem.  I was deallocating a register too soon, so it got overwritten in some cases while still in use.  Running tests again. Gareth aka. Kit -- This email has been checked for viruses by Avast antivirus software. https://www.avast.com/antivirus

Re: [fpc-devel] The "magic div" algorithm

2021-09-03 Thread J. Gareth Moreton via fpc-devel
Div by 3 has a very elegant implementation.  In 32-bit, it's movl    $0xAAAB,%eax mull    Input ; answer is in %edx:%eax shrl    $1,%edx movl    %edx, Result As Marģers discovered, you can reduce the cycle count and minimise pipeline stalls by extending to 64-bit and applying the shift to

Re: [fpc-devel] The "magic div" algorithm

2021-09-03 Thread Ched via fpc-devel
Very interesting, Gareth ! Is the div-by-7 related to 2 to the 3rd ? If yes, is it possible to design a div-by-3 with similar magics ? Cheers, Ched' Le 03.09.21 à 15:35, J. Gareth Moreton via fpc-devel a écrit : Hey Marģers, So I've been experimenting with your suggestion, and it looks

Re: [fpc-devel] The "magic div" algorithm

2021-09-03 Thread J. Gareth Moreton via fpc-devel
Hey Marģers, So I've been experimenting with your suggestion, and it looks like a resounding success!  I added some new tests to the "bdiv" bench test to see how it performs.  16-bit multiplications don't get improved as well as they could be on x86_64 because the intermediate values are all

Re: [fpc-devel] The "magic div" algorithm

2021-08-30 Thread J. Gareth Moreton via fpc-devel
Well, you'll be pleased to know that I am now back from my camping and hiking trip in the Lake District and have actually purchased "Hacker's Delight", and I have to say it's extremely insightful. I would love to play around and implement some of the suggested optimisations. Some of them will

Re: [fpc-devel] The "magic div" algorithm

2021-08-24 Thread Marģers . via fpc-devel
I came up with even shorter variant of div example  function teDWordDivBy7_v4( divided : dword):dword; assembler; nostackframe; asm mov ecx,divided mov rax,2635249153693862181 mul rcx mov eax,edx end; current version for comparison function teDWordDivBy7_v0( divided :

Re: [fpc-devel] The "magic div" algorithm

2021-08-24 Thread Marģers . via fpc-devel
teByteDivBy7( divided : dword):dword; assembler; nostackframe; asm mov ecx,divided mov eax,293 mul ecx shr eax, 11 end;   - Reply to message - Subject: Re: [fpc-devel] The "magic div" algorithm From: Marģers . via fpc-devel To: FPC developers' list For unsigned

Re: [fpc-devel] The "magic div" algorithm

2021-08-24 Thread Marģers . via fpc-devel
. For unsigned byte, word and dword divisions by constant on 64 bit cpu can be converted as good cases. Here is possibility for improvements.   - Reply to message - Subject: Re: [fpc-devel] The "magic div" algorithm From: J. Gareth Moreton via fpc-devel To: Something tells m

Re: [fpc-devel] The "magic div" algorithm

2021-08-20 Thread J. Gareth Moreton via fpc-devel
Something tells me I should purchase that book - I sense it could reveal some interesting insights. Note that while I understand the concept of turning integer division into multiplication (indeed, I implemented the first version into x86 before it was improved with the

Re: [fpc-devel] The "magic div" algorithm

2021-08-20 Thread Marģers . via fpc-devel
   is there a reference to the algorithm that's used to calculate the reciprocal constants used in the integer division optimisations for x86 and AArch64? Hacker’s Delight Second Edition Henry S. Warren, Jr.   ___ fpc-devel maillist -

[fpc-devel] The "magic div" algorithm

2021-08-20 Thread J. Gareth Moreton via fpc-devel
Hi everyone, I know I was told what the algorithm was already, but I can't find it in my email history, so I'm forced to ask again... is there a reference to the algorithm that's used to calculate the reciprocal constants used in the integer division optimisations for x86 and AArch64? It's