Hey Marģers,

So I've been experimenting with your suggestion, and it looks like a resounding success!  I added some new tests to the "bdiv" bench test to see how it performs.  16-bit multiplications don't get improved as well as they could be on x86_64 because the intermediate values are all extended to 32-bit, but you can still see a massive improvement on all the unsigned divisions by 7.

I'm currently running the test suite for i386-win32 and x86_64-win64, then will implement the same system for AArch64. Thanks so much for spotting the improvement.

Gareth aka. Kit

P.S. I still need to work on the signed modulus at some point.

P.P.S. Timings:

Trunk:

Division compilation and timing test (using constants from System and Sysutils)
-------------------------------------------------------------------------------
              Unsigned 16-bit division by 1 - Pass - average iteration duration: 0.349 ns                Unsigned 16-bit modulus by 1 - Pass - average iteration duration: 0.349 ns               Unsigned 16-bit division by 2 - Pass - average iteration duration: 0.349 ns                Unsigned 16-bit modulus by 2 - Pass - average iteration duration: 0.349 ns               Unsigned 16-bit division by 3 - Pass - average iteration duration: 0.640 ns                Unsigned 16-bit modulus by 3 - Pass - average iteration duration: 0.757 ns               Unsigned 16-bit division by 7 - Pass - average iteration duration: 0.873 ns                Unsigned 16-bit modulus by 7 - Pass - average iteration duration: 0.990 ns              Unsigned 16-bit division by 10 - Pass - average iteration duration: 0.582 ns               Unsigned 16-bit modulus by 10 - Pass - average iteration duration: 0.815 ns             Unsigned 16-bit division by 100 - Pass - average iteration duration: 0.582 ns              Unsigned 16-bit modulus by 100 - Pass - average iteration duration: 0.815 ns           Unsigned 16-bit division by 1,000 - Pass - average iteration duration: 0.582 ns            Unsigned 16-bit modulus by 1,000 - Pass - average iteration duration: 0.757 ns               Unsigned 32-bit division by 1 - Pass - average iteration duration: 0.291 ns                Unsigned 32-bit modulus by 1 - Pass - average iteration duration: 0.349 ns               Unsigned 32-bit division by 2 - Pass - average iteration duration: 0.349 ns                Unsigned 32-bit modulus by 2 - Pass - average iteration duration: 0.291 ns               Unsigned 32-bit division by 3 - Pass - average iteration duration: 0.582 ns                Unsigned 32-bit modulus by 3 - Pass - average iteration duration: 0.698 ns               Unsigned 32-bit division by 7 - Pass - average iteration duration: 0.815 ns                Unsigned 32-bit modulus by 7 - Pass - average iteration duration: 0.990 ns              Unsigned 32-bit division by 10 - Pass - average iteration duration: 0.582 ns               Unsigned 32-bit modulus by 10 - Pass - average iteration duration: 0.757 ns             Unsigned 32-bit division by 100 - Pass - average iteration duration: 0.582 ns              Unsigned 32-bit modulus by 100 - Pass - average iteration duration: 0.698 ns           Unsigned 32-bit division by 1,000 - Pass - average iteration duration: 0.524 ns            Unsigned 32-bit modulus by 1,000 - Pass - average iteration duration: 0.757 ns          Unsigned 32-bit division by 60,000 - Pass - average iteration duration: 0.524 ns           Unsigned 32-bit modulus by 60,000 - Pass - average iteration duration: 0.698 ns         Unsigned 32-bit division by 146,097 - Pass - average iteration duration: 0.582 ns          Unsigned 32-bit modulus by 146,097 - Pass - average iteration duration: 0.698 ns       Unsigned 32-bit division by 3,600,000 - Pass - average iteration duration: 0.582 ns        Unsigned 32-bit modulus by 3,600,000 - Pass - average iteration duration: 0.698 ns               Unsigned 64-bit division by 1 - Pass - average iteration duration: 0.349 ns                Unsigned 64-bit modulus by 1 - Pass - average iteration duration: 0.349 ns               Unsigned 64-bit division by 2 - Pass - average iteration duration: 0.349 ns                Unsigned 64-bit modulus by 2 - Pass - average iteration duration: 0.349 ns               Unsigned 64-bit division by 3 - Pass - average iteration duration: 0.582 ns                Unsigned 64-bit modulus by 3 - Pass - average iteration duration: 0.698 ns               Unsigned 64-bit division by 7 - Pass - average iteration duration: 0.873 ns                Unsigned 64-bit modulus by 7 - Pass - average iteration duration: 0.990 ns              Unsigned 64-bit division by 10 - Pass - average iteration duration: 0.524 ns               Unsigned 64-bit modulus by 10 - Pass - average iteration duration: 0.698 ns             Unsigned 64-bit division by 100 - Pass - average iteration duration: 0.815 ns              Unsigned 64-bit modulus by 100 - Pass - average iteration duration: 0.990 ns   Unsigned 64-bit division by 1,000,000,000 - Pass - average iteration duration: 0.815 ns    Unsigned 64-bit modulus by 1,000,000,000 - Pass - average iteration duration: 0.990 ns                 Signed 32-bit division by 1 - Pass - average iteration duration: 0.349 ns                  Signed 32-bit modulus by 1 - Pass - average iteration duration: 0.291 ns               Signed 32-bit division by 100 - Pass - average iteration duration: 0.698 ns                Signed 32-bit modulus by 100 - Pass - average iteration duration: 6.403 ns                 Signed 64-bit division by 1 - Pass - average iteration duration: 0.349 ns                  Signed 64-bit modulus by 1 - Pass - average iteration duration: 0.349 ns                Signed 64-bit division by 10 - Pass - average iteration duration: 0.582 ns                 Signed 64-bit modulus by 10 - Pass - average iteration duration: 6.577 ns                Signed 64-bit division by 18 - Pass - average iteration duration: 0.582 ns                 Signed 64-bit modulus by 18 - Pass - average iteration duration: 6.461 ns                Signed 64-bit division by 24 - Pass - average iteration duration: 0.640 ns                 Signed 64-bit modulus by 24 - Pass - average iteration duration: 6.519 ns               Signed 64-bit division by 100 - Pass - average iteration duration: 0.757 ns                Signed 64-bit modulus by 100 - Pass - average iteration duration: 6.461 ns               Signed 64-bit division by 153 - Pass - average iteration duration: 0.640 ns                Signed 64-bit modulus by 153 - Pass - average iteration duration: 6.519 ns             Signed 64-bit division by 1,461 - Pass - average iteration duration: 0.698 ns              Signed 64-bit modulus by 1,461 - Pass - average iteration duration: 6.577 ns Signed 64-bit division by 10,000 (Currency) - Pass - average iteration duration: 0.640 ns  Signed 64-bit modulus by 10,000 (Currency) - Pass - average iteration duration: 6.694 ns        Signed 64-bit division by 86,400,000 - Pass - average iteration duration: 0.640 ns         Signed 64-bit modulus by 86,400,000 - Pass - average iteration duration: 6.636 ns

ok
- Sum of average durations: 96.275 ns
- Overall average duration: 1.375 ns

Extended magic-div:

Division compilation and timing test (using constants from System and Sysutils)
-------------------------------------------------------------------------------
              Unsigned 16-bit division by 1 - Pass - average iteration duration: 0.349 ns                Unsigned 16-bit modulus by 1 - Pass - average iteration duration: 0.349 ns               Unsigned 16-bit division by 2 - Pass - average iteration duration: 0.349 ns                Unsigned 16-bit modulus by 2 - Pass - average iteration duration: 0.291 ns               Unsigned 16-bit division by 3 - Pass - average iteration duration: 0.524 ns                Unsigned 16-bit modulus by 3 - Pass - average iteration duration: 0.640 ns               Unsigned 16-bit division by 7 - Pass - average iteration duration: 0.466 ns                Unsigned 16-bit modulus by 7 - Pass - average iteration duration: 0.640 ns              Unsigned 16-bit division by 10 - Pass - average iteration duration: 0.582 ns               Unsigned 16-bit modulus by 10 - Pass - average iteration duration: 0.698 ns             Unsigned 16-bit division by 100 - Pass - average iteration duration: 0.524 ns              Unsigned 16-bit modulus by 100 - Pass - average iteration duration: 0.582 ns           Unsigned 16-bit division by 1,000 - Pass - average iteration duration: 0.524 ns            Unsigned 16-bit modulus by 1,000 - Pass - average iteration duration: 0.640 ns               Unsigned 32-bit division by 1 - Pass - average iteration duration: 0.349 ns                Unsigned 32-bit modulus by 1 - Pass - average iteration duration: 0.349 ns               Unsigned 32-bit division by 2 - Pass - average iteration duration: 0.349 ns                Unsigned 32-bit modulus by 2 - Pass - average iteration duration: 0.291 ns               Unsigned 32-bit division by 3 - Pass - average iteration duration: 0.466 ns                Unsigned 32-bit modulus by 3 - Pass - average iteration duration: 0.582 ns               Unsigned 32-bit division by 7 - Pass - average iteration duration: 0.524 ns                Unsigned 32-bit modulus by 7 - Pass - average iteration duration: 0.582 ns              Unsigned 32-bit division by 10 - Pass - average iteration duration: 0.524 ns               Unsigned 32-bit modulus by 10 - Pass - average iteration duration: 0.640 ns             Unsigned 32-bit division by 100 - Pass - average iteration duration: 0.524 ns              Unsigned 32-bit modulus by 100 - Pass - average iteration duration: 0.582 ns           Unsigned 32-bit division by 1,000 - Pass - average iteration duration: 0.466 ns            Unsigned 32-bit modulus by 1,000 - Pass - average iteration duration: 0.582 ns          Unsigned 32-bit division by 60,000 - Pass - average iteration duration: 0.524 ns           Unsigned 32-bit modulus by 60,000 - Pass - average iteration duration: 0.582 ns         Unsigned 32-bit division by 146,097 - Pass - average iteration duration: 0.524 ns          Unsigned 32-bit modulus by 146,097 - Pass - average iteration duration: 0.524 ns       Unsigned 32-bit division by 3,600,000 - Pass - average iteration duration: 0.524 ns        Unsigned 32-bit modulus by 3,600,000 - Pass - average iteration duration: 0.524 ns               Unsigned 64-bit division by 1 - Pass - average iteration duration: 0.291 ns                Unsigned 64-bit modulus by 1 - Pass - average iteration duration: 0.349 ns               Unsigned 64-bit division by 2 - Pass - average iteration duration: 0.349 ns                Unsigned 64-bit modulus by 2 - Pass - average iteration duration: 0.407 ns               Unsigned 64-bit division by 3 - Pass - average iteration duration: 0.582 ns                Unsigned 64-bit modulus by 3 - Pass - average iteration duration: 0.698 ns               Unsigned 64-bit division by 7 - Pass - average iteration duration: 0.815 ns                Unsigned 64-bit modulus by 7 - Pass - average iteration duration: 1.048 ns              Unsigned 64-bit division by 10 - Pass - average iteration duration: 0.582 ns               Unsigned 64-bit modulus by 10 - Pass - average iteration duration: 0.640 ns             Unsigned 64-bit division by 100 - Pass - average iteration duration: 0.815 ns              Unsigned 64-bit modulus by 100 - Pass - average iteration duration: 0.990 ns   Unsigned 64-bit division by 1,000,000,000 - Pass - average iteration duration: 0.757 ns    Unsigned 64-bit modulus by 1,000,000,000 - Pass - average iteration duration: 1.048 ns                 Signed 32-bit division by 1 - Pass - average iteration duration: 0.349 ns                  Signed 32-bit modulus by 1 - Pass - average iteration duration: 0.349 ns               Signed 32-bit division by 100 - Pass - average iteration duration: 0.757 ns                Signed 32-bit modulus by 100 - Pass - average iteration duration: 6.461 ns                 Signed 64-bit division by 1 - Pass - average iteration duration: 0.349 ns                  Signed 64-bit modulus by 1 - Pass - average iteration duration: 0.349 ns                Signed 64-bit division by 10 - Pass - average iteration duration: 0.640 ns                 Signed 64-bit modulus by 10 - Pass - average iteration duration: 6.694 ns                Signed 64-bit division by 18 - Pass - average iteration duration: 0.698 ns                 Signed 64-bit modulus by 18 - Pass - average iteration duration: 6.636 ns                Signed 64-bit division by 24 - Pass - average iteration duration: 0.698 ns                 Signed 64-bit modulus by 24 - Pass - average iteration duration: 6.461 ns               Signed 64-bit division by 100 - Pass - average iteration duration: 0.757 ns                Signed 64-bit modulus by 100 - Pass - average iteration duration: 6.519 ns               Signed 64-bit division by 153 - Pass - average iteration duration: 0.640 ns                Signed 64-bit modulus by 153 - Pass - average iteration duration: 6.577 ns             Signed 64-bit division by 1,461 - Pass - average iteration duration: 0.698 ns              Signed 64-bit modulus by 1,461 - Pass - average iteration duration: 6.694 ns Signed 64-bit division by 10,000 (Currency) - Pass - average iteration duration: 0.698 ns  Signed 64-bit modulus by 10,000 (Currency) - Pass - average iteration duration: 6.694 ns        Signed 64-bit division by 86,400,000 - Pass - average iteration duration: 0.698 ns         Signed 64-bit modulus by 86,400,000 - Pass - average iteration duration: 6.577 ns

ok
- Sum of average durations: 93.540 ns
- Overall average duration: 1.336 ns


--
This email has been checked for viruses by Avast antivirus software.
https://www.avast.com/antivirus

_______________________________________________
fpc-devel maillist  -  fpc-devel@lists.freepascal.org
https://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel

Reply via email to