[Barry Scott and Steve Dower share tips for convincing Visual Studio
to show assembler without recompiling the file]
Thanks, fellows! That mostly ;-) worked. The remaining problem is that
breakpoints just didn't work. They showed up "visually", and in the table
of set breakpoints, but code went
On 1/17/2022 8:47 PM, Barry Scott wrote:
On 17 Jan 2022, at 06:35, Tim Peters wrote:
[Guido]
I don't think there's a way to do a PGO build from Visual Studio; but
a command prompt in the repo can do it using `PCbuild\build.bat --pgo`.
Just be patient with it.
Thanks! That worked, and was
> On 17 Jan 2022, at 06:35, Tim Peters wrote:
>
> [Guido]
>> I don't think there's a way to do a PGO build from Visual Studio; but
>> a command prompt in the repo can do it using `PCbuild\build.bat --pgo`.
>> Just be patient with it.
>
> Thanks! That worked, and was easy, and gave me an
[Guido]
> I don't think there's a way to do a PGO build from Visual Studio; but
> a command prompt in the repo can do it using `PCbuild\build.bat --pgo`.
> Just be patient with it.
Thanks! That worked, and was easy, and gave me an executable that runs
"// 10" at supernatural speed.
Alas, Visual
On Sun, Jan 16, 2022 at 1:51 PM Mark Dickinson wrote:
> On Sun, Jan 16, 2022 at 9:28 PM Guido van Rossum wrote:
>
>> Does the optimization for //10 actually help in the real world? [...]
>>
>
> Yep, I don't know. If 10 is *not* the most common small divisor in real
> world code, it must at
On Sun, Jan 16, 2022 at 2:21 PM Tim Peters wrote:
> I have to believe the same is true under Visual Studio 2019, but
> offhand don't know how to prove that. I understand Steve uses PGO to
> build the python.org Windows release, but I've never done that - the
> "Release build" configuration I get
[Tim incidentally notes that passing 10 as the divisor to inplace_divrem1()
is "impossibly fast" on Windows, taking less than a third of the time it
takes when passing seemingly any other divisor]
[Mark Dickinson discovers that much the same is true under some, but not all,
Linux-y builds, due to the
On Sun, Jan 16, 2022 at 9:28 PM Guido van Rossum wrote:
> Does the optimization for //10 actually help in the real world? [...]
>
Yep, I don't know. If 10 is *not* the most common small divisor in real
world code, it must at least rank in the top five. I might hazard a guess
that division by 2
As a bystander, this is all fascinating (I had actually anticipated that
the //10 optimization came from PGO).
Does the optimization for //10 actually help in the real world? It would if
people did a lot of manual conversion to decimal, which is easiest
expressed using //10. But presumably for
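For concreteness, "manual conversion to decimal" typically looks like the loop below, which peels off one decimal digit per division by 10. This is a minimal sketch; the function name `to_decimal_digits` is hypothetical and not taken from the thread.

```python
def to_decimal_digits(n: int) -> str:
    """Convert a non-negative int to a decimal string by repeated division by 10.

    Each iteration performs one bigint division by 10 (via divmod), so the
    cost is dominated by those divisions; the loop as a whole is quadratic
    in the number of digits.
    """
    if n < 0:
        raise ValueError("n must be non-negative")
    if n == 0:
        return "0"
    digits = []
    while n:
        n, d = divmod(n, 10)  # quotient, and the lowest remaining decimal digit
        digits.append(chr(ord("0") + d))
    return "".join(reversed(digits))


print(to_decimal_digits(12345))  # prints 12345
```

Every pass through the loop is a single-digit bigint division, which is exactly the case the //10 fast path would speed up.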
On Sun, Jan 16, 2022 at 12:08 PM Mark Dickinson wrote:
> So gcc is anticipating divisions by 10 and introducing special-case
> divide-by-reciprocal-multiply code for that case, and presumably the
> profile generated for the PGO backs up this being a common enough case, so
> we end up with the
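The divide-by-reciprocal-multiply idea can be checked in a few lines of Python. This is a minimal sketch that assumes the standard "magic number" constants for unsigned division by 10: 0xCCCCCCCD with a shift of 35 for 32-bit operands, and 0xCCCCCCCCCCCCCCCD with a shift of 67 for 64-bit operands. The thread does not quote GCC's exact output, and the helper names are hypothetical.

```python
import random

# Standard magic constants for unsigned division by 10 (assumption: these are
# the constants optimizing compilers commonly emit; GCC's exact code for
# inplace_divrem1() is not shown in the thread).
M32, S32 = 0xCCCCCCCD, 35          # exact for 0 <= n < 2**32
M64, S64 = 0xCCCCCCCCCCCCCCCD, 67  # exact for 0 <= n < 2**64


def div10_u32(n):
    """floor(n / 10) for a 32-bit unsigned n, via one multiply and one shift."""
    if not 0 <= n < 2**32:
        raise ValueError("n must fit in 32 unsigned bits")
    return (n * M32) >> S32


def div10_u64(n):
    """floor(n / 10) for a 64-bit unsigned n.

    In C, the >> 67 corresponds to taking the high 64 bits of the 128-bit
    product and then shifting right by 3 more bits.
    """
    if not 0 <= n < 2**64:
        raise ValueError("n must fit in 64 unsigned bits")
    return (n * M64) >> S64


if __name__ == "__main__":
    rng = random.Random(0)
    for _ in range(100_000):
        a, b = rng.getrandbits(32), rng.getrandbits(64)
        assert div10_u32(a) == a // 10
        assert div10_u64(b) == b // 10
    print("multiply-and-shift agrees with // 10 on all samples")
```

Both constants are exact over their whole ranges, because 10 * M - 2**k = 2 in each case. With 30-bit digits, the per-step dividend when dividing by 10 is below 10 * 2**30, so it sits well inside the 64-bit range.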
On Sun, Jan 16, 2022 at 4:11 PM Terry Reedy wrote:
>
>
> https://stackoverflow.com/questions/41183935/why-does-gcc-use-multiplication-by-a-strange-number-in-implementing-integer-divi
>
> and
>
>
> https://stackoverflow.com/questions/30790184/perform-integer-division-using-multiplication
>
> have
On 1/16/2022 7:08 AM, Mark Dickinson wrote:
Now /that/ I certainly wasn't expecting. I don't see the same effect on
macOS / Clang, whether compiling with --enable-optimizations or not;
this appears to be a GCC innovation. And indeed, as Tim suggested, it
turns out that there's no division
On Sat, Jan 15, 2022 at 8:12 PM Tim Peters wrote:
> Something is missing here, but can't guess what without seeing the
> generated machine code. But I trust Mark will do that.
>
Welp, there goes my weekend. :-)
$ python -m timeit -n 150 -s "x = 10**1000" "x//10"
150 loops, best of 5:
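To compare divisors from inside Python instead of the shell, the same measurement can be made with the standard `timeit` module. This is a minimal sketch: the helper name `time_floordiv` is hypothetical, and 7 and 11 are arbitrary comparison divisors. As the thread shows, the results depend heavily on compiler, build flags (for example PGO), and hardware.

```python
import timeit


def time_floordiv(divisor, *, number=150, repeat=5):
    """Best-of-`repeat` seconds per loop for `x // divisor` with x = 10**1000.

    Mirrors: python -m timeit -n 150 -s "x = 10**1000" "x//10"
    """
    timings = timeit.repeat(
        stmt=f"x // {divisor}",
        setup="x = 10**1000",
        number=number,
        repeat=repeat,
    )
    return min(timings) / number


if __name__ == "__main__":
    for d in (7, 10, 11):
        print(f"x // {d}: {time_floordiv(d) * 1e9:.0f} ns per loop")
```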
[Gregory P. Smith]
> ...
> That only appears true in default boring -O2 builds. Use
> `./configure --enable-optimizations` and the C version is *much* faster
> than your asm one...
>
> 250ns for C vs 370ns for your asm divl one using old gcc 9.3 on my
> zen3 when compiled using
On Fri, Jan 14, 2022 at 9:50 AM Mark Dickinson wrote:
> On Sun, Jan 2, 2022 at 10:35 AM Mark Dickinson wrote:
>
>> Division may still be problematic.
>>
>
> On that note: Python divisions are somewhat crippled even on x64. Assuming
> 30-bit digits, the basic building block that's needed for
[Mark Dickinson]
>> Division may still be problematic.
Heh. I'm half convinced that heavy duty bigint packages are so often
written in assembler because their authors are driven insane by trying
to trick C compilers into generating "the obvious" machine
instructions needed.
An alternative to HW
On Sun, Jan 2, 2022 at 10:35 AM Mark Dickinson wrote:
> Division may still be problematic.
>
On that note: Python divisions are somewhat crippled even on x64. Assuming
30-bit digits, the basic building block that's needed for multi-precision
division is a 64-bit-by-32-bit unsigned integer
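The following Python model shows that building block: dividing a multi-digit number by a single digit. It assumes 30-bit digits stored least-significant first. `divrem1_sketch` is a hypothetical name, loosely modelled on the role of CPython's `inplace_divrem1()` mentioned elsewhere in this thread; it is not that function's actual code.

```python
SHIFT = 30  # bits per digit (assumption: a 30-bit-digit build)
BASE = 1 << SHIFT
MASK = BASE - 1


def to_digits(n):
    """Split a non-negative int into little-endian SHIFT-bit digits."""
    if n < 0:
        raise ValueError("n must be non-negative")
    digits = []
    while True:
        digits.append(n & MASK)
        n >>= SHIFT
        if not n:
            return digits


def from_digits(digits):
    """Inverse of to_digits()."""
    n = 0
    for d in reversed(digits):
        n = (n << SHIFT) | d
    return n


def divrem1_sketch(digits, n):
    """Divide a little-endian digit list by a single digit 0 < n < BASE.

    Returns (quotient_digits, remainder). Each step divides a value below
    n * BASE < 2**60 by n. That is the double-width-by-single-width unsigned
    division (64-bit by 32-bit in C) that the digit loop needs.
    """
    if not 0 < n < BASE:
        raise ValueError("divisor must be a single nonzero digit")
    quotient = [0] * len(digits)
    rem = 0
    for i in reversed(range(len(digits))):  # most significant digit first
        dividend = (rem << SHIFT) | digits[i]   # rem < n, so dividend < n * BASE
        quotient[i], rem = divmod(dividend, n)  # quotient digit is < BASE
    return quotient, rem
```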
On Tue, Jan 4, 2022 at 3:24 AM Tal Einat wrote:
> I have a spare RPi zero that I could try to set up as a buildbot. Would
> that be useful?
>
No need. We've already got a 32-bit raspbian bot, adding another wouldn't
add value. The rpi0/1/2 are too slow to compile on anyways.
-gps
On Tue, Jan
Perhaps relevant for perspective:
We did some review of the pyperformance benchmarks based on how noisy they
are:
https://github.com/faster-cpython/ideas/discussions/142
Note that pidigits is the noisiest -- its performance changes up to 11% for
no good reason. The regex bms are also very noisy.
I have a spare RPi zero that I could try to set up as a buildbot. Would
that be useful?
On Tue, Jan 4, 2022 at 10:59 AM Antoine Pitrou wrote:
> On Mon, 3 Jan 2022 22:40:25 -0800
> "Gregory P. Smith" wrote:
> >
> > rerunning a mere few of those in --rigorous mode for more runs does not
> >
On Mon, 3 Jan 2022 22:40:25 -0800
"Gregory P. Smith" wrote:
>
> rerunning a mere few of those in --rigorous mode for more runs does not
> significantly improve the stddev so I'm not going to let that finish.
The one benchmark that is bigint-heavy is pidigits AFAIK, so you might
want to re-run
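To get a rough sense of how noisy a bigint-heavy microbenchmark is on a particular machine, you can compute the relative standard deviation across repeated runs. This is a minimal sketch using the standard `timeit` and `statistics` modules. `relative_stdev` is a hypothetical helper and is much cruder than the measurements pyperformance makes.

```python
import statistics
import timeit


def relative_stdev(stmt, setup="pass", *, number=1000, repeat=20):
    """Run `stmt` in `repeat` batches of `number` loops; return stdev / mean.

    A rough stand-in for the run-to-run noise that benchmark tooling reports.
    `repeat` must be at least 2 for a standard deviation to exist.
    """
    timings = timeit.repeat(stmt, setup, number=number, repeat=repeat)
    return statistics.stdev(timings) / statistics.mean(timings)


if __name__ == "__main__":
    r = relative_stdev("x // 10", "x = 10**1000")
    print(f"relative stdev of x // 10: {r:.1%}")
```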
On Sun, Jan 2, 2022 at 2:37 AM Mark Dickinson wrote:
> On Sat, Jan 1, 2022 at 9:05 PM Antoine Pitrou wrote:
>
>> Note that ARM is merely an architecture with very diverse
>> implementations having quite differing performance characteristics. [...]
>>
>
> Understood. I'd be happy to see timings
On Sat, Jan 1, 2022 at 9:05 PM Antoine Pitrou wrote:
> Note that ARM is merely an architecture with very diverse
> implementations having quite differing performance characteristics. [...]
>
Understood. I'd be happy to see timings on a Raspberry Pi 3, say. I'm not
too worried about things like
On Fri, 31 Dec 2021 11:18:34 +
Mark Dickinson wrote:
>
> It looks as though the next step would be to run some integer-intensive
> benchmarks on 32-bit ARM, with both --enable-big-digits=15 and
> --enable-big-digits=30. If those show a win (or at least, not a significant
> loss) for 30-bit
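When comparing `--enable-big-digits=15` and `--enable-big-digits=30` builds, it is worth confirming which configuration a given interpreter was actually built with. The standard `sys.int_info` struct sequence exposes this information. A minimal sketch follows; the helper name `digit_config` is hypothetical.

```python
import sys


def digit_config():
    """Return (bits_per_digit, sizeof_digit) for the running interpreter."""
    info = sys.int_info
    return info.bits_per_digit, info.sizeof_digit


bits, size = digit_config()
print(f"{bits}-bit PyLong digits, each stored in a {size}-byte C integer")
```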
Regarding ABI issues, I don't see anything obvious either. I was probably
misremembering the potential marshal issue, which was addressed.
struct _longobject (the implementation details behind the public
PyLongObject typedef name) and the digit definition are excluded from
Py_LIMITED_API. So per
On Fri, Dec 31, 2021 at 12:40 PM Skip Montanaro
wrote:
> Perhaps I missed it, but maybe an action item would be to add a
> buildbot which configures for 15-bit PyLong digits.
>
Yep, good point. I was wrong to say that "15-bit builds don't appear to be
exercised by the buildbots": there's a
Perhaps I missed it, but maybe an action item would be to add a
buildbot which configures for 15-bit PyLong digits.
Skip
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
Thanks all! So to summarize:
- 15-bit digits are still very much in use, and deprecating the option
would likely be premature at this point
- the main users are old 32-bit (x86), which it's difficult to care about
too much, and new 32-bit (principally ARM microarchitectures), which we
*do* care
>> The reason for digits being a multiple of 5 bits should be revisited vs
>> its original intent
> I added that. The only intent was to make it easier to implement
> bigint exponentiation easily ...
That said, I see the comments in longintrepr.h note a stronger constraint:
"""
the marshal code
[Gregory P. Smith]
> The reason for digits being a multiple of 5 bits should be revisited vs
> its original intent
I added that. The only intent was to make it easier to implement
bigint exponentiation while viewing the exponent as being in
base 32 (so as to chew up 5 bits at a time).
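Tim's description (treating the exponent as base-32 digits and consuming 5 bits per step) can be illustrated with a minimal Python sketch. It uses the standard fixed-window (2**k-ary) method with k = 5. `pow_base32` is a hypothetical name, and CPython's actual C implementation differs in detail.

```python
WINDOW = 5  # exponent bits consumed per step, i.e. one base-32 "digit"


def pow_base32(base, exp, mod):
    """Left-to-right modular exponentiation, 5 exponent bits at a time.

    Precomputes base**0 .. base**31 (mod `mod`). Then, for each 5-bit chunk
    of the exponent from most to least significant, it raises the running
    result to the 32nd power (five squarings) and multiplies in the table
    entry for that chunk.
    """
    if exp < 0 or mod <= 0:
        raise ValueError("need exp >= 0 and mod > 0")
    table = [1 % mod]
    for _ in range((1 << WINDOW) - 1):
        table.append(table[-1] * base % mod)
    result = 1 % mod
    nchunks = max(1, -(-exp.bit_length() // WINDOW))  # ceil(bits / 5), at least 1
    for i in reversed(range(nchunks)):
        for _ in range(WINDOW):
            result = result * result % mod
        chunk = (exp >> (i * WINDOW)) & ((1 << WINDOW) - 1)
        result = result * table[chunk] % mod
    return result
```

Both 15 and 30 are multiples of 5, so every 5-bit chunk lies entirely within a single PyLong digit (3 chunks per 15-bit digit, 6 per 30-bit digit). Presumably that is the convenience the multiple-of-5 rule was meant to provide.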
On Thu, Dec 30, 2021 at 12:42 PM Gregory P. Smith wrote:
>
> On Thu, Dec 30, 2021 at 4:47 AM Mark Dickinson wrote:
>
>> tl;dr: I'd like to deprecate and eventually remove the option to use
>> 15-bit digits in the PyLong implementation. Before doing so, I'd like to
>> find out whether there's
On Thu, Dec 30, 2021 at 4:47 AM Mark Dickinson wrote:
> tl;dr: I'd like to deprecate and eventually remove the option to use
> 15-bit digits in the PyLong implementation. Before doing so, I'd like to
> find out whether there's anyone still using 15-bit PyLong digits, and if
> so, why they're