https://issues.dlang.org/show_bug.cgi?id=13474
--- Comment #28 from yebblies <[email protected]> ---
(In reply to Walter Bright from comment #26)
> > I think that's well covered by adding an intrinsic,
>
> I produced a PR request for that in druntime. Nobody liked it, and it
> languishes unpulled.
>
> https://github.com/dlang/druntime/pull/1621

I must have missed that. I'm happy to review/merge dmd changes related to that. I'm worried the other approach will just cause a performance issue that's impossible to work around.

> 1. It is unknown what 32 bit x86 CPUs are used for embedded systems. I
> dislike adding more codegen switches, because every switch doubles the time
> it takes to run the test suite, and few developers set them correctly. (Who
> ever sets that blizzard of switches gcc has correctly?)

Who is using dmd on an embedded system? Why? Embedded system users are exactly the people who are setting gcc switches correctly.

> 2. It's not a simple matter of turning it on, even though dmd generates XMM
> code for OSX 32 bit. The trouble is in getting the stack aligned to 16 bytes.
> The Linux way of doing that is different from OSX, so there's some
> significant dev work to do to match it.

Yeah, I know. Then again, wouldn't using unaligned loads/stores still be faster than using the x87? Last I checked, it was... not fast.

> I believe that making faster 64 bit code should have priority over making
> faster 32 bit code, based on the idea that users who feel the need for
> speed are going to be using -m64.

It's much easier to switch over to m64 on linux, which is why I'm still using m32 on windows. One day...

Can you put together a dmd PR to go with druntime 1621? I'm guessing it's pretty easy, since a new OPER will default to not being optimized?
