On 10/31/22 15:49, Joe Darcy wrote:
> In terms of the overhead of using floating-point expression evaluation as a guard, are there still platforms where operating on subnormal values is pathologically slower? Some generations of SPARC chips had that behavior where a subnormal multiply would take, say 10,000 cycles, rather than 3 or 4 since the subnormal operations were implemented via trap handling.

That's a very interesting point. I know it used to be the case that denormals were handled by trapping to microcode, but there are good hardware algorithms since Schwarz et al., 2003 [1]. This paper showed how, with a little hardware, such numbers can be handled close to the speed of normalized numbers. I deliberately ran my tests on a ten-year-old CPU, but I guess I'd have to go further back to find a bad case.

Anyway, I plan to

a. Restore the FPU CR after calls to dlopen(3).

b. Detect FPU CR corruption at safepoints, and print a warning. At least the user might find out that something is wrong.

I think this will avoid most cases of badness.

I guess I'll need a CSR for this?

[1] Hardware implementations of denormalized numbers, Proceedings of the 16th IEEE Symposium on Computer Arithmetic, 2003. DOI:10.1109/ARITH.2003.1207662

-- 
Andrew Haley (he/him)
Java Platform Lead Engineer
Red Hat UK Ltd. <https://www.redhat.com>
https://keybase.io/andrewhaley
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671