2. OK'ish: A bunch of testcases see more reads/writes because PRE of redundant
reads/writes is punted to later passes, which obviously needs more work.

3. NOK: We lose the ability to instrument local RM writes - especially in the
testsuite.
  e.g.
     a. intrinsic setting a static RM
     b. get_frm() to ensure that happened (inline asm to read out frm)

The tightly coupled restore kicks in before get_frm() can be emitted, so it fails
to observe #a. This is a deal breaker for the testsuite, as many of the frm tests
report a failure even though the actual codegen is sane.
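
A minimal sketch of that pattern, purely for illustration (the get_frm()
helper, the _rm intrinsic variant and the literal rounding-mode value are
assumptions, not the actual testsuite code):

  #include <riscv_vector.h>

  static inline unsigned
  get_frm (void)
  {
    unsigned frm;
    /* Read the frm CSR directly so the compiler cannot assume its value.  */
    asm volatile ("frrm %0" : "=r" (frm));
    return frm;
  }

  void
  observe_static_rm (float *out, unsigned *frm_out,
                     vfloat32m1_t a, vfloat32m1_t b, size_t vl)
  {
    /* (a) Intrinsic with a static rounding mode (2 == RDN) writes FRM.  */
    vfloat32m1_t r = __riscv_vfadd_vv_f32m1_rm (a, b, 2, vl);
    /* (b) With a tightly coupled restore the saved FRM is already written
       back before this read, so the static mode set in (a) is never
       observed.  */
    *frm_out = get_frm ();
    __riscv_vse32_v_f32m1 (out, r, vl);
  }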

I'd say that most of the tests we have right now are written with the existing behavior in mind and don't necessarily translate well to a changed behavior.

We mostly test proper LCM and backup-update behavior, and backup updates don't happen with a local-only approach.

I haven't really understood how the FRM-changing intrinsics are used.

There are two extremes:

- A single intrinsic using a different rounding mode and a lot of other arithmetic before and after it. In that case we cannot optimize anyway because the rest must operate with the global rounding mode.

- A longer code sequence, like a function, that uses a different rounding mode, with every intrinsic being FRM-changing. In that case we would need to optimize a lot of saves and restores away until only a single save at the beginning and a single restore at the end remain.

I suppose we don't handle the latter case well right now. But on the other hand it's also not very interesting, as an explicit fegetround (), fesetround (), fesetround () sequence is what the user would/should have done anyway.
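
For reference, a minimal sketch of what that explicit version would look like
(plain fenv.h, nothing target-specific; the function name and loop body are
made up):

  #include <fenv.h>

  /* Whole-sequence rounding-mode switch, written out explicitly:
     one save at the beginning, one restore at the end.  */
  void
  kernel_rdn (float *dst, const float *src, int n)
  {
  #pragma STDC FENV_ACCESS ON
    int old = fegetround ();
    fesetround (FE_DOWNWARD);
    for (int i = 0; i < n; i++)
      dst[i] = src[i] * 0.5f + 1.0f;
    fesetround (old);
  }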

So IMHO the only interesting cases are somewhere in the middle. It would really help to have some examples here that could tell us whether the simple approach leaves a lot on the table (in terms of redundant save/restore).
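
To make "somewhere in the middle" concrete, a hypothetical strawman (same
assumed _rm intrinsic as above): two static-RM intrinsics separated only by
FRM-agnostic code, where a strictly local scheme emits two write/restore
pairs although one would suffice.

  #include <riscv_vector.h>

  /* Hypothetical middle case: two static-RM intrinsics (2 == RDN) with
     only FRM-agnostic code in between.  A per-intrinsic save/restore
     scheme emits the write/restore pair twice; ideally the restore after
     the first vfadd and the write before the second could be elided.  */
  vfloat32m1_t
  middle (vfloat32m1_t a, vfloat32m1_t b, vfloat32m1_t c,
          int *counter, size_t vl)
  {
    vfloat32m1_t t = __riscv_vfadd_vv_f32m1_rm (a, b, 2, vl);
    *counter += 1;  /* integer code, does not depend on FRM */
    return __riscv_vfadd_vv_f32m1_rm (t, c, 2, vl);
  }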


--
Regards
Robin
