On 09/12/15 15:34, Claudiu Zissulescu wrote:

Well, it seems to me that we prefer to disable optimizations when talking about
debug-related information (see PR target/60598, fixed on trunk in r208749).
Actually, unwind information might also be needed for exception handling,
depending on the target (sjlj exception handling or dwarf2).  In that case it
becomes a matter of program correctness: if any exception might be raised, it
could lead to incorrect unwinding due to incorrect/incomplete unwind info.

So we're both wrong: this is not merely about debugging.

OTOH, the example you give also shows a much more nuanced approach to
throttling optimization: the patch doesn't disable all epilogue scheduling,
but specifically tests for the presence of a frame-related insn at the point
where it could cause trouble.
The equivalent here would be to reject a frame-related insn in the delay slot
of a conditional branch.  For arc.md, that would likely involve splitting four
of the define_delay statements into two each.  If that seems a bit much, a
compromise would be to patch in_delay_slot, thus affecting unconditional
branches too.  More ambitious approaches would be to move the note to a
(zero-sized, pseudo-)placeholder insn, or to find a way to make dwarf2cfi cope
with speculated frame-related insns.  E.g., even if the delay slot is not
annulled, could we mark the note as annulled?

Another approach would be to fix fill_eager_delay_slots not to stuff
frame-related insns into non-annulled delay slots of conditional branches,
where they would execute unconditionally.
Or we could have some target hook to make it not even bother filling delay
slots speculatively; for targets that can fully unexpose the delay slot, like
SH and ARC >= ARC700, this aspect of fill_eager_delay_slots only mucks up
schedules and increases code size.

Unfortunately, dwarf2cfi checks the paths for consistency (dwarf2cfi.c:2284),
throwing an error if those paths are not ok.  Also, with ARC gcc we focus on
Linux-type applications, hence the unwinding info needs to be consistent.
As far as I can see, a sort of -mno-epilogue-cfi option would just control
whether the blockage instruction is emitted, and the option would be valid
only when compiling without emitting dwarf information (i.e.,
-mno-epilogue-cfi is incompatible with -g).
Personally, I do not see the benefit of having such an option, as one may lose
about 1 cycle per function call (HS/EM cpus) in very particular cases.  Running
dg.exp, compile.exp, execute.exp, and building Linux with some default apps,
we found only 4 cases in which the delay-slot scheduler needs the blockage
mechanism.

The number of problems you found need not bear any relation to the number of
optimizations suppressed by the blockage instruction.  Consider the case where
there are high-latency unconditional instructions just before a lengthy
epilogue: sched2 scheduling some epilogue instructions into the resulting
bubbles can hide those latencies.  More relevant ways to get data would be to
compare the object files (from a whole toolchain library set and/or one or
more big applications) built with and without the blockage insn emitted, or to
benchmark it.
  Also, adding blockage before generating any prologue instruction seems to be
a widespread practice in gcc backends (see SH for example).
It is a quick way to 'fix' (actually, hide) bugs, at the expense of
optimization.  It can be justified if the resources to keep a port in working
order are limited, and so are the forgone optimization benefits.
But at a minimum, you should add a comment to explain what problem you are papering over. That works best if you actually file a bug report in bugzilla first (about the interaction of fill_eager_delay_slots and dwarf2cfi) so that you can name the bug.
