[Bug rtl-optimization/81434] AArch64 instruction fusing and pipeline scheduling problem

wilco at gcc dot gnu.org Thu, 20 Jul 2017 10:21:27 -0700

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81434


--- Comment #8 from Wilco <wilco at gcc dot gnu.org> ---
(In reply to jim.wilson from comment #7)
> On Thu, Jul 20, 2017 at 4:20 AM, wilco at gcc dot gnu.org
> <gcc-bugzi...@gcc.gnu.org> wrote:
> > Do you think it might be feasible to update resource usage of a schedule
> > group? Or would it be easier to replace a fused pair with a single
> > instruction with correct resource usage, and expand after scheduling in
> > say split5?
> >
> > For some cases (single destination reg like ADRP/ADD, AES, MOV/MOVK) it
> > would be simpler to treat them as a single instruction from early on.
> 
> I haven't looked at this issue yet.  This problem wastes one issue
> slot per cycle, whereas the SCHED_GROUP problem wastes up to N-1
> issue slots per cycle, where N is the issue rate.  For falkor, this is
> up to 3 issue slots per cycle.  Since the SCHED_GROUP problem is more
> serious, I looked at that one first.
> 
> Thinking about this a bit, I don't know if there is an easy way to
> correct resource usage for a fused pair if represented as two insns.
> If we represent a fused pair as a single instruction, then we are
> getting the issue count wrong, as they take two issue slots, but one
> function unit slot.  However, there is a way to deal with the issue
> count.  We could use TARGET_SCHED_VARIABLE_ISSUE to make the single
> fused insn take two issue slots.  I already wrote a patch like that
> for a different reason as an experiment so I know this can work.  We
> may also have to worry about instruction lengths, depending on when
> exactly we split the fused insn into two insns.  There could be some
> other details that turn up when we try to implement this.  What do we
> do if a fused insn takes the last issue slot for instance?  Maybe we
> try to prevent that, or maybe we reduce the issue rate by one for the
> next cycle.  This may depend on how the hardware implements insn
> fusing.

Yes, the details depend on how the hardware implements fusion, but based on the
tuning I did on the Cortex-A53 model I'd say that a reasonable approximation
gives good schedules. So if a fused pair would take the last issue slot, it may
be best to force it to the next cycle like you say; that's no worse than if the
fusion didn't happen.
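To make the issue-slot accounting above concrete, here is a toy C model (not GCC code) under stated assumptions: a fused pair is represented as one schedulable unit that consumes two issue slots, in the spirit of the TARGET_SCHED_VARIABLE_ISSUE idea, and a pair that would only get the last free slot is deferred to the next cycle. The `insn_t` type and the `variable_issue` and `schedule` functions are hypothetical illustrations; the real hook operates on RTL insns and has a different signature.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy model, not GCC's scheduler: an insn is either a plain
   insn (one issue slot) or a fused pair (two issue slots, but scheduled
   as one unit).  */
typedef struct {
  const char *name;
  bool fused;
} insn_t;

/* Loosely mimics the return-value contract of TARGET_SCHED_VARIABLE_ISSUE:
   given the issue slots still free this cycle (MORE), return how many
   remain after issuing INSN.  A fused pair consumes two slots.  */
static int variable_issue (const insn_t *insn, int more)
{
  return more - (insn->fused ? 2 : 1);
}

/* Issue N insns in order at ISSUE_RATE slots per cycle and return the
   number of cycles used.  A fused pair that does not fit in the slots
   left in the current cycle is pushed to the next cycle.  */
static int schedule (const insn_t *insns, int n, int issue_rate)
{
  assert (issue_rate >= 1);
  int cycles = 0;
  int i = 0;
  while (i < n)
    {
      int more = issue_rate;   /* free issue slots in this cycle */
      cycles++;
      while (i < n && more > 0)
        {
          int need = insns[i].fused ? 2 : 1;
          /* Defer a pair that would only get the last slot(s).  If the
             cycle is still empty (issue_rate == 1), issue it anyway so
             the loop always makes progress.  */
          if (need > more && more < issue_rate)
            break;
          more = variable_issue (&insns[i], more);
          i++;
        }
    }
  return cycles;
}
```

For example, with an issue rate of 4, three plain insns followed by a fused ADRP/ADD and a MOV take two cycles: the pair does not fit in the single slot left in cycle 1, so it moves to cycle 2 together with the MOV, which is the same cycle count as if the pair had been issued unfused.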
