On 09/30/14 03:22, Bin Cheng wrote:
Hi,
many load/store pairs as my old patch.  Then I decided to take one step
forward to introduce a generic instruction fusion infrastructure in GCC,
because in essence, load/store pair is no different from any other
instruction fusion; all these optimizations want is to push instructions
together in the instruction stream.
Great generalization.  And yes, you're absolutely right: what you're doing is building a fairly generic mechanism to mark insns that might fuse together.

So, some questions.  Let's assume I've got 3 kinds of insns: A, B & C.

I can fuse AB or AC, but not BC. In fact, moving B & C together may significantly harm performance.

So my question is can a given insn have different fusion priorities depending on its scheduling context?

So perhaps an example. Let's say I have an insn stream with the following kinds of instructions, all ready at the same time.

AAAAAAAABBBBCCCC

Can I create 8 distinct fusion priorities such that I ultimately schedule
AB(1) AB(2) AB(3) AB(4) AC(5) AC(6) AC(7) AC(8)

I guess another way to ask the question: are fusion priorities static based on the insn/alternative, or can they vary?  And if they can vary, can they vary at each tick of the scheduler?
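
To make that concrete to myself, here's a throwaway toy (nothing from the
actual patch; all names and numbers are invented) showing what I mean: if
each intended pair can be handed its own fusion priority, then sorting the
ready list on that key is exactly what lines the pairs up.

#include <stdio.h>
#include <stdlib.h>

/* One entry in a pretend ready list: the insn "kind" plus the fusion
   priority I'd want the backend to hand it.  */
struct toy_insn { char kind; int fusion_pri; };

/* Sort on fusion priority as the major key; break ties so A lands before
   its partner, standing in for the minor priority.  */
static int
cmp (const void *a, const void *b)
{
  const struct toy_insn *x = a, *y = b;
  if (x->fusion_pri != y->fusion_pri)
    return x->fusion_pri - y->fusion_pri;
  return x->kind - y->kind;
}

int
main (void)
{
  /* Ready list AAAAAAAABBBBCCCC; each A shares a fusion priority with the
     B or C it should pair with.  */
  struct toy_insn ready[] = {
    {'A',1},{'A',2},{'A',3},{'A',4},{'A',5},{'A',6},{'A',7},{'A',8},
    {'B',1},{'B',2},{'B',3},{'B',4},
    {'C',5},{'C',6},{'C',7},{'C',8},
  };

  qsort (ready, sizeof ready / sizeof ready[0], sizeof ready[0], cmp);

  /* Prints A(1) B(1) A(2) B(2) ... A(8) C(8), i.e. the pairing asked
     about above.  */
  for (size_t i = 0; i < sizeof ready / sizeof ready[0]; i++)
    printf ("%c(%d) ", ready[i].kind, ready[i].fusion_pri);
  printf ("\n");
  return 0;
}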



Now the next issue is I really don't want all those to fire back-to-back-to-back. I'd like some other insns to be inserted between each fusion pair if they're in the ready list. I guess the easiest way to get that is to assign the same fusion priority to other insns in the ready queue, even though they don't participate in fusion. So

ABX(1) ABY(2).....

Where X & Y are some other arbitrary insns that don't participate in the AB fusion, but will issue in the same cycle as the AB fused insn.

Though I guess if we run fusion + peep2 between sched1 and sched2, that problem would just resolve itself as we'd have fused AB together into a new insn and we'd schedule normally with the fused insns and X, Y.




So here comes this patch.  It adds a new sched_fusion pass just before
peephole2.  The methodology is as follows:
1) The priority in the scheduler is extended into a [fusion_priority, priority]
pair, with fusion_priority as the major key and priority as the minor key.
2) The back end assigns a priority pair to each instruction; instructions
that should be fused together get the same fusion_priority.
I think the bulk of my questions above are targeted at this part of the change.  When are these assignments made, and how much freedom does the backend have to make/change those assignments?
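
For reference, here's the shape I'm imagining for the backend side.  This
is purely a sketch on my part: the hook name, its signature and both
helpers marked "hypothetical" are guesses, not the patch's actual
interface, and it assumes the usual GCC backend context (rtl.h and
friends).

static void
example_sched_fusion_priority (rtx_insn *insn, int max_pri,
                               int *fusion_pri, int *pri)
{
  rtx base, offset;

  /* Insns that can't fuse get a neutral major key, so only the normal
     (minor) priority orders them.  */
  if (!example_load_or_store_p (insn)                   /* hypothetical */
      || !example_split_address (insn, &base, &offset)) /* hypothetical */
    {
      *fusion_pri = max_pri;
      *pri = max_pri;
      return;
    }

  /* Candidate loads/stores off the same base register share a major key,
     so they gravitate together in the ready list.  */
  *fusion_pri = max_pri - REGNO (base);

  /* Leave the minor key alone for now; ordering inside a group is the
     question below.  */
  *pri = max_pri;
}

If that's roughly the model, then my question is really about when this
gets called and whether it's free to give a different answer each time.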

So another question: given a fused pair, is there any way to guarantee ordering within the fused pair?  This is useful to cut down on the number of peep2 patterns.  I guess we could twiddle the priority in those cases to force a particular ordering of the fused pair, right?
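
Continuing the made-up sketch above, I'd picture that twiddle as nothing
more than deriving the minor key from the address offset (again, every
name here is invented):

static void
example_order_within_group (rtx base, rtx offset, int max_pri,
                            int *fusion_pri, int *pri)
{
  /* Same major key as before: one fusion group per base register.  */
  *fusion_pri = max_pri - REGNO (base);

  /* Minor key tracks the offset, so the lower-address access always
     sorts first and the pair reaches peep2 as "access base+0; access
     base+8"; a single ascending-offset pattern per mode should then be
     enough.  */
  *pri = max_pri - (int) INTVAL (offset);
}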

I wonder if we could use this to zap all the hair I added to caller-save back in the early 90s to try and widen the save/restore modes. So instead of st; st; call; ld; ld, we'd generate std; call; ldd. It was a huge win for floating point on the sparc processors of that time. I don't expect you to do that investigation. Just thinking out loud.





I collected performance data for both cortex-a15 and cortex-a57 (with a
local peephole ldp/stp patch); the benchmarks show clear improvement on
arm/aarch64.  I also collected instrumentation data on how many load/store
pairs are found.  For the four versions of load/store pair patches:
0) Mike's original patch.
1) My original prototype patch.
2) A cleaned-up version of Mike's pass (with implementation bugs resolved).
3) This new prototype fusion pass.

The numbers of paired opportunities satisfy the relations below:
3 * N0 ~ N1 ~ N2 < N3
For example, for one benchmark suite, we have:
N0 ~= 1300
N1/N2 ~= 5000
N3 ~= 7500
Nice.  Very nice.

Overall it's a fairly simple change.  I'll look deeper into it next week.

jeff
