Re: [gem5-dev] Review Request 2338: arm: Fix v8 neon latency issue for loads/stores

Mitch Hayenga via gem5-dev Thu, 21 Aug 2014 06:25:39 -0700


> On Aug. 16, 2014, 4:30 p.m., Nilay Vaish wrote:
> > Two questions:
> > * What are interleave/deinterleave microops?
> > * Why should they be marked No_Opclass?

Interleave/deinterleave operations reorganize the layout of data as it is
loaded from or stored to memory.

Say we had the following data laid out sequentially in memory:

A[0].x
A[0].y
A[0].z
A[1].x
A[1].y
A[1].z
A[2].x
A[2].y
A[2].z
A[3].x
A[3].y
A[3].z

Let's say we want to load all of the 'z' coordinates into a SIMD register, such
that D0 = {A[0].z, A[1].z, A[2].z, A[3].z}.  This process is called
de-interleaving.  Currently we crack the instruction into micro-ops that
perform each of the loads and then perform an expensive "de-interleave"
micro-op.
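
As a rough illustration of the data movement only (a toy sketch in Python, not
gem5 code; the helper names deinterleave and interleave are made up for this
example), the example above amounts to a strided gather on the load side and
the inverse on the store side:

```python
# Toy model of de-interleaving; this is not gem5 code.
# Memory holds an array of structures: x, y, z for each of A[0]..A[3].
memory = [
    "A[0].x", "A[0].y", "A[0].z",
    "A[1].x", "A[1].y", "A[1].z",
    "A[2].x", "A[2].y", "A[2].z",
    "A[3].x", "A[3].y", "A[3].z",
]


def deinterleave(data, num_fields):
    """Split interleaved data into one list per field (hypothetical helper)."""
    return [data[field::num_fields] for field in range(num_fields)]


def interleave(registers):
    """Inverse operation, as used on the store side (hypothetical helper)."""
    return [elem for group in zip(*registers) for elem in group]


xs, ys, zs = deinterleave(memory, 3)
d0 = zs  # the 'z' coordinates, as in the D0 example above
print(d0)
```

Interleaving the three lists again gives back the original memory layout,
which is what the store forms need.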

Since we were charging de-interleave micro-ops with 4 or 5 cycles of latency
(can't remember which), they were skewing performance with respect to real
hardware.  Real hardware seemingly has a different micro-op decomposition and
lower costs for such operations.

So, this patch is just a way to make minimal changes to bring performance back
in line.  The goal was to make the interleave/de-interleave micro-ops have
almost no performance impact.  Since O3 treats No_OpClass as "never resource
constrained, single cycle latency", this was an easy way to get the desired
behavior.  Additionally, the new "minor" CPU also immediately executes
No_OpClass instructions, so it was safe to do on the timing-focused cores.
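
To give a feel for the effect, here is a back-of-the-envelope sketch in Python.
It is not how gem5 computes anything: the function macro_op_latency, the
one-load-per-cycle issue assumption, and the 2-cycle load latency are all
hypothetical, and the 4-cycle figure is just one of the "4 or 5" mentioned
above.

```python
# Toy latency model; every cycle count here is an illustrative assumption,
# not a measurement and not gem5's actual configuration.

def macro_op_latency(num_loads, load_latency, deinterleave_latency):
    """Cycles until the de-interleave micro-op completes.

    Assumes the load micro-ops issue one per cycle and that the
    de-interleave micro-op must wait for the last load to finish.
    """
    last_load_done = (num_loads - 1) + load_latency
    return last_load_done + deinterleave_latency


NUM_LOADS = 3      # hypothetical: one load micro-op per structure field
LOAD_LATENCY = 2   # hypothetical load latency in cycles

before = macro_op_latency(NUM_LOADS, LOAD_LATENCY, deinterleave_latency=4)
after = macro_op_latency(NUM_LOADS, LOAD_LATENCY, deinterleave_latency=1)
print(before, after, before - after)
```

Under these assumptions the whole difference comes from the latency charged to
the de-interleave micro-op, which is the only thing the patch changes.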

- Mitch

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://reviews.gem5.org/r/2338/#review5264
-----------------------------------------------------------

On Aug. 13, 2014, 2:07 p.m., Andreas Hansson wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> http://reviews.gem5.org/r/2338/
> -----------------------------------------------------------
> 
> (Updated Aug. 13, 2014, 2:07 p.m.)
> 
> 
> Review request for Default.
> 
> 
> Repository: gem5
> 
> 
> Description
> -------
> 
> Changeset 10305:2b6478741bf6
> ---------------------------
> arm: Fix v8 neon latency issue for loads/stores
> 
> Neon memory ops that operate on multiple registers currently have very poor
> performance because of interleave/deinterleave micro-ops.
> 
> This patch marks the deinterleave/interleave micro-ops as "No_OpClass" such
> that they take minimum cycles to execute and are never resource constrained.
> 
> Additionally the micro-ops over-read registers.  Although one form may need
> to read up to 20 sources, not all do.  This adds in new forms so false
> dependencies are not modeled.  Instructions read their minimum number of
> sources.
> 
> 
> Diffs
> -----
> 
>   src/arch/arm/insts/macromem.cc 79fde1c67ed8 
>   src/arch/arm/isa/insts/neon64_mem.isa 79fde1c67ed8 
> 
> Diff: http://reviews.gem5.org/r/2338/diff/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Andreas Hansson
> 
>

_______________________________________________
gem5-dev mailing list
[email protected]
http://m5sim.org/mailman/listinfo/gem5-dev
