> On Aug. 16, 2014, 4:30 p.m., Nilay Vaish wrote:
> > Two questions:
> > * What are interleave/deinterleave microops?
> > * Why should they be marked No_Opclass?
Interleave/deinterleave operations reorganize the layout of data as it is
loaded from or stored to memory.
Say we have the following data laid out sequentially in memory:
A[0].x
A[0].y
A[0].z
A[1].x
A[1].y
A[1].z
A[2].x
A[2].y
A[2].z
A[3].x
A[3].y
A[3].z
Let's say we want to load all of the 'z' coordinates into a SIMD register,
such that D0 = {A[0].z, A[1].z, A[2].z, A[3].z}. This process is called
de-interleaving. Currently we crack the instruction into micro-ops that
perform each of the loads, followed by an expensive "de-interleave" micro-op.
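The data movement itself can be sketched as follows. This is a minimal illustration of the semantics only (in the spirit of AArch64 LD3/ST3 multiple-structure loads and stores), not gem5's micro-op code. It assumes 32-bit float elements and models each SIMD register as a std::vector; deinterleave() and interleave() are hypothetical helper names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// De-interleave (load direction): memory holds records of `fields` elements
// each, e.g. {x, y, z}. Output register f receives field f of every record,
// so lane i of register f comes from record i.
std::vector<std::vector<float>> deinterleave(const std::vector<float>& mem,
                                             std::size_t fields) {
    assert(fields > 0 && mem.size() % fields == 0);
    const std::size_t records = mem.size() / fields;
    std::vector<std::vector<float>> regs(fields, std::vector<float>(records));
    for (std::size_t i = 0; i < records; ++i)
        for (std::size_t f = 0; f < fields; ++f)
            regs[f][i] = mem[i * fields + f];
    return regs;
}

// Interleave (store direction): the inverse of deinterleave(), writing lane i
// of each register back into record i in memory order.
std::vector<float> interleave(const std::vector<std::vector<float>>& regs) {
    assert(!regs.empty());
    const std::size_t fields = regs.size();
    const std::size_t records = regs[0].size();
    std::vector<float> mem(fields * records);
    for (std::size_t i = 0; i < records; ++i)
        for (std::size_t f = 0; f < fields; ++f)
            mem[i * fields + f] = regs[f][i];
    return mem;
}
```

With A[0..3] laid out as listed above (12 floats), deinterleave(mem, 3)[2] holds the four 'z' values, and interleave() restores the original memory image.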
Because we were charging de-interleave micro-ops 4 or 5 cycles of latency (I
can't remember which), they were skewing performance relative to real
hardware. Real hardware seemingly decomposes these instructions into micro-ops
differently and pays a lower cost for such operations.
So this patch is a minimal change to bring performance back in line; the goal
was for these micro-ops to have almost no performance impact. Since O3 treats
No_OpClass as "never resource constrained, single-cycle latency", marking them
No_OpClass was an easy way to get the desired behavior. Additionally, the new
Minor CPU also executes No_OpClass instructions immediately, so the change is
safe on the timing-focused cores.
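To make the latency argument concrete, here is a toy critical-path model. Everything in it is hypothetical: the ToyOpClass names, the 2-cycle load latency, and the 5-cycle cost (standing in for the "4 or 5 cycles" mentioned above) are illustrative assumptions. It also assumes the load micro-ops issue in parallel and a single de-interleave micro-op waits on all of them. It is not how O3 or Minor actually compute timing.

```cpp
#include <cassert>

// Hypothetical op classes for this toy model only; gem5's real OpClass enum
// is much larger.
enum class ToyOpClass { MemRead, SimdMisc, NoOpClass };

struct ToyTiming {
    unsigned latency;  // cycles from issue until the result is available
    bool needsFU;      // true if the micro-op must win a functional unit
};

// Hypothetical numbers: SimdMisc models a resource-constrained 5-cycle
// de-interleave; NoOpClass mirrors "single-cycle, never resource constrained".
ToyTiming timingFor(ToyOpClass c) {
    switch (c) {
      case ToyOpClass::MemRead:   return {2, true};
      case ToyOpClass::SimdMisc:  return {5, true};
      case ToyOpClass::NoOpClass: return {1, false};
    }
    assert(false && "unknown op class");
    return {1, false};
}

// Critical path of a cracked de-interleaving load: parallel load micro-ops
// followed by one de-interleave micro-op that depends on all of them.
unsigned macroOpLatency(unsigned numLoads, ToyOpClass deinterleaveClass) {
    unsigned loadLatency =
        numLoads > 0 ? timingFor(ToyOpClass::MemRead).latency : 0;
    return loadLatency + timingFor(deinterleaveClass).latency;
}
```

In this toy, three loads followed by a SimdMisc de-interleave cost 7 cycles on the critical path, versus 3 cycles when the de-interleave is No_OpClass, and the No_OpClass version never competes for a functional unit.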
- Mitch
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://reviews.gem5.org/r/2338/#review5264
-----------------------------------------------------------
On Aug. 13, 2014, 2:07 p.m., Andreas Hansson wrote:
>
> (Updated Aug. 13, 2014, 2:07 p.m.)
>
>
> Review request for Default.
>
>
> Repository: gem5
>
>
> Description
> -------
>
> Changeset 10305:2b6478741bf6
> ---------------------------
> arm: Fix v8 neon latency issue for loads/stores
>
> Neon memory ops that operate on multiple registers currently have very poor
> performance because of interleave/deinterleave micro-ops.
>
> This patch marks the deinterleave/interleave micro-ops as "No_OpClass" so
> that they take the minimum number of cycles to execute and are never
> resource constrained.
>
> Additionally, the micro-ops over-read registers: although one form may need
> to read up to 20 sources, not all forms do. This patch adds new forms so that
> false dependencies are not modeled; each instruction reads only the minimum
> number of sources it needs.
>
>
> Diffs
> -----
>
> src/arch/arm/insts/macromem.cc 79fde1c67ed8
> src/arch/arm/isa/insts/neon64_mem.isa 79fde1c67ed8
>
> Diff: http://reviews.gem5.org/r/2338/diff/
>
>
> Testing
> -------
>
>
> Thanks,
>
> Andreas Hansson
>
>
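The patch description's point about over-reading registers comes down to false dependencies: a micro-op that declares more source registers than it uses must still wait for every one of them. The sketch below is a hypothetical toy scoreboard, not gem5's dependency tracking; canIssue() and firstNRegs() are illustrative names, and the 6-source "minimal" form is an arbitrary example next to the 20-source form mentioned above.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Toy scoreboard: a micro-op may issue only once every register it *declares*
// as a source is ready, whether or not it actually uses that value.
bool canIssue(const std::vector<std::size_t>& declaredSrcs,
              const std::vector<bool>& regReady) {
    for (std::size_t r : declaredSrcs) {
        assert(r < regReady.size());
        if (!regReady[r])
            return false;  // stalls on this producer, needed or not
    }
    return true;
}

// Builds the source list {0, 1, ..., n-1}.
std::vector<std::size_t> firstNRegs(std::size_t n) {
    std::vector<std::size_t> srcs(n);
    std::iota(srcs.begin(), srcs.end(), std::size_t{0});
    return srcs;
}
```

If register 12 is still being written by an unrelated instruction, a form that declares all 20 sources is blocked, while a form declaring only the 6 it needs can issue; removing that spurious wait is what the new, narrower forms achieve.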
_______________________________________________
gem5-dev mailing list
[email protected]
http://m5sim.org/mailman/listinfo/gem5-dev