Hi Mitch,

This is not directly related to this patch, but I'd like to bring it up
because it concerns neon instructions.
I ran a program with a large number of FP DIV instructions and observed
two incorrect behaviors:
1. When the program is compiled for vfpv3, gem5 categorizes those FP DIV
instructions as SimdFloatDiv. In essence, gem5 categorizes many scalar FP
instructions as SIMD instructions.
2. When the program is compiled for neon with automatic vectorization, gem5
categorizes those FP DIV instructions as SimdFloatMultAcc, which is
obviously wrong.

Thanks,
Amin


On Thu, Aug 21, 2014 at 8:18 AM, Mitch Hayenga via gem5-dev <
[email protected]> wrote:

>
>
> > On Aug. 16, 2014, 4:30 p.m., Nilay Vaish wrote:
> > > Two questions:
> > > * What are interleave/deinterleave microops?
> > > * Why should they be marked No_Opclass?
>
> Interleave/deinterleave operations relate to re-organizing the way data is
> structured when loaded to/from memory.
>
> Say we had the following data sequentially in memory.
>
> A[0].x
> A[0].y
> A[0].z
> A[1].x
> A[1].y
> A[1].z
> A[2].x
> A[2].y
> A[2].z
> A[3].x
> A[3].y
> A[3].z
>
>
> Let's say we want to load all of the 'z' coordinates into a SIMD register,
> such that D0 = {A[0].z, A[1].z, A[2].z, A[3].z}.  This process is called
> de-interleaving.  Currently we crack into micro-ops to perform each of the
> loads and then perform an expensive "de-interleave" micro-op.
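The de-interleaving described above can be sketched in a few lines of Python. This is only a toy illustration of the data movement, not gem5 code; the `memory` list and the `deinterleave` helper are hypothetical names introduced for the example.

```python
# Toy illustration of de-interleaving; not gem5 code.
# 'memory' mirrors the layout in the example: A[0].x, A[0].y, A[0].z, A[1].x, ...
memory = [f"A[{i}].{field}" for i in range(4) for field in "xyz"]


def deinterleave(flat, num_fields):
    """Split a flat array-of-structures list into one list per field.

    Element k of every structure ends up in output list k.
    (Hypothetical helper, named for this example only.)
    """
    return [flat[k::num_fields] for k in range(num_fields)]


xs, ys, zs = deinterleave(memory, 3)
d0 = zs  # the register contents from the example: all 'z' coordinates
print(d0)  # ['A[0].z', 'A[1].z', 'A[2].z', 'A[3].z']
```

Interleaving is the reverse operation: taking the per-field lists and writing them back in x, y, z order.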
>
> Since we were charging de-interleave micro-ops with 4 or 5 cycles of
> latency (can't remember which), they were skewing performance with respect
> to real hardware.  Real hardware seemingly has different micro-op
> decomposition and lower costs for such operations.
>
> So, this patch is just a way to make minimal changes to bring performance
> back in line.  The goal was to make them have almost no performance
> impact.  Since O3 treats No_OpClass as "never resource constrained, single
> cycle latency", this was an easy way to get the desired behavior.
> Additionally, the new "minor" cpu also immediately executes No_OpClass
> instructions, so it was safe to do on the timing-focused cores.
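To make the effect of the op class concrete, here is a rough toy model in Python. It is not gem5's implementation: the `OpClassModel` type, the `finish_cycle` function, the `OldDeinterleaveClass` name, and the 5-cycle figure are all assumptions for illustration (the message above only says "4 or 5 cycles"). Only the "single cycle, never resource constrained" behavior of No_OpClass comes from the description above.

```python
# Toy model only; names and numbers are assumptions, not gem5's real tables.
from dataclasses import dataclass


@dataclass(frozen=True)
class OpClassModel:
    latency: int      # cycles from start of execution to result
    needs_unit: bool  # True if the op must wait for a free functional unit


OP_CLASSES = {
    # Hypothetical stand-in for whatever class the micro-ops used before.
    "OldDeinterleaveClass": OpClassModel(latency=5, needs_unit=True),
    # As described above: single cycle, never resource constrained.
    "No_OpClass": OpClassModel(latency=1, needs_unit=False),
}


def finish_cycle(op_class, issue_cycle, unit_free_at):
    """Return the cycle at which an op of the given class completes."""
    model = OP_CLASSES[op_class]
    # Ops that need a functional unit cannot start until one is free.
    start = max(issue_cycle, unit_free_at) if model.needs_unit else issue_cycle
    return start + model.latency


# An op ready at cycle 10 while the (assumed) unit is busy until cycle 20:
old = finish_cycle("OldDeinterleaveClass", 10, 20)
new = finish_cycle("No_OpClass", 10, 20)
```

In this sketch the reclassified micro-op neither waits for the busy unit nor pays the multi-cycle latency, which is the "almost no performance impact" the patch aims for.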
>
>
> - Mitch
>
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> http://reviews.gem5.org/r/2338/#review5264
> -----------------------------------------------------------
>
>
> On Aug. 13, 2014, 2:07 p.m., Andreas Hansson wrote:
> >
> > -----------------------------------------------------------
> > This is an automatically generated e-mail. To reply, visit:
> > http://reviews.gem5.org/r/2338/
> > -----------------------------------------------------------
> >
> > (Updated Aug. 13, 2014, 2:07 p.m.)
> >
> >
> > Review request for Default.
> >
> >
> > Repository: gem5
> >
> >
> > Description
> > -------
> >
> > Changeset 10305:2b6478741bf6
> > ---------------------------
> > arm: Fix v8 neon latency issue for loads/stores
> >
> > Neon memory ops that operate on multiple registers currently have very poor
> > performance because of interleave/deinterleave micro-ops.
> >
> > This patch marks the deinterleave/interleave micro-ops as "No_OpClass" such
> > that they take minimum cycles to execute and are never resource constrained.
> >
> > Additionally, the micro-ops over-read registers.  Although one form may need
> > to read up to 20 sources, not all do.  This adds in new forms so false
> > dependencies are not modeled.  Instructions read their minimum number of
> > sources.
> >
> >
> > Diffs
> > -----
> >
> >   src/arch/arm/insts/macromem.cc 79fde1c67ed8
> >   src/arch/arm/isa/insts/neon64_mem.isa 79fde1c67ed8
> >
> > Diff: http://reviews.gem5.org/r/2338/diff/
> >
> >
> > Testing
> > -------
> >
> >
> > Thanks,
> >
> > Andreas Hansson
> >
> >
>
> _______________________________________________
> gem5-dev mailing list
> [email protected]
> http://m5sim.org/mailman/listinfo/gem5-dev
>
