Hello All,
I am using gem5 to do some research on ARM/NEON performance. In particular,
I am looking into instruction timing on the ARM Cortex-A9 (ARMv7 with
NEON).
Can you help me to clarify the following questions?
1.- ARM's documentation (Cortex™-A9 NEON™ Media Processing Engine) provides
instruction timing tables for VFP and NEON instructions. According to those
tables, VFP and NEON instructions have different timing values. For example,
the VFP VADD instruction has a Writeback value of 4 cycles and the NEON VADD
instruction has a Writeback value of 6 cycles:
Table 3-2 VFP instruction timing
Name  Format    Cycles  Source  Result  Writeback
VADD  Dd,Dn,Dm  1       -1,1    4       4

Table 3-4 Advanced SIMD integer arithmetic instruction timing
Name  Format    Cycles  Source  Result  Writeback
VADD  Dd,Dn,Dm  1       -2,2    3       6
However, the O3_ARM_v7a.py file, which defines the characteristics of the ARMv7
core model, shows the same operation latency for VFP and NEON instructions:
# Floating point and SIMD instructions
class O3_ARM_v7a_FP(FUDesc):
    opList = [ OpDesc(opClass='SimdAdd', opLat=4),
Is this correct? Shouldn't NEON instructions have an opLat value of 6? Do I need
to change the latency (from 4 to 6 in the case of the ADD instruction) to
correctly simulate the latency of a NEON instruction as specified by the ARM
documentation? Is the simulator aware that a VFP instruction may have a
different latency than a NEON instruction?
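
To make the change I am asking about concrete, here is a minimal sketch of it. It does not use the real gem5 classes: OpDesc and FUDesc below are hypothetical stand-ins that only mimic the opClass/opLat fields quoted above, and with_latency is a helper name I made up for illustration. The opList contains only the single entry I quoted, not the full list from the file.

```python
# Hypothetical stand-ins for gem5's OpDesc/FUDesc (NOT the real m5.objects classes).
class OpDesc:
    def __init__(self, opClass, opLat=1):
        self.opClass = opClass  # operation class name, e.g. 'SimdAdd'
        self.opLat = opLat      # operation latency in cycles


class FUDesc:
    opList = []


# Only the one entry quoted in this message; the real class has more.
class O3_ARM_v7a_FP(FUDesc):
    opList = [OpDesc(opClass='SimdAdd', opLat=4)]


def with_latency(fu_cls, op_class, new_lat):
    """Return a subclass of fu_cls in which op_class has latency new_lat.

    The original class is left untouched. (Hypothetical helper.)
    """
    new_ops = [
        OpDesc(op.opClass, new_lat if op.opClass == op_class else op.opLat)
        for op in fu_cls.opList
    ]
    return type(fu_cls.__name__ + "_patched", (fu_cls,), {"opList": new_ops})


# The change in question: SimdAdd latency 4 -> 6 (the NEON Writeback value).
Patched = with_latency(O3_ARM_v7a_FP, 'SimdAdd', 6)
for op in Patched.opList:
    print(op.opClass, op.opLat)
```

Whether this is the right thing to do depends on the answers to the questions above, in particular on whether VFP and NEON adds are mapped to different opClass values at all.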
2.- In the same document, besides Writeback, other instruction timing values
are defined (in section 3.4.1, Instruction timing tables): Result (result
ready), Source (operands available), and Cycles (issue cycles). Is the "opLat"
value in the O3_ARM_v7a.py file defined as the Writeback value or as the
Result value? Note that the Result and Writeback values are not the same.
Are the other timing values (Source, Cycles) taken into account by gem5?
Best Regards,
Raul