On Mon, 6 Jul 2015, Alexandru Dutu wrote:



On July 6, 2015, 12:42 p.m., Giacomo Gabrielli wrote:
These are my current thoughts about this patch:

1. My impression is that there is still not enough architectural support to
understand whether the new vector register type as it stands can address all the
different corner cases efficiently; I'd leave it to the wider gem5 community to
decide where we want to draw that line...

2. Legacy SSE requires merging of upper lanes, while AVX does zeroing;
   also ARMv8 AArch64 scalar FP and NEON instructions perform zeroing.
   Assuming that destination vectors are always read is going to
   introduce unneeded serialization for those ISA extensions if they
   are going to be ported to the new scheme, so I'd suggest avoiding
   an implicit read on write.  Also for cases where merging is
   required, maybe something smarter should be done to avoid unneeded
   serialization; without optimizations, any sequence of x86 FP scalar
   instructions could be significantly slower than on real hw
   implementations.

Could you please explain the merging issue for legacy SSE in a bit more detail?



I am not sure what you are asking for exactly. I have two interpretations of your question:

1. SSE instructions work with 128-bit registers.  AVX instructions work
   with 128-bit, 256-bit and (with AVX-512) 512-bit registers.  Since the
   actual underlying set of registers is the same, we need to do something
   about the bits that are not part of the output.  For SSE instructions,
   bits 128 to VLmax-1 are retained as before.  For AVX, the instructions
   that output only 128 bits zero the rest of the bits in the register.
   For example, suppose we are doing 32-bit adds on two 128-bit registers,
   but the underlying register is 256 bits wide.  Then, indexing 32-bit
   elements, C[0..3] = A[0..3] + B[0..3] for both SSE and AVX.  But
   C[4..7] = C_old[4..7] for SSE and C[4..7] = 0 for AVX.


2. In the implementation that I posted, we only maintain the largest size
   register that the ISA supports.  So, if the largest vector width is
   512 bits, then all vector registers are 512 bits wide.  While executing
   SSE instructions, we need to retain the previous data.  So while
   writing to the output register, we need to perform a merge between the
   new and the old values.  This means we need to read the old values
   first.  So there would be serialization between instructions that read
   and write different parts of the vector register.  But now that I think
   about it, most instructions are going to read / write the lower bits.
   So the serialization would occur anyway.
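
To make the two points above concrete, here is a rough Python sketch of the
merge and zeroing behaviours.  It is not gem5 code: the function names, the
256-bit underlying register and the list-of-lanes representation are all
assumptions made up for illustration.

```python
# Illustrative sketch only; names and sizes are assumptions, not gem5 code.
LANES_128 = 4          # number of 32-bit lanes in a 128-bit operand
LANES_MAX = 8          # number of 32-bit lanes in the assumed 256-bit register
MASK32 = 0xFFFFFFFF    # keep each lane to 32 bits (wrapping add)


def add_low_lanes(a, b):
    """32-bit wrapping add on lanes 0..3 of two registers."""
    return [(a[i] + b[i]) & MASK32 for i in range(LANES_128)]


def write_sse(c_old, a, b):
    """Legacy SSE style: upper lanes keep their old contents (merge).

    The old destination value c_old is an input, i.e. an extra source.
    """
    return add_low_lanes(a, b) + list(c_old[LANES_128:LANES_MAX])


def write_avx(a, b):
    """AVX style for a 128-bit result: upper lanes are zeroed.

    The old destination value is never read.
    """
    return add_low_lanes(a, b) + [0] * (LANES_MAX - LANES_128)


a = [1, 2, 3, 4, 50, 60, 70, 80]
b = [10, 20, 30, 40, 5, 6, 7, 8]
c_old = [9] * LANES_MAX

sse_result = write_sse(c_old, a, b)
avx_result = write_avx(a, b)
print(sse_result)
print(avx_result)
```

The thing to notice is the signatures: write_sse() needs c_old, so the old
destination register becomes a source operand and the write has to wait for
whichever instruction produced it.  write_avx() has no such input, which is
why an implicit read on every write would add a dependency that zeroing
instructions do not need.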

--
Nilay
_______________________________________________
gem5-dev mailing list
gem5-dev@gem5.org
http://m5sim.org/mailman/listinfo/gem5-dev
