-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://reviews.gem5.org/r/1189/#review2685
-----------------------------------------------------------


Thanks for stepping up and taking a shot at this. I saw some possible 
improvements to your implementation and, after playing with it a bit, came up 
with this:


    shuffle ufp1, xmml, xmmh, ext=((0 << 0) | (2 << 2)), size=4
    shuffle ufp2, xmml, xmmh, ext=((1 << 0) | (3 << 2)), size=4
    shuffle ufp3, xmmlm, xmmhm, ext=((0 << 0) | (2 << 2)), size=4
    shuffle ufp4, xmmlm, xmmhm, ext=((1 << 0) | (3 << 2)), size=4

    maddf xmml, ufp1, ufp2, size=4
    maddf xmmh, ufp3, ufp4, size=4


The memory versions follow naturally. It should work by using the "shuffle" 
microop to move the input values into the positions they'll occupy in the 
answer, and then adding them together in parallel. I've verified that this 
compiles but haven't functionally tested it. Could you please use your test 
program to do that?
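
To make the data movement easier to follow, here is a rough Python model of the 
microcode above. It is only a sketch: each 64-bit register half is a list of 
two floats, and the shuffle() and maddf() helpers are my assumption of how the 
microops behave with size=4 (each 2-bit field of ext picks one of the four 
lanes in src1 followed by src2). It is not gem5 code.

```python
def shuffle(src1, src2, ext):
    # Assumed size=4 behavior: lanes 0-1 come from src1, lanes 2-3 from src2.
    # Each 2-bit field of ext selects the source lane for one result lane.
    pool = src1 + src2
    return [pool[(ext >> (2 * lane)) & 0x3] for lane in range(2)]


def maddf(a, b):
    # Assumed behavior: lane-wise floating point add.
    return [x + y for x, y in zip(a, b)]


def haddps_microcode(xmml, xmmh, xmmlm, xmmhm):
    # Mirrors the six microops above, one line per microop.
    ufp1 = shuffle(xmml, xmmh, (0 << 0) | (2 << 2))    # [d0, d2]
    ufp2 = shuffle(xmml, xmmh, (1 << 0) | (3 << 2))    # [d1, d3]
    ufp3 = shuffle(xmmlm, xmmhm, (0 << 0) | (2 << 2))  # [s0, s2]
    ufp4 = shuffle(xmmlm, xmmhm, (1 << 0) | (3 << 2))  # [s1, s3]
    new_xmml = maddf(ufp1, ufp2)
    new_xmmh = maddf(ufp3, ufp4)
    return new_xmml, new_xmmh


def haddps_reference(dest, src):
    # Architectural definition of HADDPS on four-lane operands.
    return [dest[0] + dest[1], dest[2] + dest[3],
            src[0] + src[1], src[2] + src[3]]


dest = [1.0, 2.0, 3.0, 4.0]
src = [10.0, 20.0, 30.0, 40.0]
low, high = haddps_microcode(dest[:2], dest[2:], src[:2], src[2:])
assert low + high == haddps_reference(dest, src)
```

This only checks the lane bookkeeping; it doesn't replace running the real 
test program in the simulator.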

Also, the HADDPS_XMM_P version is basically the same as HADDPS_XMM_M; it just 
uses RIP-relative addressing for the memory operand. The microcode for those 
versions typically reads the RIP into microcode register t7 and then uses the 
riprel address computation shorthand, but is otherwise the same as the normal 
memory version. That addressing mode is only available in 64-bit mode, and to 
make sure you're using the version you want (RIP-relative versus regular) you 
may have to encode the instruction manually.
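
If it helps with the manual encoding, here is a small Python sketch that builds 
the bytes for both forms of haddps (opcode F2 0F 7C /r) in 64-bit mode. The 
function names are made up for illustration, and it only handles xmm0-xmm7 so 
no REX prefix is needed. I'd expect the first form to hit the _P macroop and 
the second the _M macroop, but that mapping is my assumption.

```python
import struct

HADDPS_OPCODE = bytes([0xF2, 0x0F, 0x7C])


def modrm(mod, reg, rm):
    # Pack the three ModRM fields into one byte.
    return (mod << 6) | (reg << 3) | rm


def check_reg(xmm_reg):
    # xmm8-xmm15 would need a REX prefix, which this sketch doesn't emit.
    if not 0 <= xmm_reg <= 7:
        raise ValueError("only xmm0-xmm7 are supported here")


def encode_haddps_riprel(xmm_reg, disp32):
    # mod=00, rm=101 means RIP + disp32 in 64-bit mode.
    check_reg(xmm_reg)
    return (HADDPS_OPCODE + bytes([modrm(0b00, xmm_reg, 0b101)])
            + struct.pack("<i", disp32))


def encode_haddps_abs32(xmm_reg, disp32):
    # mod=00, rm=100 selects a SIB byte; SIB 0x25 means no base and no
    # index, so the operand is just the (sign-extended) disp32.
    check_reg(xmm_reg)
    return (HADDPS_OPCODE + bytes([modrm(0b00, xmm_reg, 0b100), 0x25])
            + struct.pack("<i", disp32))


print(encode_haddps_riprel(0, 0x10).hex())   # haddps xmm0, [rip + 0x10]
print(encode_haddps_abs32(1, 0x1000).hex())  # haddps xmm1, [0x1000]
```

The resulting bytes can be dropped into a test program with the assembler's 
.byte directive. Keep in mind that the RIP-relative displacement is measured 
from the end of the instruction.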

- Gabe Black


On May 11, 2012, 5:19 p.m., Marc Orr wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> http://reviews.gem5.org/r/1189/
> -----------------------------------------------------------
> 
> (Updated May 11, 2012, 5:19 p.m.)
> 
> 
> Review request for Default.
> 
> 
> Description
> -------
> 
> Changeset 8981:bd580154c720
> ---------------------------
> x86 ISA: Implement the sse3 haddps instruction.
> 
> This patch is a revised version of Vince Weaver's patch from 592.
> 
> 
> Diffs
> -----
> 
>   src/arch/x86/isa/decoder/two_byte_opcodes.isa 4388495beb44ba859d20177371caf9e14902ef91 
>   src/arch/x86/isa/insts/simd128/floating_point/arithmetic/horizontal_addition.py 4388495beb44ba859d20177371caf9e14902ef91 
> 
> Diff: http://reviews.gem5.org/r/1189/diff/
> 
> 
> Testing
> -------
> 
> Wrote a little program that uses haddps. I was able to test both the XMM_XMM 
> version and the XMM_M version. I don't understand what the XMM_P version is 
> so I was not able to test it.
> 
> 
> Thanks,
> 
> Marc Orr
> 
>

_______________________________________________
gem5-dev mailing list
[email protected]
http://m5sim.org/mailman/listinfo/gem5-dev
