-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://reviews.gem5.org/r/1189/#review2685
-----------------------------------------------------------
Thanks for stepping up and taking a shot at this. I saw some possible
improvements to your implementation and, after playing with it a bit, came up
with this:

    shuffle ufp1, xmml, xmmh, ext=((0 << 0) | (2 << 2)), size=4
    shuffle ufp2, xmml, xmmh, ext=((1 << 0) | (3 << 2)), size=4
    shuffle ufp3, xmmlm, xmmhm, ext=((0 << 0) | (2 << 2)), size=4
    shuffle ufp4, xmmlm, xmmhm, ext=((1 << 0) | (3 << 2)), size=4
    maddf xmml, ufp1, ufp2, size=4
    maddf xmmh, ufp3, ufp4, size=4

The memory versions follow naturally. It should work by using the "shuffle"
microop to move the input values into the positions they'll occupy in the
answer, and then adding them together in parallel. I've verified that this
compiles but haven't functionally tested it. Could you please use your test
program to do that?

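
For anyone following along, here is a rough Python model of the sequence
above. It is only a sketch of how I read the microops, not the simulator's
code: the selector layout for size=4 (two 2-bit selectors per 64-bit register,
values 0-1 picking a lane from the first source and 2-3 from the second) is an
assumption, the helper names are made up for illustration, and Python floats
are double precision rather than single.

```python
def shuffle(src1, src2, ext):
    """Model of shuffle with size=4: each 64-bit register holds two lanes.

    Assumed selector layout: 2 bits per output lane, 0-1 pick a lane of
    src1 and 2-3 pick a lane of src2.
    """
    pool = list(src1) + list(src2)
    return [pool[(ext >> (2 * i)) & 0x3] for i in range(2)]


def maddf(a, b):
    """Model of maddf with size=4: lane-wise addition of two registers."""
    return [x + y for x, y in zip(a, b)]


def haddps(dest, src):
    """Run the six-microop sequence on two 4-lane values (lane 0 first)."""
    xmml, xmmh = dest[:2], dest[2:]    # low/high halves of the destination
    xmmlm, xmmhm = src[:2], src[2:]    # low/high halves of the source
    ufp1 = shuffle(xmml, xmmh, (0 << 0) | (2 << 2))    # [d0, d2]
    ufp2 = shuffle(xmml, xmmh, (1 << 0) | (3 << 2))    # [d1, d3]
    ufp3 = shuffle(xmmlm, xmmhm, (0 << 0) | (2 << 2))  # [s0, s2]
    ufp4 = shuffle(xmmlm, xmmhm, (1 << 0) | (3 << 2))  # [s1, s3]
    return maddf(ufp1, ufp2) + maddf(ufp3, ufp4)


def haddps_reference(dest, src):
    """Architectural definition of HADDPS, for comparison."""
    return [dest[0] + dest[1], dest[2] + dest[3],
            src[0] + src[1], src[2] + src[3]]


if __name__ == "__main__":
    d = [1.0, 2.0, 3.0, 4.0]
    s = [10.0, 20.0, 30.0, 40.0]
    print(haddps(d, s), haddps_reference(d, s))
```

The two functions agreeing on a few inputs only checks the lane bookkeeping;
it says nothing about rounding or about the real microop implementation, so
the functional test in the simulator is still needed.
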
Also, the HADDPS_XMM_P version is basically the same as HADDPS_XMM_M; it just
uses RIP-relative addressing for the memory operand. The microcode for those
versions typically reads the RIP into microcode register t7 and then uses the
riprel address computation shorthand, but is otherwise the same as the normal
memory version. That addressing mode is only available in 64-bit mode, and to
make sure you're using the version you want (RIP-relative versus regular) you
may have to encode the instruction manually.

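
As a rough illustration of what encoding the instruction by hand could look
like, the sketch below builds the raw bytes for both memory forms. It relies
on the standard x86-64 encoding (HADDPS is F2 0F 7C /r; ModRM mod=00, rm=101
means RIP+disp32 in 64-bit mode; an absolute disp32 needs a SIB byte with no
base and no index). The function names are hypothetical, and only xmm0-xmm7
are handled so no REX prefix is needed.

```python
import struct

HADDPS_OPCODE = bytes([0xF2, 0x0F, 0x7C])  # HADDPS xmm1, xmm2/m128


def _check_reg(reg):
    # xmm8-xmm15 would need a REX prefix, which this sketch does not emit.
    if not 0 <= reg <= 7:
        raise ValueError("only xmm0-xmm7 are handled here")


def encode_haddps_riprel(reg, disp32):
    """haddps xmm<reg>, [rip + disp32] (the form the _P macroop handles)."""
    _check_reg(reg)
    modrm = (0b00 << 6) | (reg << 3) | 0b101  # mod=00, rm=101: RIP+disp32
    return HADDPS_OPCODE + bytes([modrm]) + struct.pack("<i", disp32)


def encode_haddps_abs32(reg, addr32):
    """haddps xmm<reg>, [addr32] (a regular, non-RIP-relative memory form)."""
    _check_reg(reg)
    modrm = (0b00 << 6) | (reg << 3) | 0b100  # mod=00, rm=100: SIB follows
    sib = (0b00 << 6) | (0b100 << 3) | 0b101  # no index, no base: disp32
    return HADDPS_OPCODE + bytes([modrm, sib]) + struct.pack("<i", addr32)


if __name__ == "__main__":
    print(encode_haddps_riprel(0, 0x10).hex())
    print(encode_haddps_abs32(1, 0x1000).hex())
```

With the GNU toolchain, bytes like these can be dropped into a test program
with a .byte directive in inline assembly, which avoids depending on which
form the assembler happens to pick.
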
- Gabe Black
On May 11, 2012, 5:19 p.m., Marc Orr wrote:
>
> (Updated May 11, 2012, 5:19 p.m.)
>
>
> Review request for Default.
>
>
> Description
> -------
>
> Changeset 8981:bd580154c720
> ---------------------------
> x86 ISA: Implement the sse3 haddps instruction.
>
> This patch is a revised version of Vince Weaver's patch from 592.
>
>
> Diffs
> -----
>
> src/arch/x86/isa/decoder/two_byte_opcodes.isa
> 4388495beb44ba859d20177371caf9e14902ef91
>
> src/arch/x86/isa/insts/simd128/floating_point/arithmetic/horizontal_addition.py
> 4388495beb44ba859d20177371caf9e14902ef91
>
> Diff: http://reviews.gem5.org/r/1189/diff/
>
>
> Testing
> -------
>
> Wrote a little program that uses haddps. I was able to test both the XMM_XMM
> version and the XMM_M version. I don't understand what the XMM_P version is
> so I was not able to test it.
>
>
> Thanks,
>
> Marc Orr
>
>