> On May 13, 2012, 7:25 p.m., Gabe Black wrote:
> > Thanks for stepping up and taking a shot at this. I saw some possible 
> > improvements to your implementation and, after playing with it a bit, came up 
> > with this:
> > 
> > 
> >     shuffle ufp1, xmml, xmmh, ext=((0 << 0) | (2 << 2)), size=4
> >     shuffle ufp2, xmml, xmmh, ext=((1 << 0) | (3 << 2)), size=4
> >     shuffle ufp3, xmmlm, xmmhm, ext=((0 << 0) | (2 << 2)), size=4
> >     shuffle ufp4, xmmlm, xmmhm, ext=((1 << 0) | (3 << 2)), size=4
> > 
> >     maddf xmml, ufp1, ufp2, size=4
> >     maddf xmmh, ufp3, ufp4, size=4
> > 
> > 
> > The memory versions follow naturally. It works/should work by moving the 
> > input values to the position they'll be in the answer with the "shuffle" 
> > microop, and then adding them together in parallel. I've verified that this 
> > compiles but haven't functionally tested it. Could you please use your test 
> > program to do that?
> > 
> > Also, the HADDPS_XMM_P version is basically the same as HADDPS_XMM_M; it 
> > just uses RIP-relative addressing for the memory operand. The microcode for 
> > those versions typically reads the RIP into microcode register t7 and then 
> > uses the riprel address computation shorthand, but is otherwise the same as 
> > the normal memory version. That addressing mode is only available in 64-bit 
> > mode, and to make sure you're using the version you want (RIP-relative 
> > versus regular) you may have to encode the instruction manually.
> 
> Gabe Black wrote:
>     And actually, if you wanted to write up little test programs for other 
> instructions we're missing or even go as far as putting them into a new 
> regression, that would be great. It would be a great tool for you (or anyone 
> else) to get and keep those instructions working, and it would be nice to 
> fill in some of the gaps in our x86 instruction support.

In the near term, I could certainly put the test program that I have for haddps 
somewhere in the repo or make it into a regression if that would be useful.
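
For reference, here is a rough Python model of what such a test would check. 
It is only a sketch, not the actual test program. The shuffle semantics are 
an assumption based on the description quoted above (each 2-bit field of ext 
picks one of the four 32-bit lanes formed by the two source halves), and all 
function names are made up for illustration. Python floats are doubles, so 
this only checks lane placement, not single-precision rounding.

```python
def haddps_reference(dest, src):
    """Architectural HADDPS result for two 4-lane vectors (lane 0 first)."""
    return [dest[0] + dest[1], dest[2] + dest[3],
            src[0] + src[1], src[2] + src[3]]


def shuffle(src1, src2, ext):
    """Assumed model of the 'shuffle' microop with size=4.

    src1 and src2 are 64-bit register halves holding two lanes each.
    Each 2-bit field of ext selects one lane out of src1 + src2.
    """
    lanes = src1 + src2
    return [lanes[(ext >> (2 * i)) & 0x3] for i in range(2)]


def maddf(a, b):
    """Lane-wise add of two 2-lane halves (stand-in for 'maddf', size=4)."""
    return [x + y for x, y in zip(a, b)]


def haddps_microcode(dest, src):
    """Mirror of the shuffle/maddf sequence quoted above."""
    xmml, xmmh = dest[:2], dest[2:]      # low/high halves of the destination
    xmmlm, xmmhm = src[:2], src[2:]      # low/high halves of the source
    ufp1 = shuffle(xmml, xmmh, (0 << 0) | (2 << 2))    # lanes 0 and 2
    ufp2 = shuffle(xmml, xmmh, (1 << 0) | (3 << 2))    # lanes 1 and 3
    ufp3 = shuffle(xmmlm, xmmhm, (0 << 0) | (2 << 2))
    ufp4 = shuffle(xmmlm, xmmhm, (1 << 0) | (3 << 2))
    return maddf(ufp1, ufp2) + maddf(ufp3, ufp4)


if __name__ == "__main__":
    d = [1.0, 2.0, 3.0, 4.0]
    s = [10.0, 20.0, 30.0, 40.0]
    assert haddps_microcode(d, s) == haddps_reference(d, s)
    print(haddps_microcode(d, s))
```

With these inputs both functions give [3.0, 7.0, 30.0, 70.0].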


- Marc


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://reviews.gem5.org/r/1189/#review2685
-----------------------------------------------------------


On May 14, 2012, 6:27 p.m., Marc Orr wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> http://reviews.gem5.org/r/1189/
> -----------------------------------------------------------
> 
> (Updated May 14, 2012, 6:27 p.m.)
> 
> 
> Review request for Default.
> 
> 
> Description
> -------
> 
> Changeset 8981:463aba906774
> ---------------------------
> x86 ISA: Implement the sse3 haddps instruction.
> 
> This patch is a revised version of Vince Weaver's patch from 592.
> 
> 
> Diffs
> -----
> 
>   src/arch/x86/isa/decoder/two_byte_opcodes.isa 4388495beb44ba859d20177371caf9e14902ef91 
>   src/arch/x86/isa/insts/simd128/floating_point/arithmetic/horizontal_addition.py 4388495beb44ba859d20177371caf9e14902ef91 
> 
> Diff: http://reviews.gem5.org/r/1189/diff/
> 
> 
> Testing
> -------
> 
> Wrote a little program that uses haddps. All 3 haddps versions were tested 
> (XMM_XMM, XMM_M, and XMM_P).
> 
> 
> Thanks,
> 
> Marc Orr
> 
>

_______________________________________________
gem5-dev mailing list
[email protected]
http://m5sim.org/mailman/listinfo/gem5-dev
