Re: [gem5-dev] Review Request: X86: haddps: Another patch from Vince Weaver

Marc Orr Fri, 11 May 2012 17:29:42 -0700


> On April 3, 2011, 5:27 p.m., Gabe Black wrote:
> > src/arch/x86/isa/insts/simd128/floating_point/arithmetic/horizontal_addition.py,
> >  line 41
> > <http://reviews.gem5.org/r/592/diff/1/?file=11018#file11018line41>
> >
> >     ext is no longer set to a raw bitvector that selects per instruction 
> > features like this since, as you can see, it's pretty opaque just looking 
> > at it. The maddf ext=1 becomes ext=Scalar. For msrli and mslli, ext=0 is 
> > the default and can be dropped. It would leave the ops as SIMD. Since 
> > they're already operating at the full width of the fp register type (a 
> > double) the value is especially redundant.


I've addressed this in a newly submitted patch (since I can't update this one).


> On April 3, 2011, 5:27 p.m., Gabe Black wrote:
> > src/arch/x86/isa/insts/simd128/floating_point/arithmetic/horizontal_addition.py,
> >  line 55
> > <http://reviews.gem5.org/r/592/diff/1/?file=11018#file11018line55>
> >
> >     This implementation is a bit inefficient, although not terribly so. You 
> > have to be careful since the two operands may be the same registers and you 
> > don't want to overwrite something you still need, but, for instance, the 
> > maddf one line above, this shift of ufp4 and the maddf on line 60 could all 
> > update xmmh since all "high" halves of xmm registers have been read and no 
> > faults can happen. The moves that read out xmmlm could be moved higher, and 
> > xmml could also be updated directly.
> >     
> >     I think it -may- also be possible to do something clever and cut down 
> > the number of microops shifting things around to pack and unpack the 
> > results. I may have also suspected this was true when I wrote the much 
> > simpler 64 bit wide version of this instruction below this one where the 
> > components are whole registers and can be indexed directly, but then didn't 
> > come up with anything and punted for later.

In a new patch (since I can't update this one), I removed a few redundant loads 
from each macroop. I didn't quite achieve the optimization you suggested, 
though.
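
For anyone following the thread, here is a minimal Python sketch of the 
architectural result HADDPS produces, and of why the aliasing caveat matters 
when both operands are the same register. It models only the lane arithmetic, 
not the gem5 microcode; the function name haddps and the list-of-floats 
representation are illustrative assumptions.

```python
def haddps(dest, src):
    """Return the HADDPS result for two 4-lane vectors (lists of floats).

    Illustrative model only: lane 0 is the least significant lane.
    """
    # Read every input lane before producing any output, so the caller
    # may pass the same list for both operands without clobbering inputs.
    d0, d1, d2, d3 = dest
    s0, s1, s2, s3 = src
    # Low two lanes come from adjacent pairs of dest, high two from src.
    return [d0 + d1, d2 + d3, s0 + s1, s2 + s3]


xmm0 = [1.0, 2.0, 3.0, 4.0]
xmm1 = [10.0, 20.0, 30.0, 40.0]

print(haddps(xmm0, xmm1))  # distinct operands
print(haddps(xmm0, xmm0))  # same register used for both operands
```

Reading all lanes up front is the simplest way to stay correct when the 
operands alias; the microcode has to achieve the same effect while reusing 
registers as early as it safely can.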


> On April 3, 2011, 5:27 p.m., Gabe Black wrote:
> > src/arch/x86/isa/insts/simd128/floating_point/arithmetic/horizontal_addition.py,
> >  line 78
> > <http://reviews.gem5.org/r/592/diff/1/?file=11018#file11018line78>
> >
> >     This microop is changing architecturally visible state and effectively 
> > committing to completing the op before all the possibly faulting ops have 
> > executed, specifically the following loads. There are 8 microcode fp 
> > registers so you can just use the others and leave ufp3 around until the 
> > end.

I've addressed this in a newly submitted patch (since I can't update this one).
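
To illustrate the ordering problem described in the comment, here is a rough 
Python sketch, not the actual microcode. It assumes a toy model in which a 
macroop does two loads, the second of which faults; Fault, load, early_commit, 
late_commit, and run are hypothetical names, and the dictionary stands in for 
architecturally visible registers.

```python
class Fault(Exception):
    """Stands in for a fault raised by a load microop."""


def load(memory, addr):
    # An unmapped address faults instead of returning data.
    if addr not in memory:
        raise Fault(addr)
    return memory[addr]


def early_commit(regs, memory, addr):
    # Writes visible state before the second load has executed.
    regs["xmml"] = regs["xmml"] + load(memory, addr)
    regs["xmmh"] = regs["xmmh"] + load(memory, addr + 8)


def late_commit(regs, memory, addr):
    # Both loads land in temporaries first; visible state is written
    # only once nothing that can fault remains.
    t_lo = load(memory, addr)
    t_hi = load(memory, addr + 8)
    regs["xmml"] = regs["xmml"] + t_lo
    regs["xmmh"] = regs["xmmh"] + t_hi


def run(op):
    regs = {"xmml": 1.0, "xmmh": 2.0}
    memory = {0: 10.0}  # address 8 is unmapped, so the second load faults
    try:
        op(regs, memory, 0)
    except Fault:
        pass  # the instruction would be restarted after the fault is handled
    return regs


print(run(early_commit))  # xmml has already been modified
print(run(late_commit))   # registers are untouched
```

With the early write, restarting the instruction after the fault would start 
from a half-updated register, which is why the visible update has to wait 
until after the loads.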


> On April 3, 2011, 5:27 p.m., Gabe Black wrote:
> > src/arch/x86/isa/insts/simd128/floating_point/arithmetic/horizontal_addition.py,
> >  line 108
> > <http://reviews.gem5.org/r/592/diff/1/?file=11018#file11018line108>
> >
> >     Like above, this can't happen before the loads.

I've addressed this in a newly submitted patch (since I can't update this one).


- Marc


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://reviews.gem5.org/r/592/#review1096
-----------------------------------------------------------


On March 17, 2011, 4:07 p.m., Lisa Hsu wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> http://reviews.gem5.org/r/592/
> -----------------------------------------------------------
> 
> (Updated March 17, 2011, 4:07 p.m.)
> 
> 
> Review request for Default, Ali Saidi, Gabe Black, Steve Reinhardt, and 
> Nathan Binkert.
> 
> 
> Description
> -------
> 
> X86:  haddps: Another patch from Vince Weaver
> 
> 
> Diffs
> -----
> 
>   src/arch/x86/isa/decoder/two_byte_opcodes.isa 2e269d6fb3e6 
>   
> src/arch/x86/isa/insts/simd128/floating_point/arithmetic/horizontal_addition.py
>  2e269d6fb3e6 
> 
> Diff: http://reviews.gem5.org/r/592/diff/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Lisa Hsu
> 
>

_______________________________________________
gem5-dev mailing list
[email protected]
http://m5sim.org/mailman/listinfo/gem5-dev
