On Apr 8, 2005, at 4:04 PM, Gabriel Paubert wrote:

> On Fri, Apr 08, 2005 at 02:01:13PM -0500, Kumar Gala wrote:
> > >Now that I read it carefully, I realize that I was wrong. But there
> > >is still some room for optimization; the parameter that you don't
> > >need is %3: simply replace lwzx %0,0,%3 by lwz %0,-4(%4).
> >
> > Doesn't help, realize that we are going to have "r3" with a pointer to
> > pte. There is no way w/o an add to get to the next word for the lwarx.
>
> I'd have to see the context. One less parameter to an asm block may
> also make the compiler life easier.

The only thing we could do is make the 4 a constant param and change the
lwarx to use it. Not sure if that's any better than what we are doing.

> > >But I'm not sure that OOO cannot play tricks on you, what guarantees
> > > that the lwz is done after lwarx?
> >
> > I'm assuming since it's a single asm block, gcc is not allowed to
> > reorder it.
>
> Not GCC, but the hardware. If loads can pass loads and lwarx has
> more internal housekeeping overhead (obviously) than lwz. Especially
> in the case of a processor with 2 LSU:
> - lwarx issued to LSU1
> - lwz issued LSU2 in the same clock cycle
>
> I'm not sure at all that you are guaranteed not to get
> potentially stale data from the lwz on SMP. Loads are weakly
> ordered in general wrt each other and lwarx is no exception
> AFAIR. The fact that the two words are guaranteed to be in
> the same cache line makes it extremely unlikely, but not
> impossible.

You are correct, I guess I really need an eieio in between the lwarx and
lwzx.

- kumar
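
For reference, the ordering concern above can be sketched in portable C11
atomics rather than PowerPC asm. This is only an illustration, not the code
under discussion: the struct layout, the names pte64 and pte_update_sketch,
and the clr/set parameters are all hypothetical. The acquire load stands in
for "lwarx followed by a barrier", so the load of the other word cannot be
satisfied ahead of it. Which PowerPC barrier instruction gives that ordering
on a given core is not something this sketch settles.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical layout: a 64-bit PTE kept as two adjacent 32-bit words. */
struct pte64 {
    _Atomic uint32_t hi;   /* word read with an ordinary load (the lwz/lwzx) */
    _Atomic uint32_t lo;   /* word updated atomically (the lwarx/stwcx. word) */
};

/*
 * Clear the bits in clr and set the bits in set in the low word, and
 * return the previous 64-bit value (hi in the upper half).
 */
static uint64_t pte_update_sketch(struct pte64 *p, uint32_t clr, uint32_t set)
{
    uint32_t old_lo, old_hi;

    for (;;) {
        /* Acquire: loads that follow in program order stay after this one. */
        old_lo = atomic_load_explicit(&p->lo, memory_order_acquire);

        /* Ordered after the load of lo because of the acquire above. */
        old_hi = atomic_load_explicit(&p->hi, memory_order_relaxed);

        /* Retry if lo changed (or the weak CAS failed spuriously). */
        if (atomic_compare_exchange_weak_explicit(&p->lo, &old_lo,
                                                  (old_lo & ~clr) | set,
                                                  memory_order_relaxed,
                                                  memory_order_relaxed))
            break;
    }
    return ((uint64_t)old_hi << 32) | old_lo;
}
```

Without the acquire (i.e. with both loads relaxed), the hardware would be
free to satisfy the two loads in either order, which is the case Gabriel
describes.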