Re: [IA64] Reduce __clear_bit_unlock overhead

Zoltan Menyhart Fri, 19 Oct 2007 07:14:43 -0700

Christoph Lameter wrote:

On Fri, 19 Oct 2007, Zoltan Menyhart wrote:

You may want to avoid assembly magic:

static __inline__ void
__clear_bit_unlock(int const nr, volatile void * const addr)
{
      volatile __u32 * const m = (volatile __u32 *) addr + (nr >> 5);

      *m &= ~(1U << (nr & 0x1f));
}

GCC compiles volatile loads with ".acq" and stores with ".rel".



But gcc does not generate the .nta type of store.


Can you please tell me what the advantage of ".nta" on the store is?
As far as I can see in the I2 Microarch. Guide, Table 3-2 Processor
Cache Hints, ".nta" on stores means:
- L2 NRU bit is not updated
- No slot is allocated in L3

In order to be able to take advantage of an "st.nta", you have to
use "ld.nta" in __clear_bit_unlock(), and at the bit lock
acquisition, too.

I assume the critical region data protected by the lock is in the
same cache line as the bit lock itself, therefore all loads /
stores would have to use ".nta", which GCC won't generate.
Should you do it by hand, you would not be using the cache the way
it is meant to be used, therefore the cache itself becomes less
efficient.

Nick Piggin wrote:

Actually I personally would prefer to use a non-volatile pointer,
and do the assembly explicitly. However, that's not for me to
decide. Importantly, the load with acquire is not required and I
agree it should go. Thanks for noticing that.


Well, one of the primary requirements is to avoid people
misunderstanding the code. I can accept that using explicit,
special assembly instructions can help people to understand
the code.
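
A minimal sketch of the variant Nick describes (non-volatile pointer,
no acquire on the load, release only on the store), written with the
GCC/Clang __atomic builtins instead of IA64 assembly. The function
names are hypothetical and this is not the kernel's implementation.
It assumes the lock holder is the only writer of the word while the
bit is held, and whether the release store ends up as "st.rel" on
IA64 depends on the compiler.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical illustration only, not the kernel's __clear_bit_unlock().
 * Assumes the caller holds the bit lock and that no other CPU modifies
 * the same 32-bit word concurrently.
 */
static inline void clear_bit_unlock_sketch(int nr, void *addr)
{
	assert(nr >= 0);

	/* Select the 32-bit word holding bit nr, then the bit within it. */
	uint32_t *m = (uint32_t *)addr + (nr >> 5);
	uint32_t mask = UINT32_C(1) << (nr & 0x1f);

	/* Relaxed load: no acquire semantics are needed to drop the lock. */
	uint32_t old = __atomic_load_n(m, __ATOMIC_RELAXED);

	/*
	 * Release store: earlier accesses in the critical region may not
	 * be reordered after this store.
	 */
	__atomic_store_n(m, old & ~mask, __ATOMIC_RELEASE);
}

/* Hypothetical helper used to inspect a bit; relaxed ordering only. */
static inline int test_bit_sketch(int nr, void *addr)
{
	uint32_t *m = (uint32_t *)addr + (nr >> 5);

	return (int)((__atomic_load_n(m, __ATOMIC_RELAXED) >> (nr & 0x1f)) & 1);
}
```

Spelling out the memory order at each access keeps the intent visible
in the source, which the volatile-pointer version leaves implicit.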

Zoltan
