> I am sure ppc couldn't race (at least unless up_read/up_write were excuted 
> from irq/softnet context and that never happens in 2.4.4pre3, see below ;). 

This is not actually using the rwsem code I wrote at the moment.

> And incidentally the above is what (I guess Richard) did on the alpha and
> that should really go into common code instead of having
> asm-i386/rwsem-xadd.h asm-alpha/rwsem.h etcc.etc... just implement
> atomic_inc_return using xadd in asm-i386/atomic.h, that's much better
> design IMHO. 

I disagree... you want such primitives to be as efficient as possible. The 
whole point of having asm/xxxx.h files is that you can stuff them full of 
dirty tricks specific to certain architectures.

> That can obviously be done for example with C code like this: 
>        count = atomic_inc_return(&sem->count); 
>        if (__builtin_expect(count == 0, 0)) 
>                slow_path() 
>
> The above is the perfect C implementation IMHO

But not so efficient since it _has_ to take a jump unless the compiler can 
emit code in alternative text sections when it seems appropriate. Plus, you 
have to test count's value. XADD sets EFLAGS based on the result in memory, 
something that allows all but one fastpath to be two instructions in length 
(the one that isn't is three).

But, yes, there should probably be two generic cases: one implemented with a 
spinlock in the rwsem struct (as I supply) and one implemented using 
atomic_add_return(). Note, however! atomic_add_return() is not necessarily 
implemented efficiently: if it involves a cmpxchg loop (as many seem to), 
then that is really quite inefficient, and may be better done as a spinlock 
anyway.

I've had a look at your implementation... It seems to hold the spinlocks for 
an awfully long time... specifically around the local variable initialisation 
in the 'failed' functions. Don't forget that the compiler can't reorder these 
because they're inside the spinlock. I would, if I were you, fold the 
'failed' functions into the main ones to avoid this problem.

Your rw_semaphore structure is also rather large: 46 bytes without debugging 
stuff (16 bytes apiece for the waitqueues and 12 bytes for the rest). 
Contrast that with mine: generic is 24 bytes and the i386-xadd optimised is 
20.

Admittedly, though, yours is extremely simple and easy to follow, but I don't 
think it's going to be very fast.

Of course, I still prefer mine... smaller, faster, more efficient:-)

David
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Reply via email to