Luck, Tony wrote:
 >>>   ld4.acq r28 = [r29]     // xtime_lock.sequence. Must come first for locking purposes
 >>> + ;;
 >>>  (p8)     mov r2 = ar.itc         // CPU_TIMER. 36 clocks latency!!!

> The .acq only causes ordering w.r.t. data accesses.  The read from ar.itc
> isn't a data access, so potentially it could still float before the
> ld4.acq.  Consuming the value loaded into r28 presumably has to
> ensure that the load completes though.

Hmm, then would this problem not occur if the time source were not ar.itc?
If the source is MMIO, the read from that address is a data access, isn't it?

> I'm guessing here ... I haven't cross-checked with the architects.

I'd be glad if we could get a comment from Intel's architects.

> Does moving the "and r28 = ~1,r28" up into this slot hurt latency
> for a single call to gettimeofday()?  Presumably it will if
> xtime_lock.sequence is not in the cache.
>
> -Tony

It will, I guess.
Anyway, we should make sure that the load of xtime_lock.sequence has
completed before reading ar.itc.
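For illustration only, here is a portable C11 sketch of the seqlock protocol being discussed, including the "and r28 = ~1,r28" trick of clearing bit 0 so that an odd (writer-active) sequence simply fails the final comparison. struct time_state, read_counter(), update_base() and fake_counter are hypothetical stand-ins for xtime_lock, ar.itc and the update path; this is not the fsys gettimeofday code. Note that a C11 acquire load, like .acq, only orders memory accesses, so the sketch cannot express the ar.itc ordering problem itself; on IA-64 that constraint has to come from the assembly (e.g. something that consumes r28 before the ar.itc read).

```c
#include <stdatomic.h>

/* Hypothetical stand-in for xtime_lock plus the data it protects. */
struct time_state {
    atomic_uint sequence;        /* even: stable; odd: writer in progress */
    atomic_ullong cycles_base;   /* protected data (hypothetical field) */
};

/* Hypothetical time source standing in for ar.itc: a plain counter. */
static unsigned long long fake_counter = 1000;

static unsigned long long read_counter(void)
{
    return fake_counter++;
}

/* Writer side: the sequence is odd while the protected data changes. */
static void update_base(struct time_state *ts, unsigned long long base)
{
    atomic_fetch_add_explicit(&ts->sequence, 1u, memory_order_relaxed);
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&ts->cycles_base, base, memory_order_relaxed);
    atomic_fetch_add_explicit(&ts->sequence, 1u, memory_order_release);
}

/* Reader side: retry until the sequence is even and unchanged. */
static unsigned long long read_time(struct time_state *ts)
{
    unsigned int seq;
    unsigned long long now, base;

    do {
        /* Acquire orders later *memory* loads after this one. */
        seq = atomic_load_explicit(&ts->sequence, memory_order_acquire);
        /* Same idea as "and r28 = ~1,r28": with bit 0 cleared, an odd
         * snapshot can never equal the re-read below (ignoring
         * wraparound), so no separate "is it odd?" branch is needed. */
        seq &= ~1u;
        /* On IA-64 a real ar.itc read is not a data access, so the
         * acquire above would not order it; portable C cannot state
         * that extra constraint. */
        now = read_counter();
        base = atomic_load_explicit(&ts->cycles_base, memory_order_relaxed);
        /* Keep the data loads before the sequence re-read. */
        atomic_thread_fence(memory_order_acquire);
    } while (atomic_load_explicit(&ts->sequence, memory_order_relaxed) != seq);

    return now - base;
}
```

With the sequence at 2 after one update to a base of 500, the first read_time() call returns 1000 - 500 = 500.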

Thanks,
H.Seto
