>>> 	ld4.acq r28 = [r29]	// xtime_lock.sequence. Must come first for locking purposes
>>> +	;;
>>> (p8)	mov r2 = ar.itc		// CPU_TIMER. 36 clocks latency!!!

The .acq only causes ordering w.r.t. data accesses. The read from ar.itc
isn't a data access, so potentially it could still float before the ld4.acq.

Consuming the value loaded into r28 presumably has to ensure that the load
completes, though. I'm guessing here ... I haven't cross-checked with the
architects.

Does moving the "and r28 = ~1,r28" up into this slot hurt latency for a
single call to gettimeofday()? Presumably it will if xtime_lock.sequence is
not in the cache.

-Tony
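
For anyone following along, here is a rough C sketch of the reader-side pattern under discussion: load the sequence with acquire semantics, consume it by masking off the low bit, and only then sample the cycle counter. All names (seq_time, read_cycle_counter, fake_itc, read_time) are hypothetical stand-ins and not taken from the kernel source. Note that C gives no guarantee about ordering a non-memory counter read against the load; the sketch only shows the shape of the sequence, not a fix.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for the lock word plus protected data. */
struct seq_time {
    atomic_uint sequence;       /* odd while a writer is mid-update */
    _Atomic uint64_t base;      /* time value at the last update */
};

/* Hypothetical cycle counter; on ia64 this would be a read of ar.itc. */
static uint64_t fake_itc;
static uint64_t read_cycle_counter(void) { return fake_itc; }

static uint64_t read_time(struct seq_time *t)
{
    unsigned start;
    uint64_t base, cycles;

    do {
        /* Counterpart of "ld4.acq r28 = [r29]". */
        start = atomic_load_explicit(&t->sequence, memory_order_acquire);

        /* Counterpart of "and r28 = ~1,r28": consumes the loaded value.
         * If the sequence was odd, the recheck below cannot match. */
        start &= ~1u;

        /* Counterpart of "mov r2 = ar.itc". Not a data access, so the
         * acquire above does not order it by itself. */
        cycles = read_cycle_counter();

        base = atomic_load_explicit(&t->base, memory_order_relaxed);

        /* Keep the data read ahead of the sequence recheck. */
        atomic_thread_fence(memory_order_acquire);
    } while (atomic_load_explicit(&t->sequence, memory_order_relaxed) != start);

    return base + cycles;
}
```

The point of masking early is that the "and" has a register dependency on the load, so it cannot issue until the load has returned; the cost is that a cache miss on the sequence word then stalls ahead of the counter read.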
