Matt,

That's awesome, thanks for sharing. For the Broadwell system, it seems to take 2-3 microseconds in C-state 1. By any chance do you have numbers for C-state 0? I am curious what an IPI costs for a non-idle cpu, too. These numbers are interesting, though; I didn't realize wake-ups cost so much (or think about whether to send an IPI to an idle core in the first place).
If cpu #16 is the neighbor SMT thread for cpu #0, wouldn't IPI latencies be minimal? I assumed both SMT threads couldn't be in different C-states since they share resources. Unless I am reading the data incorrectly...

-Alex

On Wed, Aug 3, 2016 at 12:29 PM, Matthew Dillon <dil...@backplane.com> wrote:

> I have some saved output from ./ipitest script runs that I've run on
> various cpus:
>
> http://apollo.backplane.com/DFlyMisc/ipitest01.txt
>
> http://apollo.backplane.com/DFlyMisc/ipitest02.txt
>
> http://apollo.backplane.com/DFlyMisc/ipitest03.txt
>
> http://apollo.backplane.com/DFlyMisc/ipitest04.txt
>
> http://apollo.backplane.com/DFlyMisc/ipitest05.txt
>
> The numbers are in nanoseconds, so e.g. '2900' would be 2.9uS. '29000'
> would be 29uS. The originating cpu is #0. As an example, taking
> ipitest05.txt, due to the topological layout cpu #16 is the co-thread for
> cpu #0 and one can see it in the numbers (since cpu #0 is initiating the
> test, it's generally not running in a low-power mode regardless of the C
> state we use).
>
> The results generally follow expectations. A cpu sitting in a deep C
> state can take over 30uS to respond. Minimum IPI latency is usually in the
> 1-2uS range.
>
> -Matt