On Wed, 2015-06-10 at 17:12 +0200, Thomas Gleixner wrote:
> On Wed, 10 Jun 2015, Mike Galbraith wrote:
> > The above was handed to me by a colleague working on a Xen guest that
> > livelocked.  I at first thought Xen arch must have a weird problem, but
> > when I tried proggy on my desktop box, while it didn't stop the tick
> > completely as it did the Xen box, it slowed it to a crawl.  I noticed
> > that this did not happen with newer kernels, so a bisecting I did go,
> > and found that...
> >
> > 279f14614 x86: apic: Use tsc deadline for oneshot when available
> >
> > ..is what fixed it up.  Trouble is, while it fixes up my Haswell box, a
>
> This does not make any sense at all. It does not matter whether the
> box uses tscdeadline or local apic timer. We do not even program the
> hardware because we see that the event is in the past already.

Yup.

> So we raise the hrtimer softirqd, which then expires the timer. So all
> what happens is that ksoftirqd accumulates runtime, but I cannot at
> all see how that amounts to a DoS and brings the machine to a grinding
> halt.

The tick certainly appears to crawl here, and Dom0 boxen gripe if you
let them not tick at all for a while.

> I just booted a SNB with lapic=notscdeadline and ran that test
> program. All what happens is - as expected - that ksoftirqd runs more
> than we would like it to. I cannot observe any anomaly vs. local
> timer interrupts at all. If I run this pinned on an otherwise idle
> core, then I get ~ CONFIG_HZ interrupts per second, which is what you
> expect when the cpu never reaches idle.

Hm.  In order to successfully bisect the thing 3.7->3.8 I ran 2xCPUS
copies because the first bisect went gaga.  I'm not having any trouble
reproducing on master with a single pinned copy though, nor did I have
any on any of the kernels either stable or enterprise I tested, and
that's quite a few.  Whatever, that first bisect did go bad.

> > The below targets the symptom, consider it hrtimer cluebat attractant.
>
> By now I know to take your patches with a grain of salt :)

Sodium being bad for blood pressure is a medical myth.

> Some more information about your symptoms in form of configuration,
> extra patches, kernel traces etc. would be appreciated.

Virgin source or kernels with zillion+ patches, doesn't matter.  To test
virgin source earlier than EFI_STUB I had to pollute the source with EFI
backports, but nothing else.

Just a sec while I check yet again that absolutely virgin master really
really does stall.... Yup.  I pinned the testcase to CPU3..

while sleep 1; do grep LOC /proc/interrupts; done
LOC:  6706  5367  5053  6217  3031  2866  5477  3022  Local timer interrupts
LOC:  6753  5391  5074  6238  3058  2894  5576  3034  Local timer interrupts
LOC:  6791  5422  5104  6265  3066  2903  5582  3039  Local timer interrupts
LOC:  6846  5472  5154  6293  3096  2909  5595  3042  Local timer interrupts
LOC:  6855  5518  5177  6325  3199  2920  5613  3046  Local timer interrupts
LOC:  6892  5552  5217  6338  3234  2935  5637  3053  Local timer interrupts
LOC:  6983  5568  5236  6347  3244  2944  5660  3065  Local timer interrupts
LOC:  7028  5583  5251  6363  3262  2963  5673  3071  Local timer interrupts
LOC:  7217  5676  5343  6383  3305  2976  5682  3078  Local timer interrupts
LOC:  7432  5803  5418  6387  3371  3039  5757  3080  Local timer interrupts  <== here
LOC:  7560  6028  5632  6394  3538  3195  5937  3084  Local timer interrupts
LOC:  7747  6135  5720  6394  3543  3262  6087  3086  Local timer interrupts
LOC:  7930  6206  5785  6394  3571  3288  6303  3087  Local timer interrupts
LOC:  8057  6299  5842  6394  3606  3346  6415  3088  Local timer interrupts
LOC:  8236  6361  5921  6394  3632  3409  6630  3090  Local timer interrupts
LOC:  8382  6448  6004  6394  3664  3478  6754  3090  Local timer interrupts
LOC:  8460  6571  6124  6394  3690  3542  6951  3092  Local timer interrupts
LOC:  8605  6670  6224  6394  3723  3614  7078  3093  Local timer interrupts
LOC:  8710  6842  6323  6394  3776  3702  7295  3123  Local timer interrupts
LOC:  8868  6947  6402  6394  3828  3784  7422  3149  Local timer interrupts
LOC:  9077  7124  6523  6394  3901  3848  7637  3172  Local timer interrupts
LOC:  9222  7189  6596  6394  3971  3928  7763  3174  Local timer interrupts
LOC:  9336  7325  6699  6394  4020  3948  7912  3176  Local timer interrupts
LOC:  9423  7414  6849  6395  4089  3979  7940  3177  Local timer interrupts
LOC:  9637  7595  6923  6395  4111  4039  7942  3179  Local timer interrupts
LOC:  9807  7734  7095  6395  4232  4108  8069  3180  Local timer interrupts
^C

Config attached.

	-Mike

config.xz
Description: application/xz
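
The LOC samples above are cumulative per-CPU counters, so the stall on CPU3 (fourth column) is easier to see as per-sample differences. The following is a minimal sketch, not part of the original mail: the helper names `parse_loc` and `loc_deltas` are made up for illustration, and it assumes roughly one second between samples, as in the `while sleep 1` loop.

```python
def parse_loc(line):
    """Return the per-CPU counters from one 'LOC:' line of /proc/interrupts."""
    fields = line.split()
    if not fields or fields[0] != "LOC:":
        raise ValueError("not a LOC line: %r" % line)
    counts = []
    for tok in fields[1:]:
        if not tok.isdigit():
            break  # reached the trailing "Local timer interrupts" text
        counts.append(int(tok))
    return counts


def loc_deltas(lines):
    """Per-CPU increase between each pair of consecutive LOC samples."""
    samples = [parse_loc(line) for line in lines]
    return [
        [cur - prev for prev, cur in zip(a, b)]
        for a, b in zip(samples, samples[1:])
    ]


# Four consecutive samples copied from the output above, starting at "<== here".
SAMPLES = [
    "LOC: 7432 5803 5418 6387 3371 3039 5757 3080 Local timer interrupts <== here",
    "LOC: 7560 6028 5632 6394 3538 3195 5937 3084 Local timer interrupts",
    "LOC: 7747 6135 5720 6394 3543 3262 6087 3086 Local timer interrupts",
    "LOC: 7930 6206 5785 6394 3571 3288 6303 3087 Local timer interrupts",
]

if __name__ == "__main__":
    for row in loc_deltas(SAMPLES):
        print(" ".join("%5d" % d for d in row))
```

Run against those four samples, the CPU3 column goes from 7 to 0 and stays at 0, while the other busy CPUs keep advancing. To use it live, feed it lines collected with the same `grep LOC /proc/interrupts` loop.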