On Tue, Nov 29, 2016 at 02:59:01PM +0100, Thomas Gleixner wrote:
> The issue is that you obviously start with the assumption, that the machine
> has this bug. As a consequence the machine is brute forced into tick
> broadcast mode, which cannot be reverted when you clear that misfeature
> after ACPI init. So in case of !NOHZ and !HIGHRES the periodic tick is
> forced into broadcast mode, which is not what you want.
> As far as I understood the whole magic, this C1E misfeature takes only
> effect _after_ ACPI has been initialized. So instead of setting the bug in
> early boot and therefore forcing the broadcast nonsense, we should only set
> it when ACPI has actually detected it.

Problem is, select_idle_routine() runs a lot earlier than acpi_init() so
there's a window where we don't definitively know yet whether the box is
actually going to enter C1E or not.
[ I presume the reason why we have to do the proper detection after
ACPI has been initialized is because the frickelware decides whether
to do C1E entry or not and then sets those bits in the MSR (or not). ]
If in that window we enter idle and we're on an affected machine and we
*don't* switch to broadcast mode, we risk not waking up from C1E, i.e.,
the main reason this fix was even done.
So, if we "prematurely" switch to broadcast mode on the affected CPUs,
we're ok, it will be detected properly later and we're in broadcast
mode already.

Now, on those machines which are not affected and where we clear
X86_BUG_AMD_APIC_C1E because they don't enter C1E at all, I was thinking
of maybe doing amd_e400_remove_cpu() and clearing that e400 mask and
even freeing it so that they can do default_idle().
But you're saying tick_broadcast_enter() is irreversible?
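
To make the ordering concrete, here is a toy userspace model of the window
between idle-routine selection and ACPI init. It is a sketch, not kernel code:
every type and function name in it is made up for illustration, and it assumes
an affected box can already enter C1E inside that window, which is exactly the
point under discussion.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy userspace model of the boot ordering, NOT kernel code.
 * All types and function names below are hypothetical. */

enum idle_routine { IDLE_DEFAULT, IDLE_E400 };

struct box {
	bool enters_c1e;	/* what the firmware really does; unknown early */
	bool bug_flag;		/* stands in for X86_BUG_AMD_APIC_C1E */
	bool in_broadcast;	/* tick was switched to broadcast mode */
	bool may_hang;		/* idled in C1E without broadcast */
	enum idle_routine idle;
};

/* Early boot: choose the idle routine before C1E usage is known. */
static void model_select_idle(struct box *b, bool assume_bug)
{
	b->bug_flag = assume_bug;
	b->idle = assume_bug ? IDLE_E400 : IDLE_DEFAULT;
}

/* One idle entry inside the window before ACPI init. */
static void model_idle_in_window(struct box *b)
{
	if (b->idle == IDLE_E400)
		b->in_broadcast = true;
	else if (b->enters_c1e)
		b->may_hang = true;
}

/* After ACPI init: real detection, plus the proposed revert. */
static void model_late_detect(struct box *b)
{
	if (b->enters_c1e) {
		b->bug_flag = true;
		b->idle = IDLE_E400;
	} else {
		b->bug_flag = false;
		b->idle = IDLE_DEFAULT;
		/* in_broadcast is left alone on purpose: whether it
		 * can be undone is the open question. */
	}
}

static struct box model_boot(bool enters_c1e, bool assume_bug)
{
	struct box b = { .enters_c1e = enters_c1e };

	model_select_idle(&b, assume_bug);
	model_idle_in_window(&b);
	model_late_detect(&b);

	/* The two window outcomes are mutually exclusive in this model. */
	assert(!(b.in_broadcast && b.may_hang));
	return b;
}
```

In this model, model_boot(true, false) ends with may_hang set, which is the
case the early assumption protects against, while model_boot(false, true)
ends on the default idle routine but with in_broadcast still set.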