On Wed, 30 Nov 2016, Borislav Petkov wrote:
> On Tue, Nov 29, 2016 at 02:59:01PM +0100, Thomas Gleixner wrote:
> > The issue is that you obviously start with the assumption that the machine
> > has this bug. As a consequence the machine is brute forced into tick
> > broadcast mode, which cannot be reverted when you clear that misfeature
> > after ACPI init. So in case of !NOHZ and !HIGHRES the periodic tick is
> > forced into broadcast mode, which is not what you want.
> > As far as I understood the whole magic, this C1E misfeature takes only
> > effect _after_ ACPI has been initialized. So instead of setting the bug in
> > early boot and therefore forcing the broadcast nonsense, we should only set
> > it when ACPI has actually detected it.
> Problem is, select_idle_routine() runs a lot earlier than acpi_init() so
> there's a window where we don't definitively know yet whether the box is
> actually going to enter C1E or not.
> [ I presume the reason why we have to do the proper detection after
> ACPI has been initialized is because the frickelware decides whether
> to do C1E entry or not and then sets those bits in the MSR (or not). ]
> If in that window we enter idle and we're on an affected machine and we
> *don't* switch to broadcast mode, we risk not waking up from C1E, i.e.,
> the main reason this fix was even done.
> So, if we "prematurely" switch to broadcast mode on the affected CPUs,
> we're ok, it will be detected properly later and we're in broadcast
> mode already.
Right, that's the safe bet. But I'm quite sure that the C1E crap only
starts to work _after_ ACPI initialization.
> Now, on those machines which are not affected and we clear
> X86_BUG_AMD_APIC_C1E because they don't enter C1E at all, I was thinking
> of maybe doing amd_e400_remove_cpu() and clearing that e400 mask and
> even freeing it so that they can do default_idle().
> But you're saying tick_broadcast_enter() is irreversible?
tick_force_broadcast() is irreversible.