On Tue, 17 Jan 2017, Borislav Petkov wrote:
> From: Borislav Petkov <[email protected]>
>
> The idea was to not scan the microcode blob on each AP (Application
> Processor) during boot and thus save us some milliseconds. However, on
> architectures where the microcode engine is shared between threads, this
> doesn't work. Here's why:
>
> The microcode on CPU0, i.e., the first thread, gets updated. The second
> thread, i.e., CPU1, i.e., the first AP walks into load_ucode_amd_ap(),
> sees that there's no container cached and goes and scans for the proper
> blob.
>
> It finds it and as a last step of apply_microcode_early_amd(), it tries
> to apply the patch but that core has already the updated microcode
> revision which it has received through CPU0's update. So it returns
> false and we do desc->size = -1 to prevent other APs from scanning.
>
> However, the next AP, CPU2, has a different microcode engine which
> hasn't been updated yet. The desc->size == -1 test prevents it from
> scanning the blob anew and we fail to update it.

Well, that could be solved by a proper state member in the global
container descriptor. But your solution is better in the end.

> The fix is much more straight-forward than it looks: the BSP
> (BootStrapping Processor), i.e., CPU0, caches the microcode patch
> in amd_ucode_patch. We use that on the AP and try to apply it.
> In the 99.9999% of cases where we have homogeneous cores - *not*
> mixed-steppings - the application will be successful and we're good to
> go.
>
> In the remaining small set of systems, we will simply rescan the blob
> and find (or not, if none present) the proper patch and apply it then.

Makes sense, but how does such a system handle the suspend/resume case
when the microcode is in the initrd? Are you caching the per-CPU
patches somewhere?

Reviewed-by: Thomas Gleixner <[email protected]>

