On 8/18/20 2:11 AM, Nicholas Piggin wrote:
> Very reasonable point.
>
> The problem we're trying to get a handle on is live partition migration where a running guest might be using SAO then get migrated to a P10. I don't think we have a good way to handle this case. Potentially the hypervisor could revoke the page tables if the guest is running in hash mode and the guest kernel could be taught about that and sigbus the process, but in radix the guest controls those page tables and the SAO state and I don't think there's a way to cause it to take a fault. I also don't know what the proprietary hypervisor does here. We could add it back, default to n, or make it bare metal only, or somehow try to block live migration to a later CPU without the facility. I wouldn't be against that.
Admittedly I'm not too familiar with the specifics of live migration or guest memory management, but restoring the functionality and adding a way to prevent migration of SAO-using guests seems like a reasonable choice to me. Would this be done with help from the guest, using some sort of infrastructure to signal to the hypervisor that SAO is in use, or entirely on the hypervisor side, e.g. by scanning through the process table for SAO pages?
> It would be very interesting to know how it performs in such a "real" situation. I don't know how well POWER9 has optimised it -- it's possible that it's not much better than putting lwsync after every load or store.
This is definitely worth investigating in depth. That said, even if the performance on P9 isn't super great, I think the feature could still be useful, since it would offer more granularity than the sledgehammer approach of emitting lwsync everywhere. I'd be happy to put in some of the work required to get this to a point where it can be reintroduced without breaking guest migration - I'd just need some pointers on getting started with whatever approach is decided on.

Thanks,
Shawn