On Fri, Jan 26, 2024 at 08:49:03PM +0000, Luck, Tony wrote:
> Every patch that adds new code or data structures adds to the kernel
> memory footprint. Each should be considered on its merits. The basic
> question being:
>
> "Is the new functionality worth the cost?"
>
> Where does it end? It would end if Linus declared:
>
> "Linux is now complete. Stop sending patches".
>
> I.e. it is never going to end.

No, it's not that - it is the merit thing which determines.

> 1) PPIN
> Cost = 8 bytes.
> Benefit: Embeds a system identifier into the trace record so there
> can be no ambiguity about which machine generated this error.
> Also definitively indicates which socket on a multi-socket system.
>
> 2) MICROCODE
> Cost = 4 bytes
> Benefit: Certainty about the microcode version active on the core
> at the time the error was detected.
>
> RAS = Reliability, Availability, Serviceability
>
> These changes fall into the serviceability bucket. They make it
> easier to diagnose what went wrong.

So does dmesg. Let's add it to the tracepoint...

But no, that's not the right question to ask.

It is rather: which bits of information are very relevant to an error
record and which are transient enough so that they cannot be gathered
from a system by other means or only gathered in a difficult way, and
should be part of that record.

The PPIN is not transient but you have to go map ->extcpu to the PPIN so
adding it to the tracepoint is purely a convenience thing. More or less.

The microcode revision thing I still don't buy but it is already there
so whateva...

So we'd need a rule hammered out and put there in a prominent place so
that it is clear what goes into struct mce and what not.

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette