On Fri, 8 May 2015 09:44:21 -0700 Tony Luck <[email protected]> wrote:
> Some high end Intel Xeon systems report uncorrectable memory errors
> as a recoverable machine check. Linux has included code for some time
> to process these and just signal the affected processes (or even
> recover completely if the error was in a read only page that can be
> replaced by reading from disk).
>
> But we have no recovery path for errors encountered during kernel
> code execution. Except for some very specific cases we are unlikely
> to ever be able to recover.
>
> Enter memory mirroring. Actually 3rd generation of memory mirroring.
>
> Gen1: All memory is mirrored
>       Pro: No s/w enabling - h/w just gets good data from other side of the
>            mirror
>       Con: Halves effective memory capacity available to OS/applications
> Gen2: Partial memory mirror - just mirror memory behind some memory
>       controllers
>       Pro: Keep more of the capacity
>       Con: Nightmare to enable. Have to choose between allocating from
>            mirrored memory for safety vs. NUMA local memory for performance
> Gen3: Address range partial memory mirror - some mirror on each memory
>       controller
>       Pro: Can tune the amount of mirror and keep NUMA performance
>       Con: I have to write memory management code to implement
>
> The current plan is just to use mirrored memory for kernel allocations. This
> has been broken into two phases:
> 1) This patch series - find the mirrored memory, use it for boot time
>    allocations
> 2) Wade into mm/page_alloc.c and define a ZONE_MIRROR to pick up the unused
>    mirrored memory from mm/memblock.c and only give it out to select kernel
>    allocations (this is still being scoped because page_alloc.c is scary).

Looks good to me. What happens to these patches while ZONE_MIRROR is
being worked on?

I'm wondering about phase II. What does "select kernel allocations"
mean? I assume we can't say "all kernel allocations" because that can
sometimes be "almost all memory". How are you planning on implementing
this? A new __GFP_foo flag, then sprinkle that into selected sites?

Will surplus ZONE_MIRROR memory be available for regular old movable
allocations?

I suggest you run the design ideas by Mel before getting into
implementation.

