Hello,

On Fri, Aug 10, 2012 at 04:13:03PM -0700, Andi Kleen wrote:
> Naoya Horiguchi <n-horigu...@ah.jp.nec.com> writes:
>
> > Current error reporting of memory errors on dirty pagecache has silent
> > data lost problem because AS_EIO in struct address_space is cleared
> > once checked.
>
> Seems very complicated. I think I would prefer something simpler
> if possible, especially unless it's proven the case is common.
> It's hard to maintain rarely used error code when it's complicated.

I'm not sure whether memory errors are rare, because I don't have any
numbers from real systems. But assuming that hwpoison events are not
rare, errors on dirty pagecache cannot be ignored, because dirty pages
typically account for ~10% of total physical memory on average systems.
That share is small but not negligible.

> Maybe try Fengguang's simple proposal first? That would fix other IO
> errors too.

In my understanding, Fengguang's patch (referenced in this patch's
description) only fixes memory error reporting. And I'm not sure that a
similar approach (like making AS_EIO sticky) really fixes the IO errors,
because such a change can break userspace applications which expect the
current behavior.

Anyway, OK, I agree to start with Fengguang's patch and to separate out
the additional suggestion about "making dirty pagecache error
recoverable". If possible, I would also like your feedback on that
additional part of my idea. Could I ask you for that favor?

Thanks,
Naoya
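
For readers unfamiliar with the flag semantics discussed above, here is a
minimal userspace sketch in C. It is not kernel code. It assumes a
hypothetical struct mapping_model with a single boolean standing in for the
AS_EIO bit, and all names in it (mapping_model, model_set_error,
model_check_and_clear, model_check_sticky) are illustrative only. It
contrasts the "cleared once checked" behavior with a sticky flag.

```c
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical stand-in for struct address_space.
 * Only the error flag is modeled; this is an assumption for illustration.
 */
struct mapping_model {
    bool eio; /* models the AS_EIO bit */
};

/* Record an error (I/O error or memory error) on the mapping. */
static void model_set_error(struct mapping_model *m)
{
    m->eio = true;
}

/*
 * "Cleared once checked": the first checker sees -EIO and clears the
 * flag, so any later checker sees no error at all.
 */
static int model_check_and_clear(struct mapping_model *m)
{
    if (m->eio) {
        m->eio = false;
        return -EIO;
    }
    return 0;
}

/*
 * Sticky variant: the flag is left set, so every checker keeps
 * seeing -EIO until something else resets it.
 */
static int model_check_sticky(const struct mapping_model *m)
{
    return m->eio ? -EIO : 0;
}
```

In the clear-on-check model, the first caller after an error gets -EIO and a
second caller gets 0, which corresponds to the silent data loss case. In the
sticky model every caller keeps getting -EIO, which is the kind of behavior
change that applications expecting the current semantics might not handle.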