On Thu 19-01-17 14:17:19, Vishal Verma wrote:
> On 01/18, Jan Kara wrote:
> > On Tue 17-01-17 15:37:05, Vishal Verma wrote:
> > 2) PMEM is exposed for a DAX-aware filesystem. This seems to be what you
> > are mostly interested in. We could possibly do something more efficient
> > than what the NVDIMM driver does; however, the complexity would be
> > relatively high and frankly I'm far from convinced this is really worth
> > it. If there are so many badblocks that this would matter, the HW has IMHO
> > bigger problems than performance.
> 
> Correct, and Dave was of the opinion that once at least XFS has reverse
> mapping support (which it does now), adding badblocks information to
> that should not be a hard lift, and should be a better solution. I
> suppose I should try to benchmark how much of a penalty the current badblock
> checking in the NVDIMM driver imposes. The penalty is not because there
> may be a large number of badblocks, but just due to the fact that we
> have to do this check for every IO, in fact, every 'bvec' in a bio.

Well, letting the filesystem know is certainly good from an error reporting
quality POV. I guess I'll leave it up to the XFS guys to tell whether they can
be more efficient in checking whether the current IO overlaps with any of the
given bad blocks.
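
For reference, the per-IO check under discussion amounts to something like
the sketch below. It is a userspace illustration only, not the NVDIMM
driver's code: struct bad_range and io_hits_badblock() are hypothetical
names, and the list is assumed to be sorted by start sector with
non-overlapping entries, so a binary search is enough.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical type: one run of bad sectors, [start, start + len). */
struct bad_range {
	uint64_t start;
	uint64_t len;
};

/*
 * Return true if the I/O range [sector, sector + nr) overlaps any entry.
 * Assumes 'bad' is sorted by start and its entries do not overlap.
 */
static bool io_hits_badblock(const struct bad_range *bad, size_t n,
			     uint64_t sector, uint64_t nr)
{
	size_t lo = 0, hi = n;

	if (nr == 0)
		return false;

	/* Find the first entry whose end lies beyond the start of the I/O. */
	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (bad[mid].start + bad[mid].len <= sector)
			lo = mid + 1;
		else
			hi = mid;
	}

	/* That entry overlaps iff it starts before the I/O ends. */
	return lo < n && bad[lo].start < sector + nr;
}
```

The cost is O(log n) per call even when the list is short, which matches the
point above: the overhead comes from doing the lookup for every bvec, not
from the number of entries.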
 
> > Now my question: Why do we bother with badblocks at all? In cases 1) and 2)
> > if the platform can recover from MCE, we can just always access persistent
> > memory using memcpy_mcsafe(), if that fails, return -EIO. Actually that
> > seems to already happen so we just need to make sure all places handle
> > returned errors properly (e.g. fs/dax.c does not seem to) and we are done.
> > No need for bad blocks list at all, no slow down unless we hit a bad cell
> > and in that case who cares about performance when the data is gone...
> 
> Even when we have MCE recovery, we cannot do away with the badblocks
> list:
> 1. My understanding is that the hardware's ability to do MCE recovery is
> limited/best-effort, and is not guaranteed. There can be circumstances
> that cause a "Processor Context Corrupt" state, which is unrecoverable.

Well, then they have to work on improving the hardware. Because having HW
that just sometimes gets stuck instead of reporting bad storage is simply
not acceptable. And no matter how hard you try, the OS cannot avoid MCEs
when accessing persistent memory, so the OS just has no way to avoid that
risk.

> 2. We still need to maintain a badblocks list so that we know what
> blocks need to be cleared (via the ACPI method) on writes.

Well, why can't we just do the write, see whether we got a CMCI and, if yes,
clear the error via the ACPI method?
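
Roughly, the flow I have in mind looks like the sketch below. This is a
userspace illustration under loud assumptions: struct pmem_ops, the two
hooks and pmem_write_lazy_clear() are hypothetical names, not kernel or ACPI
interfaces, and the sketch simply assumes the platform reports an error for
a write that lands on a bad range, which is the open question here. The
fake_* helpers only simulate a device so the flow can be exercised.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical platform hooks; none of these are real kernel APIs. */
struct pmem_ops {
	/* True if the platform reported a media error for the last write. */
	bool (*write_faulted)(void *ctx, const void *dst, size_t len);
	/* Ask firmware to clear the bad range; returns 0 on success. */
	int (*clear_error)(void *ctx, void *dst, size_t len);
};

/* Write first, and only clear and retry if an error was reported. */
static int pmem_write_lazy_clear(const struct pmem_ops *ops, void *ctx,
				 void *dst, const void *src, size_t len)
{
	memcpy(dst, src, len);
	if (!ops->write_faulted(ctx, dst, len))
		return 0;	/* common case: no list lookup at all */

	if (ops->clear_error(ctx, dst, len) != 0)
		return -EIO;

	/* Retry once on the freshly cleared range. */
	memcpy(dst, src, len);
	return ops->write_faulted(ctx, dst, len) ? -EIO : 0;
}

/* Simulated device state, for illustration only. */
struct fake_pmem {
	bool poisoned;		/* the target range currently has an error */
	bool clear_fails;	/* make the simulated firmware call fail */
	int clears;		/* number of successful clear calls */
};

static bool fake_write_faulted(void *ctx, const void *dst, size_t len)
{
	struct fake_pmem *f = ctx;

	(void)dst;
	(void)len;
	return f->poisoned;
}

static int fake_clear_error(void *ctx, void *dst, size_t len)
{
	struct fake_pmem *f = ctx;

	(void)dst;
	(void)len;
	if (f->clear_fails)
		return -EIO;
	f->poisoned = false;
	f->clears++;
	return 0;
}
```

With something like this, the fast path pays nothing for bad block tracking,
and the clear is only attempted after an error has actually been seen.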

                                                                Honza
-- 
Jan Kara <j...@suse.com>
SUSE Labs, CR