On 11/10, Dan Williams wrote: > On Thu, Nov 10, 2016 at 11:17 AM, Vishal Verma <[email protected]> > wrote: > > On 11/10, Dave Jiang wrote: > >> We need to clear any poison when we are writing to pmem. The granularity > >> will be sector size. If it's less then we can't do anything about it > >> barring corruption. > >> > >> Signed-off-by: Dave Jiang <[email protected]> > >> --- > >> drivers/nvdimm/claim.c | 24 +++++++++++++++++++++--- > >> 1 file changed, 21 insertions(+), 3 deletions(-) > >> > >> diff --git a/drivers/nvdimm/claim.c b/drivers/nvdimm/claim.c > >> index d5dc80c..306d8fd 100644 > >> --- a/drivers/nvdimm/claim.c > >> +++ b/drivers/nvdimm/claim.c > >> @@ -226,6 +226,11 @@ static int nsio_rw_bytes(struct nd_namespace_common > >> *ndns, > >> resource_size_t offset, void *buf, size_t size, int rw) > >> { > >> struct nd_namespace_io *nsio = to_nd_namespace_io(&ndns->dev); > >> + unsigned int sz_align = ALIGN(size + (offset & (512 - 1)), 512); > >> + sector_t sector = offset >> 9; > >> + > >> + if (unlikely(!size)) > >> + return -EINVAL; > >> > >> if (unlikely(offset + size > nsio->size)) { > >> dev_WARN_ONCE(&ndns->dev, 1, "request out of range\n"); > >> @@ -233,12 +238,25 @@ static int nsio_rw_bytes(struct nd_namespace_common > >> *ndns, > >> } > >> > >> if (rw == READ) { > >> - unsigned int sz_align = ALIGN(size + (offset & (512 - 1)), > >> 512); > >> - > >> - if (unlikely(is_bad_pmem(&nsio->bb, offset / 512, sz_align))) > >> + if (unlikely(is_bad_pmem(&nsio->bb, sector, sz_align))) > >> return -EIO; > >> return memcpy_from_pmem(buf, nsio->addr + offset, size); > >> } else { > >> + > >> + if (unlikely(is_bad_pmem(&nsio->bb, sector, sz_align))) { > >> + if (IS_ALIGNED(offset, 512) && IS_ALIGNED(size, > >> 512)) { > >> + long cleared; > >> + > >> + cleared = nvdimm_clear_poison(&ndns->dev, > >> + offset, size); > > > > Everything above looks good.. > > > >> + if (cleared != size) > >> + return -EIO; > > > > But I think we don't want to return -EIO here. Instead, ... > >> + > >> + badblocks_clear(&nsio->bb, sector, > >> + cleared >> 9); > >> + } > >> + } > >> + > >> memcpy_to_pmem(nsio->addr + offset, buf, size); > >> nvdimm_flush(to_nd_region(ndns->dev.parent)); > > > > Do _some_ writes. We have two options: > > 1. Try to write the whole thing (size), and return success. If the write > > touches any remaining poison, it will not complain, but the poison will > > be retained, so subsequent reads will hit it and fail. > > > > 2. Only write what we were able to clear, i.e. 'cleared' bytes. And > > since we didn't complete the write, error out. > > > > I think 1 makes more sense, but I could be convinced otherwise.. > > > > One wild idea might be: > > rw_bytes can write arbitrary lengths, and commonly does up to 4K at a > > time (BTT data writes). This gives us a rather large window where we > > could be clearing 4K worth of badblocks all together, and then writing > > 4K of data. If we crash anywhere in the middle, we're almost guaranteed > > silent data corruption. > > > > Instead, what if, for the known-errors case for writes, we break it down > > into smaller chunks: > > Go cache line by cache line - if it may have poison (based on > > badblocks), clear poison, followed by a cache line worth of data write. > > When we cross a sector boundary (512), do a badblocks_clear. > > > > I think this gives us the best chance against silent corruption in the > > absence of an atomic clear-error-and-write command. > > > > Thoughts? > > This feels like over-engineering a still not perfect solution to a > rare problem. Outside of atomic-write-and-clear we should just keep > the code best effort and simple.
Fair enough :) In that case I think the only change needed is to simply remove the cleared != size check, do badblocks_clear for "cleared >> 9" (as it is now), and then write "size", and return success. Sound good? _______________________________________________ Linux-nvdimm mailing list [email protected] https://lists.01.org/mailman/listinfo/linux-nvdimm
