> From: Dan Williams [mailto:dan.j.willi...@intel.com]
> Sent: Wednesday, May 16, 2018 10:49 AM
> On Tue, May 15, 2018 at 7:05 PM, Huaisheng HS1 Ye <ye...@lenovo.com> wrote:
> >> From: Matthew Wilcox [mailto:wi...@infradead.org]
> >> Sent: Wednesday, May 16, 2018 12:20 AM
> >> > > > > Then there's the problem of reconnecting the page cache (which is
> >> > > > > pointed to by ephemeral data structures like inodes and dentries)
> >> > > > > to the new inodes.
> >> > > > Yes, it is not easy.
> >> > >
> >> > > Right ... and until we have that ability, there's no point in this
> >> > > patch.
> >> > We are focusing on realizing this ability.
> >> But is it the right approach? So far we have (I think) two parallel
> >> activities. The first is for local storage, using DAX to store files
> >> directly on the pmem. The second is a physical block cache for network
> >> filesystems (both NAS and SAN). You seem to be wanting to supplant the
> >> second effort, but I think it's much harder to reconnect the logical cache
> >> (ie the page cache) than it is the physical cache (ie the block cache).
> > Dear Matthew,
> > Thanks for correcting my understanding of the cache line issue.
> > But I have a question about that. Assuming the NVDIMM works in pmem mode,
> > even if we used it as a physical block cache, like dm-cache, there is still
> > a potential risk from this cache line issue, because NVDIMMs are
> > byte-addressable storage, right?
> No, there is no risk if the cache is designed properly. The pmem
> driver will not report that the I/O is complete until the entire
> payload of the data write has made it to persistent memory. The cache
> driver will not report that the write succeeded until the pmem driver
> completes the I/O. There is no risk in losing power while the pmem
> driver is operating because the cache will recover to its last
> acknowledged stable state, i.e. it will roll back / undo the
> incomplete write.
> > If a system crash happens, that means the CPU doesn't have the opportunity
> > to flush all dirty data from the cache lines to the NVDIMM while copying
> > the data pointed to by bio_vec.bv_page to the NVDIMM.
> > I know there is btt, which is used to guarantee sector atomicity in block
> > mode, but in pmem mode a crash will likely leave a mix of new and old data
> > in one page of the NVDIMM.
> > Correct me if anything is wrong.
> dm-cache performs metadata management similar to that of the btt driver
> to ensure safe forward progress of the cache state relative to power
> loss or system-crash.
Thanks for your introduction, I've learned a lot from your comments.
I suppose there should be implementations that protect both data and metadata
in NVDIMMs from a system crash or power loss.
Not only the data but also the metadata itself needs to be correct and intact,
so that the kernel has a chance to recover the data to the target device after
rebooting.
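
To check my understanding of the ordering described above (payload first,
metadata commit second, acknowledgement last), here is a minimal sketch. It is
a toy Python model only, not how dm-cache or btt is actually implemented; the
names ToyPersistentCache, block_map and crash_before_commit are made up for
illustration, and a Python list stands in for persistent media.

```python
class CrashBeforeCommit(Exception):
    """Simulated power loss between the data write and the metadata commit."""


class ToyPersistentCache:
    """Toy model (hypothetical): the block map is the only commit point."""

    def __init__(self, nblocks):
        # One spare physical block, so live data is never overwritten in place.
        self.media = [b""] * (nblocks + 1)
        self.block_map = list(range(nblocks))  # logical block -> physical block
        self.free = nblocks                    # current spare physical block

    def write(self, lba, payload, crash_before_commit=False):
        if crash_before_commit:
            # Simulate a torn write: only half the payload reaches the media.
            self.media[self.free] = payload[: len(payload) // 2]
            raise CrashBeforeCommit
        # Step 1: the whole payload lands in the spare block.
        self.media[self.free] = payload
        # Step 2: one small metadata update commits the write.
        old = self.block_map[lba]
        self.block_map[lba] = self.free
        self.free = old
        # Step 3: only now is the write acknowledged to the caller.
        return True

    def read(self, lba):
        # Readers only ever follow the committed map.
        return self.media[self.block_map[lba]]
```

In this model a crash before step 2 leaves the map pointing at the old block,
so a read after "reboot" still returns the last acknowledged data, and the
torn payload in the spare block is simply never referenced.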
> > Another question: if we used NVDIMMs as a physical block cache for network
> > filesystems, does the industry have an existing implementation that
> > bypasses the page cache in a DAX-like way, that is to say, stores data
> > directly to NVDIMMs from userspace rather than copying data from kernel
> > space memory to NVDIMMs?
> Any caching solution with associated metadata requires coordination
> with the kernel, so it is not possible for the kernel to stay
> completely out of the way. Especially when we're talking about a cache
> in front of the network there is not much room for DAX to offer
> improved performance because we need the kernel to take over on all
> write-persist operations to update cache metadata.
> So, I'm still struggling to see why dm-cache is not a suitable
> solution for this case. It seems suitable if it is updated to allow
> direct dma-access to the pmem cache pages from the backing device
> storage / networking driver.