Re: [GIT PULL] LIBNVDIMM update for v5.18

2022-03-30 Thread pr-tracker-bot
The pull request you sent on Tue, 29 Mar 2022 13:54:41 -0700:

> git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm 
> tags/libnvdimm-for-5.18

has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/ee96dd9614f1c139e719dd2f296acbed7f1ab4b8

Thank you!

-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/prtracker.html



Re: [PATCH v11 1/8] dax: Introduce holder for dax_device

2022-03-30 Thread Darrick J. Wong
On Wed, Mar 30, 2022 at 08:49:29AM -0700, Christoph Hellwig wrote:
> On Wed, Mar 30, 2022 at 06:58:21PM +0800, Shiyang Ruan wrote:
> > As the code I pasted before, pmem driver will subtract its ->data_offset,
> > which is byte-based. And the filesystem who implements ->notify_failure()
> > will calculate the offset in unit of byte again.
> > 
> > So, leave its function signature byte-based, to avoid repeated conversions.
> 
> I'm actually fine either way, so I'll wait for Dan to comment.

FWIW I'd convinced myself that the reason for using byte units is to
make it possible to reduce the pmem failure blast radius to subpage
units... but then I've also been distracted for months. :/

--D



Re: [PATCH v11 7/8] xfs: Implement ->notify_failure() for XFS

2022-03-30 Thread Christoph Hellwig
On Wed, Mar 30, 2022 at 11:16:10PM +0800, Shiyang Ruan wrote:
> > > +#if IS_ENABLED(CONFIG_MEMORY_FAILURE) && IS_ENABLED(CONFIG_FS_DAX)
> > 
> > No real need for the IS_ENABLED.  Also any reason to even build this
> > file if the options are not set?  It seems like
> > xfs_dax_holder_operations should just be defined to NULL and the
> > whole file not supported if we can't support the functionality.
> 
> Got it.  These two CONFIG seem not related for now.  So, I think I should
> wrap these code with #ifdef CONFIG_MEMORY_FAILURE here, and add
> `xfs-$(CONFIG_FS_DAX) += xfs_notify_failure.o` in the makefile.

I'd do

ifeq ($(CONFIG_MEMORY_FAILURE),y)
xfs-$(CONFIG_FS_DAX) += xfs_notify_failure.o
endif

in the makefile and keep it out of the actual source file entirely.

> > > +
> > > + /* Ignore the range out of filesystem area */
> > > + if ((offset + len) < ddev_start)
> > 
> > No need for the inner braces.
> > 
> > > + if ((offset + len) > ddev_end)
> > 
> > No need for the braces either.
> 
> Really no need?  It is to make sure the range to be handled won't out of the
> filesystem area.  And make sure the @offset and @len are valid and correct
> after subtract the bbdev_start.

Yes, but there is no need for the braces per the precedence rules in
C.  So these could be:

if (offset + len < ddev_start)

and

if (offset + len > ddev_end)



Re: [PATCH v11 1/8] dax: Introduce holder for dax_device

2022-03-30 Thread Christoph Hellwig
On Wed, Mar 30, 2022 at 06:58:21PM +0800, Shiyang Ruan wrote:
> As the code I pasted before, pmem driver will subtract its ->data_offset,
> which is byte-based. And the filesystem who implements ->notify_failure()
> will calculate the offset in unit of byte again.
> 
> So, leave its function signature byte-based, to avoid repeated conversions.

I'm actually fine either way, so I'll wait for Dan to comment.



Re: [PATCH v11 7/8] xfs: Implement ->notify_failure() for XFS

2022-03-30 Thread Shiyang Ruan




在 2022/3/30 14:00, Christoph Hellwig 写道:

@@ -1892,6 +1893,8 @@ xfs_free_buftarg(
list_lru_destroy(&btp->bt_lru);
  
  	blkdev_issue_flush(btp->bt_bdev);

+   if (btp->bt_daxdev)
+   dax_unregister_holder(btp->bt_daxdev, btp->bt_mount);
fs_put_dax(btp->bt_daxdev);
  
  	kmem_free(btp);

@@ -1939,6 +1942,7 @@ xfs_alloc_buftarg(
struct block_device *bdev)
  {
xfs_buftarg_t   *btp;
+   int error;
  
  	btp = kmem_zalloc(sizeof(*btp), KM_NOFS);
  
@@ -1946,6 +1950,14 @@ xfs_alloc_buftarg(

btp->bt_dev =  bdev->bd_dev;
btp->bt_bdev = bdev;
btp->bt_daxdev = fs_dax_get_by_bdev(bdev, &btp->bt_dax_part_off);
+   if (btp->bt_daxdev) {
+   error = dax_register_holder(btp->bt_daxdev, mp,
+   &xfs_dax_holder_operations);
+   if (error) {
+   xfs_err(mp, "DAX device already in use?!");
+   goto error_free;
+   }
+   }


It seems to me that just passing the holder and holder ops to
fs_dax_get_by_bdev and the holder to dax_unregister_holder would
significantly simply the interface here.

Dan, what do you think?


+#if IS_ENABLED(CONFIG_MEMORY_FAILURE) && IS_ENABLED(CONFIG_FS_DAX)


No real need for the IS_ENABLED.  Also any reason to even build this
file if the options are not set?  It seems like
xfs_dax_holder_operations should just be defined to NULL and the
whole file not supported if we can't support the functionality.


Got it.  These two CONFIG seem not related for now.  So, I think I 
should wrap these code with #ifdef CONFIG_MEMORY_FAILURE here, and add 
`xfs-$(CONFIG_FS_DAX) += xfs_notify_failure.o` in the makefile.




Dan: not for this series, but is there any reason not to require
MEMORY_FAILURE for DAX to start with?


+
+   ddev_start = mp->m_ddev_targp->bt_dax_part_off;
+   ddev_end = ddev_start +
+   (mp->m_ddev_targp->bt_bdev->bd_nr_sectors << SECTOR_SHIFT) - 1;


This should use bdev_nr_bytes.


OK.



But didn't we say we don't want to support notifications on partitioned
devices and thus don't actually need all this?


+
+   /* Ignore the range out of filesystem area */
+   if ((offset + len) < ddev_start)


No need for the inner braces.


+   if ((offset + len) > ddev_end)


No need for the braces either.


Really no need?  It is to make sure the range to be handled won't out of 
the filesystem area.  And make sure the @offset and @len are valid and 
correct after subtract the bbdev_start.





diff --git a/fs/xfs/xfs_notify_failure.h b/fs/xfs/xfs_notify_failure.h
new file mode 100644
index ..76187b9620f9
--- /dev/null
+++ b/fs/xfs/xfs_notify_failure.h
@@ -0,0 +1,10 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (c) 2022 Fujitsu.  All Rights Reserved.
+ */
+#ifndef __XFS_NOTIFY_FAILURE_H__
+#define __XFS_NOTIFY_FAILURE_H__
+
+extern const struct dax_holder_operations xfs_dax_holder_operations;
+
+#endif  /* __XFS_NOTIFY_FAILURE_H__ */


Dowe really need a new header for this vs just sequeezing it into
xfs_super.h or something like that?


Yes, I'll move it into xfs_super.h.  The xfs_notify_failure.c was 
splitted from xfs_super.c in the old patch.  There is no need to create 
a header file for only single line of code.





diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
index e8f37bdc8354..b8de6ed2c888 100644
--- a/fs/xfs/xfs_super.c
+++ b/fs/xfs/xfs_super.c
@@ -353,6 +353,12 @@ xfs_setup_dax_always(
return -EINVAL;
}
  
+	if (xfs_has_reflink(mp) && !xfs_has_rmapbt(mp)) {

+   xfs_alert(mp,
+   "need rmapbt when both DAX and reflink enabled.");
+   return -EINVAL;
+   }


Right now we can't even enable reflink with DAX yet, so adding this
here seems premature - it should go into the patch allowing DAX+reflink.



Yes.  I'll remove it for now.


--
Thanks,
Ruan.





Re: [PATCH v11 1/8] dax: Introduce holder for dax_device

2022-03-30 Thread Shiyang Ruan




在 2022/3/30 18:13, Christoph Hellwig 写道:

On Wed, Mar 30, 2022 at 06:03:01PM +0800, Shiyang Ruan wrote:


Because I am not sure if the offset between each layer is page aligned.  For
example, when pmem dirver handles ->memory_failure(), it should subtract its
->data_offset when it calls dax_holder_notify_failure().


If they aren't, none of the DAX machinery would work.


OK. Got it.

So, use page-based function signature for ->memory_failure():

int (*memory_failure)(struct dev_pagemap *pgmap, unsigned long pfn,
  unsigned long nr_pfns, int flags);


As the code I pasted before, pmem driver will subtract its 
->data_offset, which is byte-based. And the filesystem who implements 
->notify_failure() will calculate the offset in unit of byte again.


So, leave its function signature byte-based, to avoid repeated conversions.

int (*notify_failure)(struct dax_device *dax_dev, u64 offset,
  u64 len, int mf_flags);

What do you think?


--
Thanks,
Ruan.





Re: [PATCH v11 1/8] dax: Introduce holder for dax_device

2022-03-30 Thread Christoph Hellwig
On Wed, Mar 30, 2022 at 06:03:01PM +0800, Shiyang Ruan wrote:
> 
> Because I am not sure if the offset between each layer is page aligned.  For
> example, when pmem dirver handles ->memory_failure(), it should subtract its
> ->data_offset when it calls dax_holder_notify_failure().

If they aren't, none of the DAX machinery would work.



Re: [PATCH v11 1/8] dax: Introduce holder for dax_device

2022-03-30 Thread Shiyang Ruan




在 2022/3/30 13:41, Christoph Hellwig 写道:

On Wed, Mar 16, 2022 at 09:46:07PM +0800, Shiyang Ruan wrote:

Forgive me if this has been discussed before, but since dax_operations
are in terms of pgoff and nr pages and memory_failure() is in terms of
pfns what was the rationale for making the function signature byte
based?


Maybe I didn't describe it clearly...  The @offset and @len here are
byte-based.  And so is ->memory_failure().


Yes, but is there a good reason for that when the rest of the DAX code
tends to work in page chunks?


Because I am not sure if the offset between each layer is page aligned. 
 For example, when pmem dirver handles ->memory_failure(), it should 
subtract its ->data_offset when it calls dax_holder_notify_failure().


The implementation of ->memory_failure() by pmem driver:
+static int pmem_pagemap_memory_failure(struct dev_pagemap *pgmap,
+   phys_addr_t addr, u64 len, int mf_flags)
+{
+   struct pmem_device *pmem =
+   container_of(pgmap, struct pmem_device, pgmap);
+   u64 offset = addr - pmem->phys_addr - pmem->data_offset;
+
+   return dax_holder_notify_failure(pmem->dax_dev, offset, len, mf_flags);
+}

So, I choose u64 as the type of @len.  And for consistency, the @addr is 
using byte-based type as well.


> memory_failure()
> |* fsdax case
> |
> |pgmap->ops->memory_failure()  => pmem_pgmap_memory_failure()
> | dax_holder_notify_failure()  =>

the offset from 'pmem driver' to 'dax holder'

> |  dax_device->holder_ops->notify_failure() =>
> | - xfs_dax_notify_failure()
> |  |* xfs_dax_notify_failure()
> |  |--
> |  |   xfs_rmap_query_range()
> |  |xfs_dax_failure_fn()
> |  |* corrupted on metadata
> |  |   try to recover data, call xfs_force_shutdown()
> |  |* corrupted on file data
> |  |   try to recover data, call mf_dax_kill_procs()
> |* normal case
> |-
> |mf_generic_kill_procs()


--
Thanks,
Ruan.





Re: [PATCH v6 3/6] mm: rmap: introduce pfn_mkclean_range() to cleans PTEs

2022-03-30 Thread Christoph Hellwig
On Wed, Mar 30, 2022 at 03:31:37PM +0800, Muchun Song wrote:
> I saw Shiyang is ready to rebase onto this patch.  So should I
> move it to linux/mm.h or let Shiyang does?

Good question.  I think Andrew has this series in -mm and ready to go
to Linus, so maybe it is best if we don't change too much.

Andrew, can you just fold in the trivial comment fix?



Re: [PATCH v6 3/6] mm: rmap: introduce pfn_mkclean_range() to cleans PTEs

2022-03-30 Thread Muchun Song
On Wed, Mar 30, 2022 at 1:47 PM Christoph Hellwig  wrote:
>
> On Tue, Mar 29, 2022 at 09:48:50PM +0800, Muchun Song wrote:
> > + * * Return the start of user virtual address at the specific offset within
>
> Double "*" here.

Thanks for pointing out this.

>
> Also Shiyang has been wanting a quite similar vma_pgoff_address for use
> in dax.c.  Maybe we'll need to look into moving this to linux/mm.h.
>

I saw Shiyang is ready to rebase onto this patch.  So should I
move it to linux/mm.h or let Shiyang does?

Thanks.