Re: [PATCH] mm: Remove mm argument from mm_get_unmapped_area()

2024-05-06 Thread Christoph Hellwig
On Mon, May 06, 2024 at 09:07:47AM -0700, Rick Edgecombe wrote:
>   if (flags & MAP_FIXED) {
>   /* Ok, don't mess with it. */
> - return mm_get_unmapped_area(current->mm, NULL, orig_addr, len, 
> pgoff, flags);
> + return current_get_unmapped_area(NULL, orig_addr, len, pgoff, 
> flags);

The old name seems preferable because it's not as crazy long.  In fact
just get_unmapped_area would be even better, but that's already taken
by something else.

Can we maybe take a step back and sort out the mess of the various
_get_unmapped_area helpers?

e.g. mm_get_unmapped_area_vmflags just wraps
arch_get_unmapped_area_topdown_vmflags and
arch_get_unmapped_area_vmflags, and we might as well merge all three
by moving the MMF_TOPDOWN into two actual implementations?

And then just update all the implementations to always pass the
vm_flags instead of having separate implementations with our without
the flags.

And then make __get_unmapped_area static in mmap.c nad move the
get_unmapped_area wrappers there.  And eventually write some
documentation for the functions based on the learnings who actually
uses what..



Re: [PATCH] module: ban '.', '..' as module names, ban '/' in module names

2024-04-15 Thread Christoph Hellwig
Looks good:

Reviewed-by: Christoph Hellwig 




Re: [PATCH] vduse: implement DMA sync callbacks

2024-02-23 Thread Christoph Hellwig
On Thu, Feb 22, 2024 at 03:29:10PM -0500, Michael S. Tsirkin wrote:
> In a sense ... but on the other hand, the "fake DMA" metaphor seems to
> work surprisingly well, like in this instance - internal bounce buffer
> looks a bit like non-coherent DMA.  A way to make this all prettier
> would I guess be to actually wrap all of DMA with virtio wrappers which
> would all go if () dma_... else vduse_...; or something to this end.  A
> lot of work for sure, and is it really worth it? if the only crazy
> driver is vduse I'd maybe rather keep the crazy hacks local there ...

Well, vduse is the only driver that does this hack - we had a few more
and we got rid of it.  It basically is the only thing preventing us
from doing direct calls into the iommu code and compile out dma_ops
entirely for non-Xen builds on the common architectures.

So yes, I'd really like to see it gone rather sooner than later.



Re: [PATCH] vduse: implement DMA sync callbacks

2024-02-20 Thread Christoph Hellwig
On Mon, Feb 19, 2024 at 06:06:06PM +0100, Maxime Coquelin wrote:
> Since commit 295525e29a5b ("virtio_net: merge dma
> operations when filling mergeable buffers"), VDUSE device
> require support for DMA's .sync_single_for_cpu() operation
> as the memory is non-coherent between the device and CPU
> because of the use of a bounce buffer.
> 
> This patch implements both .sync_single_for_cpu() and
> sync_single_for_device() callbacks, and also skip bounce
> buffer copies during DMA map and unmap operations if the
> DMA_ATTR_SKIP_CPU_SYNC attribute is set to avoid extra
> copies of the same buffer.

vduse really needs to get out of implementing fake DMA operations for
something that is not DMA.




Re: [PATCH v1] module.h: define __symbol_get_gpl() as a regular __symbol_get()

2024-02-01 Thread Christoph Hellwig
On Thu, Feb 01, 2024 at 12:29:46PM +0300, Andrew Kanner wrote:
> Of course not, no new users needed.
> 
> I haven't discussed it directly. I found the unused __symbol_get_gpl()
> myself, but during investigation of wether it was ever used somewhere
> found the old patch series suggested by Mauro Carvalho Chehab (in Cc).

Ah, ok.

> 
> Link: 
> https://lore.kernel.org/lkml/5f001015990a76c0da35a4c3cf08e457ec353ab2.1652113087.git.mche...@kernel.org/
> 
> The patch series is from 2022 and not merged. You can take [PATCH v6
> 1/4] which removes the unused symbol from the link.
> 
> Or I can resend v2 with my commit msg. But not sure about how it works
> in such a case - will adding Suggested-by tag (if no objections from
> Mauro) with the Link be ok?

Either is fine.  I actually have a patch removing it somewhere in an
unused tree as well :)



Re: [PATCH v1] module.h: define __symbol_get_gpl() as a regular __symbol_get()

2024-01-31 Thread Christoph Hellwig
On Wed, Jan 31, 2024 at 10:02:52PM +0300, Andrew Kanner wrote:
> Prototype for __symbol_get_gpl() was introduced in the initial git
> commit 1da177e4c3f4 ("Linux-2.6.12-rc2"), but was not used after that.
> 
> In commit 9011e49d54dc ("modules: only allow symbol_get of
> EXPORT_SYMBOL_GPL modules") Christoph Hellwig switched __symbol_get()
> to process GPL symbols only, most likely this is what
> __symbol_get_gpl() was designed to do.
> 
> We might either define __symbol_get_gpl() as __symbol_get() or remove
> it completely as suggested by Mauro Carvalho Chehab.

Just remove it, there is no need to keep unused funtionality around.

Btw, where did the discussion start?  I hope you're not trying to
add new symbol_get users?




Re: [PATCH 5/5] modules: only allow symbol_get of EXPORT_SYMBOL_GPL modules

2023-10-17 Thread Christoph Hellwig
On Wed, Oct 18, 2023 at 01:30:18AM +0100, David Woodhouse wrote:
> 
> But if we're going to tolerate the core kernel still exporting some
> stuff with EXPORT_SYMBOL, why isn't OK for a GPL-licensed module do to
> the same? Even an *in-tree* GPL-licensed module now can't export
> functionality with EXPORT_SYMBOL and have it used with symbol_get().

Anything using symbol_get is by intent very deeply internal for tightly
coupled modules working together, and thus not a non-GPL export.

In fact the current series is just a stepping stone.  Once some mess
in the kvm/vfio integration is fixed up we'll require a new explicit
EXPORT_SYMBOL variant as symbol_get wasn't ever intended to be used
on totally random symbols not exported for use by symbol_get.



Re: [PATCH] module: Add CONFIG_MODULE_LOAD_IN_SEQUENCE option

2023-10-11 Thread Christoph Hellwig
On Mon, Oct 09, 2023 at 10:26:35AM +0530, Joey Jiao wrote:
> When modprobe cmds are executed one by one, the final loaded modules
> are not in fixed sequence as expected.

And why does this matter?

If this is important enough to matter it should just be the default,
and have really good reason for that.  Doing something like this as
a config option does not make any sense.




Re: [PATCH] net: appletalk: remove cops support

2023-09-27 Thread Christoph Hellwig
On Wed, Sep 27, 2023 at 11:00:30AM +0200, Greg Kroah-Hartman wrote:
> The COPS Appletalk support is very old, never said to actually work
> properly, and the firmware code for the devices are under a very suspect
> license.  Remove it all to clear up the license issue, if it is still
> needed and actually used by anyone, we can add it back later once the
> license is cleared up.

Looks good:

Acked-by: Christoph Hellwig 



Re: SPDX: Appletalk FW license in the kernel

2023-09-26 Thread Christoph Hellwig
On Fri, Sep 15, 2023 at 09:39:05AM -0400, Prarit Bhargava wrote:
> To be clear, I am not asking for their removal, however, I do think a better
> license should be issued for these files.  The files were trivially modified
> in 2006. It could be that the code in question is now unused and it is just
> easier to remove them.
> 
> Is there anyone you know of that we could approach to determine a proper
> SPDX License for these files?

The code contains firmware that is downloaded to the device.  The proper
thing would be to convert them to separate binary files in the
linux-firmware packages.  But given that the driver has seen nothing
but tree wide cleanups since the dawn of git I suspect there is no
maintainer and probably no user left.  The best might be to indeed just
remove it and see if anyone screams, in which case we could bring it
back after doing the above.



[PATCH 11/19] fs: add new shutdown_sb and free_sb methods

2023-09-13 Thread Christoph Hellwig
Currently super_blocks are shut down using the ->kill_sb method, which
must call generic_shutdown_super, but allows the file system to
add extra work before or after the call to generic_shutdown_super.

File systems tend to get rather confused by this, so add an alternative
shutdown sequence where generic_shutdown_super is called by the core
code, and there are extra ->shutdown_sb and ->free_sb hooks before and
after it.  To remove the amount of boilerplate code ->shutdown_sb is only
called if the super has finished initialization and ->d_root is set.

Signed-off-by: Christoph Hellwig 
---
 Documentation/filesystems/locking.rst |  4 
 Documentation/filesystems/vfs.rst | 12 
 fs/super.c|  9 +++--
 include/linux/fs.h|  2 ++
 4 files changed, 25 insertions(+), 2 deletions(-)

diff --git a/Documentation/filesystems/locking.rst 
b/Documentation/filesystems/locking.rst
index 7be2900806c853..c33e2f03ed1f69 100644
--- a/Documentation/filesystems/locking.rst
+++ b/Documentation/filesystems/locking.rst
@@ -220,7 +220,9 @@ prototypes::
 
struct dentry *(*mount) (struct file_system_type *, int,
   const char *, void *);
+   void (*shutdown_sb) (struct super_block *);
void (*kill_sb) (struct super_block *);
+   void (*free_sb) (struct super_block *);
 
 locking rules:
 
@@ -228,7 +230,9 @@ locking rules:
 opsmay block
 ====
 mount  yes
+shutdown_sbyes
 kill_sbyes
+free_sbyes
 ====
 
 ->mount() returns ERR_PTR or the root dentry; its superblock should be locked
diff --git a/Documentation/filesystems/vfs.rst 
b/Documentation/filesystems/vfs.rst
index 99acc2e9867391..1a7c6926c31f34 100644
--- a/Documentation/filesystems/vfs.rst
+++ b/Documentation/filesystems/vfs.rst
@@ -119,7 +119,9 @@ members are defined:
const struct fs_parameter_spec *parameters;
struct dentry *(*mount) (struct file_system_type *, int,
const char *, void *);
+   void (*shutdown_sb) (struct super_block *);
void (*kill_sb) (struct super_block *);
+   void (*free_sb) (struct super_block *);
struct module *owner;
struct file_system_type * next;
struct hlist_head fs_supers;
@@ -155,10 +157,20 @@ members are defined:
the method to call when a new instance of this filesystem should
be mounted
 
+``shutdown_sb``
+   Cleanup after a super_block has reached a zero active count, and before
+   the VFS level cleanup happens.  Typical picks all fs-specific objects
+   (if any) that need destruction out of superblock and releases them.
+   Note: dentries and inodes are normally taken care of and do not need
+   specific handling unless they are pinned by kernel users.
+
 ``kill_sb``
the method to call when an instance of this filesystem should be
shut down
 
+``free_sb``
+   Free file system specific resources like sb->s_fs_info that are
+   still needed while inodes are freed during umount.
 
 ``owner``
for internal VFS use: you should initialize this to THIS_MODULE
diff --git a/fs/super.c b/fs/super.c
index 5c685b4944c2d6..8e173eccc8c113 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -480,10 +480,15 @@ void deactivate_locked_super(struct super_block *s)
 
unregister_shrinker(>s_shrink);
 
-   if (fs->kill_sb)
+   if (fs->kill_sb) {
fs->kill_sb(s);
-   else
+   } else {
+   if (fs->shutdown_sb)
+   fs->shutdown_sb(s);
generic_shutdown_super(s);
+   if (fs->free_sb)
+   fs->free_sb(s);
+   }
 
kill_super_notify(s);
 
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 31b6b235b36efa..12fff7df3cc46b 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2341,6 +2341,8 @@ struct file_system_type {
struct dentry *(*mount) (struct file_system_type *, int,
   const char *, void *);
void (*kill_sb) (struct super_block *);
+   void (*shutdown_sb)(struct super_block *sb);
+   void (*free_sb)(struct super_block *sb);
struct module *owner;
struct file_system_type * next;
struct hlist_head fs_supers;
-- 
2.39.2



[PATCH 05/19] fs: assign an anon dev_t in common code

2023-09-13 Thread Christoph Hellwig
All super_blocks need to have a valid dev_t, and except for block
based file systems that tends to be an anonymouns dev_t.  Instead of
leaving that work to the file systems, assign the anonymous dev_t in
the core sget_fc and sget routines unless the file systems already
assigned on in the set callback.  Note that this now makes the
set callback optional as a lot of file systems don't need it any more.

Signed-off-by: Christoph Hellwig 
---
 fs/9p/vfs_super.c  |  2 +-
 fs/afs/super.c | 12 +++-
 fs/btrfs/super.c   |  6 ++--
 fs/ceph/super.c|  7 +
 fs/ecryptfs/main.c |  2 +-
 fs/fuse/inode.c|  2 +-
 fs/fuse/virtio_fs.c|  2 +-
 fs/kernfs/mount.c  |  2 +-
 fs/nfs/super.c |  2 +-
 fs/orangefs/super.c|  2 +-
 fs/smb/client/cifsfs.c |  3 +-
 fs/super.c | 68 +-
 fs/ubifs/super.c   |  2 +-
 include/linux/fs.h |  2 --
 14 files changed, 58 insertions(+), 56 deletions(-)

diff --git a/fs/9p/vfs_super.c b/fs/9p/vfs_super.c
index 9e60eddf5179ed..e8b3641c98f886 100644
--- a/fs/9p/vfs_super.c
+++ b/fs/9p/vfs_super.c
@@ -40,7 +40,7 @@ static const struct super_operations v9fs_super_ops, 
v9fs_super_ops_dotl;
 static int v9fs_set_super(struct super_block *s, void *data)
 {
s->s_fs_info = data;
-   return set_anon_super(s, data);
+   return 0;
 }
 
 /**
diff --git a/fs/afs/super.c b/fs/afs/super.c
index 754b9828233497..84b135ad3496b1 100644
--- a/fs/afs/super.c
+++ b/fs/afs/super.c
@@ -435,11 +435,6 @@ static int afs_dynroot_test_super(struct super_block *sb, 
struct fs_context *fc)
as->dyn_root);
 }
 
-static int afs_set_super(struct super_block *sb, struct fs_context *fc)
-{
-   return set_anon_super(sb, NULL);
-}
-
 /*
  * fill in the superblock
  */
@@ -574,9 +569,10 @@ static int afs_get_tree(struct fs_context *fc)
fc->s_fs_info = as;
 
/* allocate a deviceless superblock */
-   sb = sget_fc(fc,
-as->dyn_root ? afs_dynroot_test_super : afs_test_super,
-afs_set_super);
+   if (as->dyn_root)
+   sb = sget_fc(fc, afs_dynroot_test_super, NULL);
+   else
+   sb = sget_fc(fc, afs_test_super, NULL);
if (IS_ERR(sb)) {
ret = PTR_ERR(sb);
goto error;
diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index 01b86bd4eae8dc..063b9aa313c227 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -1350,10 +1350,8 @@ static int btrfs_test_super(struct super_block *s, void 
*data)
 
 static int btrfs_set_super(struct super_block *s, void *data)
 {
-   int err = set_anon_super(s, data);
-   if (!err)
-   s->s_fs_info = data;
-   return err;
+   s->s_fs_info = data;
+   return 0;
 }
 
 /*
diff --git a/fs/ceph/super.c b/fs/ceph/super.c
index 7feef0b35b97b5..cbeaab8c21d8e6 100644
--- a/fs/ceph/super.c
+++ b/fs/ceph/super.c
@@ -1191,7 +1191,6 @@ static struct dentry *ceph_real_mount(struct 
ceph_fs_client *fsc,
 static int ceph_set_super(struct super_block *s, struct fs_context *fc)
 {
struct ceph_fs_client *fsc = s->s_fs_info;
-   int ret;
 
dout("set_super %p\n", s);
 
@@ -1211,11 +1210,7 @@ static int ceph_set_super(struct super_block *s, struct 
fs_context *fc)
s->s_flags |= SB_NODIRATIME | SB_NOATIME;
 
ceph_fscrypt_set_ops(s);
-
-   ret = set_anon_super_fc(s, fc);
-   if (ret != 0)
-   fsc->sb = NULL;
-   return ret;
+   return 0;
 }
 
 /*
diff --git a/fs/ecryptfs/main.c b/fs/ecryptfs/main.c
index d99b2311759166..3ed91537a3991a 100644
--- a/fs/ecryptfs/main.c
+++ b/fs/ecryptfs/main.c
@@ -505,7 +505,7 @@ static struct dentry *ecryptfs_mount(struct 
file_system_type *fs_type, int flags
}
mount_crypt_stat = >mount_crypt_stat;
 
-   s = sget(fs_type, NULL, set_anon_super, flags, NULL);
+   s = sget(fs_type, NULL, NULL, flags, NULL);
if (IS_ERR(s)) {
rc = PTR_ERR(s);
goto out;
diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
index 42523edb32fd53..5731003b56a9c9 100644
--- a/fs/fuse/inode.c
+++ b/fs/fuse/inode.c
@@ -1511,7 +1511,7 @@ static int fuse_get_tree_submount(struct fs_context *fsc)
 
fm->fc = fuse_conn_get(fc);
fsc->s_fs_info = fm;
-   sb = sget_fc(fsc, NULL, set_anon_super_fc);
+   sb = sget_fc(fsc, NULL, NULL);
if (fsc->s_fs_info)
fuse_mount_destroy(fm);
if (IS_ERR(sb))
diff --git a/fs/fuse/virtio_fs.c b/fs/fuse/virtio_fs.c
index 0a0d593e5a9c79..a52957df956394 100644
--- a/fs/fuse/virtio_fs.c
+++ b/fs/fuse/virtio_fs.c
@@ -1454,7 +1454,7 @@ static int virtio_fs_get_tree(struct fs_context *fsc)
virtqueue_size - FUSE_HEADER_OVERHEAD);
 
fsc->s_fs_info = fm;
-   sb = sget_fc(fsc, virtio_fs_test_super, set_anon_super_fc);
+   sb = sg

[PATCH 06/19] qibfs: use simple_release_fs

2023-09-13 Thread Christoph Hellwig
qibfs currently has convoluted code to allow registering HCAs while qibfs
is not mounted and vice versa.  Switch to using simple_release_fs every
time an entry is added to pin the fs instance and remove all the boiler
plate code.

Signed-off-by: Christoph Hellwig 
---
 drivers/infiniband/hw/qib/qib.h  |   4 +-
 drivers/infiniband/hw/qib/qib_fs.c   | 105 ++-
 drivers/infiniband/hw/qib/qib_init.c |  32 +++-
 3 files changed, 36 insertions(+), 105 deletions(-)

diff --git a/drivers/infiniband/hw/qib/qib.h b/drivers/infiniband/hw/qib/qib.h
index 26c615772be390..f73c321d0bff88 100644
--- a/drivers/infiniband/hw/qib/qib.h
+++ b/drivers/infiniband/hw/qib/qib.h
@@ -1370,10 +1370,10 @@ void qib_device_remove(struct qib_devdata *);
 extern int qib_qsfp_dump(struct qib_pportdata *ppd, char *buf, int len);
 
 int __init qib_init_qibfs(void);
-int __exit qib_exit_qibfs(void);
+void __exit qib_exit_qibfs(void);
 
 int qibfs_add(struct qib_devdata *);
-int qibfs_remove(struct qib_devdata *);
+void qibfs_remove(struct qib_devdata *);
 
 int qib_pcie_init(struct pci_dev *, const struct pci_device_id *);
 int qib_pcie_ddinit(struct qib_devdata *, struct pci_dev *,
diff --git a/drivers/infiniband/hw/qib/qib_fs.c 
b/drivers/infiniband/hw/qib/qib_fs.c
index ed7d4b02f45a63..c52ca34b32e67d 100644
--- a/drivers/infiniband/hw/qib/qib_fs.c
+++ b/drivers/infiniband/hw/qib/qib_fs.c
@@ -43,7 +43,9 @@
 
 #define QIBFS_MAGIC 0x726a77
 
-static struct super_block *qib_super;
+static struct file_system_type qibfs_fs_type;
+static struct vfsmount *qib_mount;
+static int qib_mnt_count;
 
 #define private2dd(file) (file_inode(file)->i_private)
 
@@ -355,15 +357,19 @@ static const struct file_operations flash_ops = {
.llseek = default_llseek,
 };
 
-static int add_cntr_files(struct super_block *sb, struct qib_devdata *dd)
+int qibfs_add(struct qib_devdata *dd)
 {
struct dentry *dir, *tmp;
char unit[10];
int ret, i;
+   
+   ret = simple_pin_fs(_fs_type, _mount, _mnt_count);
+   if (ret)
+   return ret;
 
/* create the per-unit directory */
snprintf(unit, sizeof(unit), "%u", dd->unit);
-   ret = create_file(unit, S_IFDIR|S_IRUGO|S_IXUGO, sb->s_root, ,
+   ret = create_file(unit, S_IFDIR|S_IRUGO|S_IXUGO, qib_mount->mnt_root, 
,
  _dir_operations, dd);
if (ret) {
pr_err("create_file(%s) failed: %d\n", unit, ret);
@@ -422,65 +428,35 @@ static int add_cntr_files(struct super_block *sb, struct 
qib_devdata *dd)
pr_err("create_file(%s/flash) failed: %d\n",
unit, ret);
 bail:
+   simple_release_fs(_mount, _mnt_count);
return ret;
 }
 
-static int remove_device_files(struct super_block *sb,
-  struct qib_devdata *dd)
+void qibfs_remove(struct qib_devdata *dd)
 {
struct dentry *dir;
char unit[10];
 
snprintf(unit, sizeof(unit), "%u", dd->unit);
-   dir = lookup_one_len_unlocked(unit, sb->s_root, strlen(unit));
-
-   if (IS_ERR(dir)) {
-   pr_err("Lookup of %s failed\n", unit);
-   return PTR_ERR(dir);
-   }
-   simple_recursive_removal(dir, NULL);
-   return 0;
+   dir = lookup_one_len_unlocked(unit, qib_mount->mnt_root, strlen(unit));
+   if (!IS_ERR(dir))
+   simple_recursive_removal(dir, NULL);
+   simple_release_fs(_mount, _mnt_count);
 }
 
-/*
- * This fills everything in when the fs is mounted, to handle umount/mount
- * after device init.  The direct add_cntr_files() call handles adding
- * them from the init code, when the fs is already mounted.
- */
 static int qibfs_fill_super(struct super_block *sb, struct fs_context *fc)
 {
-   struct qib_devdata *dd;
-   unsigned long index;
-   int ret;
-
static const struct tree_descr files[] = {
[2] = {"driver_stats", _ops[0], S_IRUGO},
[3] = {"driver_stats_names", _ops[1], S_IRUGO},
{""},
};
-
-   ret = simple_fill_super(sb, QIBFS_MAGIC, files);
-   if (ret) {
-   pr_err("simple_fill_super failed: %d\n", ret);
-   goto bail;
-   }
-
-   xa_for_each(_dev_table, index, dd) {
-   ret = add_cntr_files(sb, dd);
-   if (ret)
-   goto bail;
-   }
-
-bail:
-   return ret;
+   return simple_fill_super(sb, QIBFS_MAGIC, files);
 }
 
 static int qibfs_get_tree(struct fs_context *fc)
 {
-   int ret = get_tree_single(fc, qibfs_fill_super);
-   if (ret == 0)
-   qib_super = fc->root->d_sb;
-   return ret;
+   return get_tree_single(fc, qibfs_fill_super);
 }
 
 static const struct fs_context_operations qibfs_context_ops = {
@@ -493,46 +469,11 @@ static int qi

[PATCH 17/19] NFS: move nfs_kill_super to fs_context.c

2023-09-13 Thread Christoph Hellwig
nfs_kill_super is only used in fs_context, so move it there.

Signed-off-by: Christoph Hellwig 
---
 fs/nfs/fs_context.c | 13 +
 fs/nfs/internal.h   |  1 -
 fs/nfs/super.c  | 16 
 fs/nfs/sysfs.h  |  2 ++
 4 files changed, 15 insertions(+), 17 deletions(-)

diff --git a/fs/nfs/fs_context.c b/fs/nfs/fs_context.c
index 853e8d609bb3bc..ee82e4cfb38bb5 100644
--- a/fs/nfs/fs_context.c
+++ b/fs/nfs/fs_context.c
@@ -21,8 +21,10 @@
 
 #include 
 
+#include "fscache.h"
 #include "nfs.h"
 #include "internal.h"
+#include "sysfs.h"
 
 #include "nfstrace.h"
 
@@ -1644,6 +1646,17 @@ static int nfs_init_fs_context(struct fs_context *fc)
return 0;
 }
 
+static void nfs_kill_super(struct super_block *s)
+{
+   struct nfs_server *server = NFS_SB(s);
+
+   nfs_sysfs_move_sb_to_server(server);
+   generic_shutdown_super(s);
+
+   nfs_fscache_release_super_cookie(s);
+   nfs_free_server(server);
+}
+
 struct file_system_type nfs_fs_type = {
.owner  = THIS_MODULE,
.name   = "nfs",
diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h
index 9c9cf764f6000d..49d5b03176c02d 100644
--- a/fs/nfs/internal.h
+++ b/fs/nfs/internal.h
@@ -447,7 +447,6 @@ extern const struct super_operations nfs_sops;
 bool nfs_auth_info_match(const struct nfs_auth_info *, rpc_authflavor_t);
 int nfs_try_get_tree(struct fs_context *);
 int nfs_get_tree_common(struct fs_context *);
-void nfs_kill_super(struct super_block *);
 
 extern struct rpc_stat nfs_rpcstat;
 
diff --git a/fs/nfs/super.c b/fs/nfs/super.c
index 89131e855e1393..5ba793e7f262d4 100644
--- a/fs/nfs/super.c
+++ b/fs/nfs/super.c
@@ -1329,22 +1329,6 @@ int nfs_get_tree_common(struct fs_context *fc)
goto out;
 }
 
-/*
- * Destroy an NFS superblock
- */
-void nfs_kill_super(struct super_block *s)
-{
-   struct nfs_server *server = NFS_SB(s);
-
-   nfs_sysfs_move_sb_to_server(server);
-   generic_shutdown_super(s);
-
-   nfs_fscache_release_super_cookie(s);
-
-   nfs_free_server(server);
-}
-EXPORT_SYMBOL_GPL(nfs_kill_super);
-
 #if IS_ENABLED(CONFIG_NFS_V4)
 
 /*
diff --git a/fs/nfs/sysfs.h b/fs/nfs/sysfs.h
index c5d1990cade50a..44c8a1712149c2 100644
--- a/fs/nfs/sysfs.h
+++ b/fs/nfs/sysfs.h
@@ -8,6 +8,8 @@
 
 #define CONTAINER_ID_MAXLEN (64)
 
+struct nfs_net;
+
 struct nfs_netns_client {
struct kobject kobject;
struct kobject nfs_net_kobj;
-- 
2.39.2



[PATCH 03/19] fs: release anon dev_t in deactivate_locked_super

2023-09-13 Thread Christoph Hellwig
Releasing an anon dev_t is a very common thing when freeing a
super_block, as that's done for basically any not block based file
system (modulo the odd mtd special case).  So instead of requiring
a special ->kill_sb helper and a lot of boilerplate in more complicated
file systems, just release the anon dev_t in deactivate_locked_super if
the super_block was using one.

As the freeing is done after the main call to kill_super_notify, this
removes the need for having two slightly different call sites for it.

Signed-off-by: Christoph Hellwig 
---
 block/bdev.c|  1 -
 drivers/dax/super.c |  1 -
 drivers/dma-buf/dma-buf.c   |  1 -
 drivers/gpu/drm/drm_drv.c   |  1 -
 drivers/misc/cxl/api.c  |  1 -
 drivers/scsi/cxlflash/ocxl_hw.c |  1 -
 fs/9p/vfs_super.c   |  2 +-
 fs/afs/super.c  |  2 +-
 fs/aio.c|  1 -
 fs/anon_inodes.c|  1 -
 fs/autofs/inode.c   |  4 ++--
 fs/btrfs/super.c|  3 ++-
 fs/btrfs/tests/btrfs-tests.c|  1 -
 fs/ceph/super.c |  2 +-
 fs/coda/inode.c |  1 -
 fs/ecryptfs/main.c  |  3 ++-
 fs/erofs/super.c|  4 ++--
 fs/fuse/inode.c |  2 +-
 fs/fuse/virtio_fs.c |  2 +-
 fs/hostfs/hostfs_kern.c |  2 +-
 fs/kernfs/mount.c   |  2 +-
 fs/nfs/super.c  |  2 +-
 fs/nsfs.c   |  1 -
 fs/openpromfs/inode.c   |  1 -
 fs/orangefs/super.c |  2 +-
 fs/overlayfs/super.c|  1 -
 fs/pipe.c   |  1 -
 fs/proc/root.c  |  4 ++--
 fs/smb/client/cifsfs.c  |  2 +-
 fs/super.c  | 22 --
 fs/ubifs/super.c|  3 ++-
 fs/vboxsf/super.c   |  1 -
 include/linux/fs.h  |  1 -
 kernel/resource.c   |  1 -
 mm/secretmem.c  |  1 -
 net/socket.c|  1 -
 security/apparmor/apparmorfs.c  |  1 -
 37 files changed, 30 insertions(+), 53 deletions(-)

diff --git a/block/bdev.c b/block/bdev.c
index f3b13aa1b7d428..9db691401497bb 100644
--- a/block/bdev.c
+++ b/block/bdev.c
@@ -358,7 +358,6 @@ static int bd_init_fs_context(struct fs_context *fc)
 static struct file_system_type bd_type = {
.name   = "bdev",
.init_fs_context = bd_init_fs_context,
-   .kill_sb= kill_anon_super,
 };
 
 struct super_block *blockdev_superblock __read_mostly;
diff --git a/drivers/dax/super.c b/drivers/dax/super.c
index 0da9232ea1754b..a9315b7396e68a 100644
--- a/drivers/dax/super.c
+++ b/drivers/dax/super.c
@@ -397,7 +397,6 @@ static int dax_init_fs_context(struct fs_context *fc)
 static struct file_system_type dax_fs_type = {
.name   = "dax",
.init_fs_context = dax_init_fs_context,
-   .kill_sb= kill_anon_super,
 };
 
 static int dax_test(struct inode *inode, void *data)
diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
index 21916bba77d58b..7313e99f6e8ea5 100644
--- a/drivers/dma-buf/dma-buf.c
+++ b/drivers/dma-buf/dma-buf.c
@@ -125,7 +125,6 @@ static int dma_buf_fs_init_context(struct fs_context *fc)
 static struct file_system_type dma_buf_fs_type = {
.name = "dmabuf",
.init_fs_context = dma_buf_fs_init_context,
-   .kill_sb = kill_anon_super,
 };
 
 static int dma_buf_mmap_internal(struct file *file, struct vm_area_struct *vma)
diff --git a/drivers/gpu/drm/drm_drv.c b/drivers/gpu/drm/drm_drv.c
index 3eda026ffac6a9..83676229cbe233 100644
--- a/drivers/gpu/drm/drm_drv.c
+++ b/drivers/gpu/drm/drm_drv.c
@@ -530,7 +530,6 @@ static struct file_system_type drm_fs_type = {
.name   = "drm",
.owner  = THIS_MODULE,
.init_fs_context = drm_fs_init_fs_context,
-   .kill_sb= kill_anon_super,
 };
 
 static struct inode *drm_fs_inode_new(void)
diff --git a/drivers/misc/cxl/api.c b/drivers/misc/cxl/api.c
index d85c5653086357..05b40076a0b481 100644
--- a/drivers/misc/cxl/api.c
+++ b/drivers/misc/cxl/api.c
@@ -44,7 +44,6 @@ static struct file_system_type cxl_fs_type = {
.name   = "cxl",
.owner  = THIS_MODULE,
.init_fs_context = cxl_fs_init_fs_context,
-   .kill_sb= kill_anon_super,
 };
 
 
diff --git a/drivers/scsi/cxlflash/ocxl_hw.c b/drivers/scsi/cxlflash/ocxl_hw.c
index 6542818e595a64..20f22610b104df 100644
--- a/drivers/scsi/cxlflash/ocxl_hw.c
+++ b/drivers/scsi/cxlflash/ocxl_hw.c
@@ -43,7 +43,6 @@ static struct file_system_type ocxlflash_fs_type = {
.name   = "ocxlflash",
.owner  = THIS_MODULE,
.init_fs_context = ocxlflash_fs_init_fs_context,
-   .kill_sb= kill_anon_super,
 };
 
 /*
diff --git a/fs/9p/vfs_super.c b/fs/9p/vfs_super.c
index 73db55c050bf10..9e60eddf5179ed 100644
-

[PATCH 09/19] zonefs: remove duplicate cleanup in zonefs_fill_super

2023-09-13 Thread Christoph Hellwig
When ->fill_super fails, ->kill_sb is called which already cleans up
the inodes and zgroups.

Drop the extra cleanup code in zonefs_fill_super.

Signed-off-by: Christoph Hellwig 
---
 fs/zonefs/super.c | 21 +
 1 file changed, 5 insertions(+), 16 deletions(-)

diff --git a/fs/zonefs/super.c b/fs/zonefs/super.c
index 9d1a9808fbbba6..35b2554ce2ac2e 100644
--- a/fs/zonefs/super.c
+++ b/fs/zonefs/super.c
@@ -1309,13 +1309,12 @@ static int zonefs_fill_super(struct super_block *sb, 
void *data, int silent)
/* Initialize the zone groups */
ret = zonefs_init_zgroups(sb);
if (ret)
-   goto cleanup;
+   return ret;
 
/* Create the root directory inode */
-   ret = -ENOMEM;
inode = new_inode(sb);
if (!inode)
-   goto cleanup;
+   return -ENOMEM;
 
inode->i_ino = bdev_nr_zones(sb->s_bdev);
inode->i_mode = S_IFDIR | 0555;
@@ -1333,7 +1332,7 @@ static int zonefs_fill_super(struct super_block *sb, void 
*data, int silent)
 
sb->s_root = d_make_root(inode);
if (!sb->s_root)
-   goto cleanup;
+   return -ENOMEM;
 
/*
 * Take a reference on the zone groups directory inodes
@@ -1341,19 +1340,9 @@ static int zonefs_fill_super(struct super_block *sb, 
void *data, int silent)
 */
ret = zonefs_get_zgroup_inodes(sb);
if (ret)
-   goto cleanup;
-
-   ret = zonefs_sysfs_register(sb);
-   if (ret)
-   goto cleanup;
-
-   return 0;
-
-cleanup:
-   zonefs_release_zgroup_inodes(sb);
-   zonefs_free_zgroups(sb);
+   return ret;
 
-   return ret;
+   return zonefs_sysfs_register(sb);
 }
 
 static struct dentry *zonefs_mount(struct file_system_type *fs_type,
-- 
2.39.2



[PATCH 16/19] x86/resctrl: release rdtgroup_mutex and the CPU hotplug lock in rdt_shutdown_sb

2023-09-13 Thread Christoph Hellwig
While the resctl code is a bit confusing, I can't find anything protected
by rdtgroup_mutex or the CPU hotplug lock in generic_shutdown_super or
kernfs_free_sb.  Drop the locks at the end of rdt_shutdown_sb to avoid
holding locks over method calls and VFS code which itself already has a
rather complicated locking hierarchy.

Signed-off-by: Christoph Hellwig 
---
 arch/x86/kernel/cpu/resctrl/rdtgroup.c | 7 +--
 1 file changed, 1 insertion(+), 6 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c 
b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index 8db767fd80df6b..e87de519493021 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -2793,11 +2793,6 @@ static void rdt_shutdown_sb(struct super_block *sb)
static_branch_disable_cpuslocked(_mon_enable_key);
static_branch_disable_cpuslocked(_enable_key);
kernfs_shutdown_sb(sb);
-}
-
-static void rdt_free_sb(struct super_block *sb)
-{
-   kernfs_free_sb(sb);
mutex_unlock(_mutex);
cpus_read_unlock();
 }
@@ -2807,7 +2802,7 @@ static struct file_system_type rdt_fs_type = {
.init_fs_context= rdt_init_fs_context,
.parameters = rdt_fs_parameters,
.shutdown_sb= rdt_shutdown_sb,
-   .free_sb= rdt_free_sb,
+   .free_sb= kernfs_free_sb,
 };
 
 static int mon_addfile(struct kernfs_node *parent_kn, const char *name,
-- 
2.39.2



[PATCH 02/19] fs: make ->kill_sb optional

2023-09-13 Thread Christoph Hellwig
Call generic_shutdown_super if ->kill_sb is not provided by the file
system.  This can't currently happen but will become common soon.

Signed-off-by: Christoph Hellwig 
---
 fs/super.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/fs/super.c b/fs/super.c
index 127a17d958a482..ab234e6af48605 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -483,7 +483,11 @@ void deactivate_locked_super(struct super_block *s)
}
 
unregister_shrinker(>s_shrink);
-   fs->kill_sb(s);
+
+   if (fs->kill_sb)
+   fs->kill_sb(s);
+   else
+   generic_shutdown_super(s);
 
kill_super_notify(s);
 
-- 
2.39.2



[PATCH 10/19] USB: gadget/legacy: remove sb_mutex

2023-09-13 Thread Christoph Hellwig
Creating new a new super_block vs freeing the old one for single instance
file systems is serialized by the wait for SB_DEAD.

Remove the superfluous sb_mutex.

Signed-off-by: Christoph Hellwig 
---
 drivers/usb/gadget/legacy/inode.c | 6 --
 1 file changed, 6 deletions(-)

diff --git a/drivers/usb/gadget/legacy/inode.c 
b/drivers/usb/gadget/legacy/inode.c
index ce9e31f3d26bcc..a203266bc0dc82 100644
--- a/drivers/usb/gadget/legacy/inode.c
+++ b/drivers/usb/gadget/legacy/inode.c
@@ -229,7 +229,6 @@ static void put_ep (struct ep_data *data)
  */
 
 static const char *CHIP;
-static DEFINE_MUTEX(sb_mutex); /* Serialize superblock operations */
 
 /*--*/
 
@@ -2012,8 +2011,6 @@ gadgetfs_fill_super (struct super_block *sb, struct 
fs_context *fc)
struct dev_data *dev;
int rc;
 
-   mutex_lock(_mutex);
-
if (the_device) {
rc = -ESRCH;
goto Done;
@@ -2069,7 +2066,6 @@ gadgetfs_fill_super (struct super_block *sb, struct 
fs_context *fc)
rc = -ENOMEM;
 
  Done:
-   mutex_unlock(_mutex);
return rc;
 }
 
@@ -2092,7 +2088,6 @@ static int gadgetfs_init_fs_context(struct fs_context *fc)
 static void
 gadgetfs_kill_sb (struct super_block *sb)
 {
-   mutex_lock(_mutex);
kill_litter_super (sb);
if (the_device) {
put_dev (the_device);
@@ -2100,7 +2095,6 @@ gadgetfs_kill_sb (struct super_block *sb)
}
kfree(CHIP);
CHIP = NULL;
-   mutex_unlock(_mutex);
 }
 
 /*--*/
-- 
2.39.2



[PATCH 14/19] jffs2: convert to ->shutdown_sb and ->free_sb

2023-09-13 Thread Christoph Hellwig
Convert jffs2 from ->kill_sb to ->shutdown_sb and ->free_sb.  Drop
the otherwise unused kill_mtd_super helpers, as there is no benefit in
it over just calling put_mtd_device on sb->s_mtd.

Signed-off-by: Christoph Hellwig 
---
 drivers/mtd/mtdsuper.c| 12 
 fs/jffs2/super.c  | 22 ++
 include/linux/mtd/super.h |  2 --
 3 files changed, 14 insertions(+), 22 deletions(-)

diff --git a/drivers/mtd/mtdsuper.c b/drivers/mtd/mtdsuper.c
index b7e3763c47f0cd..66da2e6f90f5f5 100644
--- a/drivers/mtd/mtdsuper.c
+++ b/drivers/mtd/mtdsuper.c
@@ -165,15 +165,3 @@ int get_tree_mtd(struct fs_context *fc,
return -EINVAL;
 }
 EXPORT_SYMBOL_GPL(get_tree_mtd);
-
-/*
- * destroy an MTD-based superblock
- */
-void kill_mtd_super(struct super_block *sb)
-{
-   generic_shutdown_super(sb);
-   put_mtd_device(sb->s_mtd);
-   sb->s_mtd = NULL;
-}
-
-EXPORT_SYMBOL_GPL(kill_mtd_super);
diff --git a/fs/jffs2/super.c b/fs/jffs2/super.c
index 7ea37f49f1e18e..14577368202e90 100644
--- a/fs/jffs2/super.c
+++ b/fs/jffs2/super.c
@@ -340,21 +340,27 @@ static void jffs2_put_super (struct super_block *sb)
jffs2_dbg(1, "%s(): returning\n", __func__);
 }
 
-static void jffs2_kill_sb(struct super_block *sb)
+static void jffs2_shutdown_sb(struct super_block *sb)
 {
struct jffs2_sb_info *c = JFFS2_SB_INFO(sb);
+
if (c && !sb_rdonly(sb))
jffs2_stop_garbage_collect_thread(c);
-   kill_mtd_super(sb);
-   kfree(c);
+}
+
+static void jffs2_free_sb(struct super_block *sb)
+{
+   put_mtd_device(sb->s_mtd);
+   kfree(JFFS2_SB_INFO(sb));
 }
 
 static struct file_system_type jffs2_fs_type = {
-   .owner =THIS_MODULE,
-   .name = "jffs2",
-   .init_fs_context = jffs2_init_fs_context,
-   .parameters =   jffs2_fs_parameters,
-   .kill_sb =  jffs2_kill_sb,
+   .owner  = THIS_MODULE,
+   .name   = "jffs2",
+   .init_fs_context= jffs2_init_fs_context,
+   .parameters = jffs2_fs_parameters,
+   .shutdown_sb= jffs2_shutdown_sb,
+   .free_sb= jffs2_free_sb,
 };
 MODULE_ALIAS_FS("jffs2");
 
diff --git a/include/linux/mtd/super.h b/include/linux/mtd/super.h
index 3608a6c36faceb..f6d5c1a17eec23 100644
--- a/include/linux/mtd/super.h
+++ b/include/linux/mtd/super.h
@@ -17,8 +17,6 @@
 extern int get_tree_mtd(struct fs_context *fc,
 int (*fill_super)(struct super_block *sb,
   struct fs_context *fc));
-extern void kill_mtd_super(struct super_block *sb);
-
 
 #endif /* __KERNEL__ */
 
-- 
2.39.2



[PATCH 15/19] kernfs: split ->kill_sb

2023-09-13 Thread Christoph Hellwig
Split the kernfs_kill_sb helper into helpers for the new split
shutdown_sb and free_sb methods.  Note that resctrl has very odd
locking in ->kill_sb, so this commit only releases the locking
acquired in rdt_shutdown_sb in rdt_free_sb.  This is not very good
code and relies on ->shutdown_sb and ->free_sb to always be called
in pairs, which it currently is.  The next commit will try to clean
this up.

Signed-off-by: Christoph Hellwig 
---
 arch/x86/kernel/cpu/resctrl/rdtgroup.c | 12 +---
 fs/kernfs/mount.c  | 18 --
 fs/sysfs/mount.c   |  7 ---
 include/linux/kernfs.h |  5 ++---
 kernel/cgroup/cgroup.c | 10 ++
 5 files changed, 29 insertions(+), 23 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c 
b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index 725344048f85da..8db767fd80df6b 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -2772,7 +2772,7 @@ static void rmdir_all_sub(void)
kernfs_remove(kn_mondata);
 }
 
-static void rdt_kill_sb(struct super_block *sb)
+static void rdt_shutdown_sb(struct super_block *sb)
 {
struct rdt_resource *r;
 
@@ -2792,7 +2792,12 @@ static void rdt_kill_sb(struct super_block *sb)
static_branch_disable_cpuslocked(_alloc_enable_key);
static_branch_disable_cpuslocked(_mon_enable_key);
static_branch_disable_cpuslocked(_enable_key);
-   kernfs_kill_sb(sb);
+   kernfs_shutdown_sb(sb);
+}
+
+static void rdt_free_sb(struct super_block *sb)
+{
+   kernfs_free_sb(sb);
mutex_unlock(_mutex);
cpus_read_unlock();
 }
@@ -2801,7 +2806,8 @@ static struct file_system_type rdt_fs_type = {
.name   = "resctrl",
.init_fs_context= rdt_init_fs_context,
.parameters = rdt_fs_parameters,
-   .kill_sb= rdt_kill_sb,
+   .shutdown_sb= rdt_shutdown_sb,
+   .free_sb= rdt_free_sb,
 };
 
 static int mon_addfile(struct kernfs_node *parent_kn, const char *name,
diff --git a/fs/kernfs/mount.c b/fs/kernfs/mount.c
index d6d3cba669dbdd..32ec4ec3c878f6 100644
--- a/fs/kernfs/mount.c
+++ b/fs/kernfs/mount.c
@@ -379,14 +379,14 @@ void kernfs_free_fs_context(struct fs_context *fc)
 }
 
 /**
- * kernfs_kill_sb - kill_sb for kernfs
+ * kernfs_shutdown_sb - shutdown_sb for kernfs
  * @sb: super_block being killed
  *
- * This can be used directly for file_system_type->kill_sb().  If a kernfs
- * user needs extra cleanup, it can implement its own kill_sb() and call
+ * This can be used directly for file_system_type->shutdown_sb().  If a kernfs
+ * user needs extra cleanup, it can implement its own shutdown_sb() and call
  * this function at the end.
  */
-void kernfs_kill_sb(struct super_block *sb)
+void kernfs_shutdown_sb(struct super_block *sb)
 {
struct kernfs_super_info *info = kernfs_info(sb);
struct kernfs_root *root = info->root;
@@ -394,13 +394,11 @@ void kernfs_kill_sb(struct super_block *sb)
down_write(>kernfs_supers_rwsem);
list_del(>node);
up_write(>kernfs_supers_rwsem);
+}
 
-   /*
-* Remove the superblock from fs_supers/s_instances
-* so we can't find it, before freeing kernfs_super_info.
-*/
-   generic_shutdown_super(sb);
-   kfree(info);
+void kernfs_free_sb(struct super_block *sb)
+{
+   kfree(kernfs_info(sb));
 }
 
 static void __init kernfs_mutex_init(void)
diff --git a/fs/sysfs/mount.c b/fs/sysfs/mount.c
index 98467bb7673781..804391342599bc 100644
--- a/fs/sysfs/mount.c
+++ b/fs/sysfs/mount.c
@@ -79,18 +79,19 @@ static int sysfs_init_fs_context(struct fs_context *fc)
return 0;
 }
 
-static void sysfs_kill_sb(struct super_block *sb)
+static void sysfs_free_sb(struct super_block *sb)
 {
void *ns = (void *)kernfs_super_ns(sb);
 
-   kernfs_kill_sb(sb);
+   kernfs_free_sb(sb);
kobj_ns_drop(KOBJ_NS_TYPE_NET, ns);
 }
 
 static struct file_system_type sysfs_fs_type = {
.name   = "sysfs",
.init_fs_context= sysfs_init_fs_context,
-   .kill_sb= sysfs_kill_sb,
+   .shutdown_sb= kernfs_shutdown_sb,
+   .free_sb= sysfs_free_sb,
.fs_flags   = FS_USERNS_MOUNT,
 };
 
diff --git a/include/linux/kernfs.h b/include/linux/kernfs.h
index 2a36f3218b5106..940059251deac8 100644
--- a/include/linux/kernfs.h
+++ b/include/linux/kernfs.h
@@ -453,7 +453,8 @@ int kernfs_xattr_set(struct kernfs_node *kn, const char 
*name,
 const void *kernfs_super_ns(struct super_block *sb);
 int kernfs_get_tree(struct fs_context *fc);
 void kernfs_free_fs_context(struct fs_context *fc);
-void kernfs_kill_sb(struct super_block *sb);
+void kernfs_shutdown_sb(struct super_block *sb);
+void kernfs_free_sb(struct super_block *sb);
 
 void 

[PATCH 12/19] fs: convert kill_litter_super to litter_shutdown_sb

2023-09-13 Thread Christoph Hellwig
Replace kill_litter_super with litter_shutdown_sb, which is wired up to
the ->shutdown_sb method.  For file systems that wrapped
kill_litter_super, ->kill_sb is replaced with ->shutdown and ->free_sb
methods as needed.

Signed-off-by: Christoph Hellwig 
---
 arch/powerpc/platforms/cell/spufs/inode.c | 10 +-
 arch/s390/hypfs/inode.c   |  6 +++---
 drivers/android/binderfs.c| 12 +++-
 drivers/base/devtmpfs.c   |  8 
 drivers/infiniband/hw/qib/qib_fs.c|  2 +-
 drivers/misc/ibmasm/ibmasmfs.c|  8 
 drivers/usb/gadget/function/f_fs.c|  6 +++---
 drivers/usb/gadget/legacy/inode.c | 12 ++--
 drivers/xen/xenfs/super.c |  8 
 fs/binfmt_misc.c  |  8 
 fs/configfs/mount.c   |  8 
 fs/debugfs/inode.c|  8 
 fs/devpts/inode.c |  6 +++---
 fs/efivarfs/super.c   | 13 ++---
 fs/fuse/control.c | 12 ++--
 fs/hugetlbfs/inode.c  |  2 +-
 fs/nfsd/nfsctl.c  | 22 --
 fs/ocfs2/dlmfs/dlmfs.c|  2 +-
 fs/pstore/inode.c |  7 +++
 fs/ramfs/inode.c  |  6 +++---
 fs/super.c| 14 +++---
 fs/tracefs/inode.c|  2 +-
 include/linux/fs.h|  2 +-
 include/linux/ramfs.h |  2 +-
 init/do_mounts.c  |  6 +++---
 ipc/mqueue.c  |  2 +-
 kernel/bpf/inode.c|  2 +-
 mm/shmem.c|  5 +++--
 net/sunrpc/rpc_pipe.c | 19 ---
 security/inode.c  |  8 
 security/selinux/selinuxfs.c  | 15 +--
 security/smack/smackfs.c  |  6 +++---
 32 files changed, 126 insertions(+), 123 deletions(-)

diff --git a/arch/powerpc/platforms/cell/spufs/inode.c 
b/arch/powerpc/platforms/cell/spufs/inode.c
index 38c5be34c8951f..2610a0731ea242 100644
--- a/arch/powerpc/platforms/cell/spufs/inode.c
+++ b/arch/powerpc/platforms/cell/spufs/inode.c
@@ -763,11 +763,11 @@ static int spufs_init_fs_context(struct fs_context *fc)
 }
 
 static struct file_system_type spufs_type = {
-   .owner = THIS_MODULE,
-   .name = "spufs",
-   .init_fs_context = spufs_init_fs_context,
-   .parameters = spufs_fs_parameters,
-   .kill_sb = kill_litter_super,
+   .owner  = THIS_MODULE,
+   .name   = "spufs",
+   .init_fs_context= spufs_init_fs_context,
+   .parameters = spufs_fs_parameters,
+   .shutdown_sb= litter_shutdown_sb,
 };
 MODULE_ALIAS_FS("spufs");
 
diff --git a/arch/s390/hypfs/inode.c b/arch/s390/hypfs/inode.c
index 3261fb9cade648..f18e3b844c5d9b 100644
--- a/arch/s390/hypfs/inode.c
+++ b/arch/s390/hypfs/inode.c
@@ -294,9 +294,8 @@ static int hypfs_init_fs_context(struct fs_context *fc)
return 0;
 }
 
-static void hypfs_kill_super(struct super_block *sb)
+static void hypfs_free_sb(struct super_block *sb)
 {
-   kill_litter_super(sb);
kfree(sb->s_fs_info);
 }
 
@@ -417,7 +416,8 @@ static struct file_system_type hypfs_type = {
.name   = "s390_hypfs",
.init_fs_context = hypfs_init_fs_context,
.parameters = hypfs_fs_parameters,
-   .kill_sb= hypfs_kill_super
+   .shutdown_sb= litter_shutdown_sb,
+   .free_sb= hypfs_free_sb,
 };
 
 static const struct super_operations hypfs_s_ops = {
diff --git a/drivers/android/binderfs.c b/drivers/android/binderfs.c
index 81effec17b3d63..f48196391239c0 100644
--- a/drivers/android/binderfs.c
+++ b/drivers/android/binderfs.c
@@ -771,19 +771,12 @@ static int binderfs_init_fs_context(struct fs_context *fc)
return 0;
 }
 
-static void binderfs_kill_super(struct super_block *sb)
+static void binderfs_free_sb(struct super_block *sb)
 {
struct binderfs_info *info = sb->s_fs_info;
 
-   /*
-* During inode eviction struct binderfs_info is needed.
-* So first wipe the super_block then free struct binderfs_info.
-*/
-   kill_litter_super(sb);
-
if (info && info->ipc_ns)
put_ipc_ns(info->ipc_ns);
-
kfree(info);
 }
 
@@ -791,7 +784,8 @@ static struct file_system_type binder_fs_type = {
.name   = "binder",
.init_fs_context= binderfs_init_fs_context,
.parameters = binderfs_fs_parameters,
-   .kill_sb= binderfs_kill_super,
+   .shutdown_sb= litter_

[PATCH 19/19] fs: remove ->kill_sb

2023-09-13 Thread Christoph Hellwig
Now that no instances are left, remove ->kill_sb and mark
generic_shutdown_super static.

Signed-off-by: Christoph Hellwig 
---
 Documentation/filesystems/locking.rst |  5 -
 Documentation/filesystems/vfs.rst |  5 -
 fs/super.c| 25 +
 include/linux/fs.h|  2 --
 4 files changed, 9 insertions(+), 28 deletions(-)

diff --git a/Documentation/filesystems/locking.rst 
b/Documentation/filesystems/locking.rst
index c33e2f03ed1f69..e4ca99c0828d00 100644
--- a/Documentation/filesystems/locking.rst
+++ b/Documentation/filesystems/locking.rst
@@ -221,7 +221,6 @@ prototypes::
struct dentry *(*mount) (struct file_system_type *, int,
   const char *, void *);
void (*shutdown_sb) (struct super_block *);
-   void (*kill_sb) (struct super_block *);
void (*free_sb) (struct super_block *);
 
 locking rules:
@@ -231,16 +230,12 @@ ops   may block
 ====
 mount  yes
 shutdown_sbyes
-kill_sbyes
 free_sbyes
 ====
 
 ->mount() returns ERR_PTR or the root dentry; its superblock should be locked
 on return.
 
-->kill_sb() takes a write-locked superblock, does all shutdown work on it,
-unlocks and drops the reference.
-
 address_space_operations
 
 prototypes::
diff --git a/Documentation/filesystems/vfs.rst 
b/Documentation/filesystems/vfs.rst
index 1a7c6926c31f34..29513ee1d34ede 100644
--- a/Documentation/filesystems/vfs.rst
+++ b/Documentation/filesystems/vfs.rst
@@ -120,7 +120,6 @@ members are defined:
struct dentry *(*mount) (struct file_system_type *, int,
const char *, void *);
void (*shutdown_sb) (struct super_block *);
-   void (*kill_sb) (struct super_block *);
void (*free_sb) (struct super_block *);
struct module *owner;
struct file_system_type * next;
@@ -164,10 +163,6 @@ members are defined:
Note: dentries and inodes are normally taken care of and do not need
specific handling unless they are pinned by kernel users.
 
-``kill_sb``
-   the method to call when an instance of this filesystem should be
-   shut down
-
 ``free_sb``
Free file system specific resources like sb->s_fs_info that are
still needed while inodes are freed during umount.
diff --git a/fs/super.c b/fs/super.c
index 805ca1dd1e23f2..d9c564e70ffcd5 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -458,6 +458,8 @@ static void kill_super_notify(struct super_block *sb)
super_wake(sb, SB_DEAD);
 }
 
+static void generic_shutdown_super(struct super_block *sb);
+
 /**
  * deactivate_locked_super -   drop an active reference to superblock
  * @s: superblock to deactivate
@@ -480,15 +482,11 @@ void deactivate_locked_super(struct super_block *s)
 
unregister_shrinker(>s_shrink);
 
-   if (fs->kill_sb) {
-   fs->kill_sb(s);
-   } else {
-   if (fs->shutdown_sb)
-   fs->shutdown_sb(s);
-   generic_shutdown_super(s);
-   if (fs->free_sb)
-   fs->free_sb(s);
-   }
+   if (fs->shutdown_sb)
+   fs->shutdown_sb(s);
+   generic_shutdown_super(s);
+   if (fs->free_sb)
+   fs->free_sb(s);
 
kill_super_notify(s);
 
@@ -661,16 +659,13 @@ EXPORT_SYMBOL(retire_super);
  * @sb: superblock to kill
  *
  * generic_shutdown_super() does all fs-independent work on superblock
- * shutdown.  Typical ->kill_sb() should pick all fs-specific objects
- * that need destruction out of superblock, call generic_shutdown_super()
- * and release aforementioned objects.  Note: dentries and inodes _are_
- * taken care of and do not need specific handling.
+ * shutdown. 
  *
  * Upon calling this function, the filesystem may no longer alter or
  * rearrange the set of dentries belonging to this super_block, nor may it
  * change the attachments of dentries to inodes.
  */
-void generic_shutdown_super(struct super_block *sb)
+static void generic_shutdown_super(struct super_block *sb)
 {
const struct super_operations *sop = sb->s_op;
 
@@ -743,8 +738,6 @@ void generic_shutdown_super(struct super_block *sb)
}
 }
 
-EXPORT_SYMBOL(generic_shutdown_super);
-
 bool mount_capable(struct fs_context *fc)
 {
if (!(fc->fs_type->fs_flags & FS_USERNS_MOUNT))
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 302be5dfc1a04a..f57d3a27b488f7 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2340,7 +2340,6 @@ struct file_system_type {
const struct fs_parameter_spec *parameters;
struct dentry *(*mount) (struct file_system_type *, int,
   const char *, void *);

[PATCH 08/19] pstore: shrink the pstore_sb_lock critical section in pstore_kill_sb

2023-09-13 Thread Christoph Hellwig
->kill_sb can't race with creating ->fill_super because pstore is a
_single file system that only ever has a single sb instance, and we wait
for the previous one to go away before creating a new one.  Reduce
the critical section so that is is not held over generic_shutdown_super.

Signed-off-by: Christoph Hellwig 
---
 fs/pstore/inode.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/fs/pstore/inode.c b/fs/pstore/inode.c
index 585360706b335f..fd1d24b47160d0 100644
--- a/fs/pstore/inode.c
+++ b/fs/pstore/inode.c
@@ -467,10 +467,9 @@ static struct dentry *pstore_mount(struct file_system_type 
*fs_type,
 
 static void pstore_kill_sb(struct super_block *sb)
 {
-   mutex_lock(_sb_lock);
-   WARN_ON(pstore_sb && pstore_sb != sb);
-
kill_litter_super(sb);
+
+   mutex_lock(_sb_lock);
pstore_sb = NULL;
 
mutex_lock(_list_lock);
-- 
2.39.2



[PATCH 13/19] fs: convert kill_block_super to block_free_sb

2023-09-13 Thread Christoph Hellwig
Replace kill_block_super with block_free_sb, which is wired up to
the ->free_sb method.  For file systems that wrapped kill_block_super,
->kill_sb is replaced with ->shutdown and ->free_sb methods as needed.

Signed-off-by: Christoph Hellwig 
---
 fs/adfs/super.c  |  2 +-
 fs/affs/super.c  |  7 ---
 fs/befs/linuxvfs.c   |  2 +-
 fs/bfs/inode.c   |  2 +-
 fs/efs/super.c   |  7 ---
 fs/erofs/super.c | 25 ++---
 fs/exfat/super.c |  6 +++---
 fs/ext2/super.c  |  2 +-
 fs/ext4/super.c  | 12 ++--
 fs/f2fs/super.c  |  6 +++---
 fs/fat/namei_msdos.c |  2 +-
 fs/fat/namei_vfat.c  |  2 +-
 fs/freevxfs/vxfs_super.c |  2 +-
 fs/fuse/inode.c  | 12 ++--
 fs/gfs2/ops_fstype.c | 11 ---
 fs/hfs/super.c   |  2 +-
 fs/hfsplus/super.c   |  2 +-
 fs/hpfs/super.c  |  2 +-
 fs/isofs/inode.c |  2 +-
 fs/jfs/super.c   |  2 +-
 fs/minix/inode.c |  2 +-
 fs/nilfs2/super.c|  2 +-
 fs/ntfs/super.c  |  2 +-
 fs/ntfs3/super.c |  6 +++---
 fs/ocfs2/super.c |  2 +-
 fs/omfs/inode.c  |  2 +-
 fs/qnx4/inode.c  |  7 ---
 fs/qnx6/inode.c  |  2 +-
 fs/reiserfs/super.c  |  7 +++
 fs/squashfs/super.c  |  2 +-
 fs/super.c   |  6 ++
 fs/sysv/super.c  |  4 ++--
 fs/udf/super.c   |  2 +-
 fs/ufs/super.c   |  2 +-
 fs/xfs/xfs_buf.c |  2 +-
 fs/xfs/xfs_super.c   |  6 +++---
 fs/zonefs/super.c| 13 ++---
 include/linux/fs.h   |  2 +-
 38 files changed, 86 insertions(+), 95 deletions(-)

diff --git a/fs/adfs/super.c b/fs/adfs/super.c
index e8bfc38239cd59..22f0137f485e5f 100644
--- a/fs/adfs/super.c
+++ b/fs/adfs/super.c
@@ -463,7 +463,7 @@ static struct file_system_type adfs_fs_type = {
.owner  = THIS_MODULE,
.name   = "adfs",
.mount  = adfs_mount,
-   .kill_sb= kill_block_super,
+   .free_sb= block_free_sb,
.fs_flags   = FS_REQUIRES_DEV,
 };
 MODULE_ALIAS_FS("adfs");
diff --git a/fs/affs/super.c b/fs/affs/super.c
index 58b391446ae1fd..775e878797f9fc 100644
--- a/fs/affs/super.c
+++ b/fs/affs/super.c
@@ -631,10 +631,11 @@ static struct dentry *affs_mount(struct file_system_type 
*fs_type,
return mount_bdev(fs_type, flags, dev_name, data, affs_fill_super);
 }
 
-static void affs_kill_sb(struct super_block *sb)
+static void affs_free_sb(struct super_block *sb)
 {
struct affs_sb_info *sbi = AFFS_SB(sb);
-   kill_block_super(sb);
+
+   block_free_sb(sb);
if (sbi) {
affs_free_bitmap(sb);
affs_brelse(sbi->s_root_bh);
@@ -648,7 +649,7 @@ static struct file_system_type affs_fs_type = {
.owner  = THIS_MODULE,
.name   = "affs",
.mount  = affs_mount,
-   .kill_sb= affs_kill_sb,
+   .free_sb= affs_free_sb,
.fs_flags   = FS_REQUIRES_DEV,
 };
 MODULE_ALIAS_FS("affs");
diff --git a/fs/befs/linuxvfs.c b/fs/befs/linuxvfs.c
index 9a16a51fbb88d4..7682c027d44782 100644
--- a/fs/befs/linuxvfs.c
+++ b/fs/befs/linuxvfs.c
@@ -981,7 +981,7 @@ static struct file_system_type befs_fs_type = {
.owner  = THIS_MODULE,
.name   = "befs",
.mount  = befs_mount,
-   .kill_sb= kill_block_super,
+   .free_sb= block_free_sb,
.fs_flags   = FS_REQUIRES_DEV,
 };
 MODULE_ALIAS_FS("befs");
diff --git a/fs/bfs/inode.c b/fs/bfs/inode.c
index e6a76ae9eb4442..4d894d5dd07074 100644
--- a/fs/bfs/inode.c
+++ b/fs/bfs/inode.c
@@ -458,7 +458,7 @@ static struct file_system_type bfs_fs_type = {
.owner  = THIS_MODULE,
.name   = "bfs",
.mount  = bfs_mount,
-   .kill_sb= kill_block_super,
+   .free_sb= block_free_sb,
.fs_flags   = FS_REQUIRES_DEV,
 };
 MODULE_ALIAS_FS("bfs");
diff --git a/fs/efs/super.c b/fs/efs/super.c
index b287f47c165ba8..1f808a455e7e87 100644
--- a/fs/efs/super.c
+++ b/fs/efs/super.c
@@ -28,10 +28,11 @@ static struct dentry *efs_mount(struct file_system_type 
*fs_type,
return mount_bdev(fs_type, flags, dev_name, data, efs_fill_super);
 }
 
-static void efs_kill_sb(struct super_block *s)
+static void efs_free_sb(struct super_block *s)
 {
struct efs_sb_info *sbi = SUPER_INFO(s);
-   kill_block_super(s);
+
+   block_free_sb(s);
kfree(sbi);
 }
 
@@ -39,7 +40,7 @@ static struct file_system_type efs_fs_type = {
.owner  = THIS_MODULE,
.name   = "efs",
.mount  = efs_mount,
-   .kill_sb= efs_kill_sb,
+   .free_sb= efs_free_sb,
.fs_flags   = FS_REQUIRE

[PATCH 07/19] hypfs: use d_genocide to kill fs entries

2023-09-13 Thread Christoph Hellwig
hypfs is entirely synthetic and doesn't care about i_nlink when dropping
entries from the cache.  Switch to d_genocide instead of a home grown
file remove loop for unmount and write (yes, really!).

Signed-off-by: Christoph Hellwig 
---
 arch/s390/hypfs/inode.c | 37 ++---
 1 file changed, 2 insertions(+), 35 deletions(-)

diff --git a/arch/s390/hypfs/inode.c b/arch/s390/hypfs/inode.c
index dbe8a7dcafa922..3261fb9cade648 100644
--- a/arch/s390/hypfs/inode.c
+++ b/arch/s390/hypfs/inode.c
@@ -64,33 +64,6 @@ static void hypfs_add_dentry(struct dentry *dentry)
hypfs_last_dentry = dentry;
 }
 
-static void hypfs_remove(struct dentry *dentry)
-{
-   struct dentry *parent;
-
-   parent = dentry->d_parent;
-   inode_lock(d_inode(parent));
-   if (simple_positive(dentry)) {
-   if (d_is_dir(dentry))
-   simple_rmdir(d_inode(parent), dentry);
-   else
-   simple_unlink(d_inode(parent), dentry);
-   }
-   d_drop(dentry);
-   dput(dentry);
-   inode_unlock(d_inode(parent));
-}
-
-static void hypfs_delete_tree(struct dentry *root)
-{
-   while (hypfs_last_dentry) {
-   struct dentry *next_dentry;
-   next_dentry = hypfs_last_dentry->d_fsdata;
-   hypfs_remove(hypfs_last_dentry);
-   hypfs_last_dentry = next_dentry;
-   }
-}
-
 static struct inode *hypfs_make_inode(struct super_block *sb, umode_t mode)
 {
struct inode *ret = new_inode(sb);
@@ -183,14 +156,14 @@ static ssize_t hypfs_write_iter(struct kiocb *iocb, 
struct iov_iter *from)
rc = -EBUSY;
goto out;
}
-   hypfs_delete_tree(sb->s_root);
+   d_genocide(sb->s_root);
if (MACHINE_IS_VM)
rc = hypfs_vm_create_files(sb->s_root);
else
rc = hypfs_diag_create_files(sb->s_root);
if (rc) {
pr_err("Updating the hypfs tree failed\n");
-   hypfs_delete_tree(sb->s_root);
+   d_genocide(sb->s_root);
goto out;
}
hypfs_update_update(sb);
@@ -323,12 +296,6 @@ static int hypfs_init_fs_context(struct fs_context *fc)
 
 static void hypfs_kill_super(struct super_block *sb)
 {
-   struct hypfs_sb_info *sb_info = sb->s_fs_info;
-
-   if (sb->s_root)
-   hypfs_delete_tree(sb->s_root);
-   if (sb_info && sb_info->update_file)
-   hypfs_remove(sb_info->update_file);
kill_litter_super(sb);
kfree(sb->s_fs_info);
 }
-- 
2.39.2



[PATCH 04/19] NFS: remove the s_dev field from struct nfs_server

2023-09-13 Thread Christoph Hellwig
Don't duplicate the dev_t in the nfs_server structure given that it can
be trivially retrieved from the super_block.

Signed-off-by: Christoph Hellwig 
---
 fs/nfs/client.c   |  2 +-
 fs/nfs/nfs4proc.c |  8 
 fs/nfs/nfs4trace.h|  6 +++---
 fs/nfs/nfs4xdr.c  |  2 +-
 fs/nfs/super.c| 10 +++---
 include/linux/nfs_fs_sb.h |  1 -
 6 files changed, 12 insertions(+), 17 deletions(-)

diff --git a/fs/nfs/client.c b/fs/nfs/client.c
index 44eca51b28085d..039fd67ac17c82 100644
--- a/fs/nfs/client.c
+++ b/fs/nfs/client.c
@@ -1337,7 +1337,7 @@ static int nfs_volume_list_show(struct seq_file *m, void 
*v)
clp = server->nfs_client;
 
snprintf(dev, sizeof(dev), "%u:%u",
-MAJOR(server->s_dev), MINOR(server->s_dev));
+MAJOR(server->super->s_dev), MINOR(server->super->s_dev));
 
snprintf(fsid, sizeof(fsid), "%llx:%llx",
 (unsigned long long) server->fsid.major,
diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index 794343790ea8bb..4d002cc514983c 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -6763,7 +6763,7 @@ static int _nfs4_proc_getlk(struct nfs4_state *state, int 
cmd, struct file_lock
goto out;
lsp = request->fl_u.nfs4_fl.owner;
arg.lock_owner.id = lsp->ls_seqid.owner_id;
-   arg.lock_owner.s_dev = server->s_dev;
+   arg.lock_owner.s_dev = server->super->s_dev;
status = nfs4_call_sync(server->client, server, , _args, 
_res, 1);
switch (status) {
case 0:
@@ -7088,7 +7088,7 @@ static struct nfs4_lockdata *nfs4_alloc_lockdata(struct 
file_lock *fl,
goto out_free_seqid;
p->arg.lock_owner.clientid = server->nfs_client->cl_clientid;
p->arg.lock_owner.id = lsp->ls_seqid.owner_id;
-   p->arg.lock_owner.s_dev = server->s_dev;
+   p->arg.lock_owner.s_dev = server->super->s_dev;
p->res.lock_seqid = p->arg.lock_seqid;
p->lsp = lsp;
p->server = server;
@@ -7475,7 +7475,7 @@ nfs4_retry_setlk(struct nfs4_state *state, int cmd, 
struct file_lock *request)
.inode = state->inode,
.owner = { .clientid = clp->cl_clientid,
   .id = lsp->ls_seqid.owner_id,
-  .s_dev = server->s_dev },
+  .s_dev = server->super->s_dev },
};
int status;
 
@@ -7689,7 +7689,7 @@ nfs4_release_lockowner(struct nfs_server *server, struct 
nfs4_lock_state *lsp)
data->server = server;
data->args.lock_owner.clientid = server->nfs_client->cl_clientid;
data->args.lock_owner.id = lsp->ls_seqid.owner_id;
-   data->args.lock_owner.s_dev = server->s_dev;
+   data->args.lock_owner.s_dev = server->super->s_dev;
 
msg.rpc_argp = >args;
msg.rpc_resp = >res;
diff --git a/fs/nfs/nfs4trace.h b/fs/nfs/nfs4trace.h
index d27919d7241d38..13a602c675ddb2 100644
--- a/fs/nfs/nfs4trace.h
+++ b/fs/nfs/nfs4trace.h
@@ -945,7 +945,7 @@ TRACE_EVENT(nfs4_delegreturn_exit,
),
 
TP_fast_assign(
-   __entry->dev = res->server->s_dev;
+   __entry->dev = res->server->super->s_dev;
__entry->fhandle = nfs_fhandle_hash(args->fhandle);
__entry->error = error < 0 ? -error : 0;
__entry->stateid_seq =
@@ -1269,7 +1269,7 @@ DECLARE_EVENT_CLASS(nfs4_getattr_event,
),
 
TP_fast_assign(
-   __entry->dev = server->s_dev;
+   __entry->dev = server->super->s_dev;
__entry->valid = fattr->valid;
__entry->fhandle = nfs_fhandle_hash(fhandle);
__entry->fileid = (fattr->valid & 
NFS_ATTR_FATTR_FILEID) ? fattr->fileid : 0;
@@ -1966,7 +1966,7 @@ DECLARE_EVENT_CLASS(nfs4_deviceid_status,
),
 
TP_fast_assign(
-   __entry->dev = server->s_dev;
+   __entry->dev = server->super->s_dev;
__entry->status = status;
__assign_str(dstaddr, server->nfs_client->cl_hostname);
memcpy(__entry->deviceid, deviceid->data,
diff --git a/fs/nfs/nfs4xdr.c b/fs/nfs/nfs4xdr.c
index deec76cf5afeaf..9767c5e2ed1a9a 100644
--- a/fs/nfs/nfs4xdr.c
+++ b/fs/nfs/nfs4xdr.c
@@ -1420,7 +1420,7 @@ static inline void encode_openhdr(struct xdr_stream *xdr, 
const struct nfs_opena
p = xdr_encode_hyper(p, arg->clientid);
*p++ = cpu_to_be32(24);
p = xdr_encode_opaque_fixed(p, "open id:", 8);
-   *p++ = 

[PATCH 01/19] fs: reflow deactivate_locked_super

2023-09-13 Thread Christoph Hellwig
Return early for the case where the super block isn't cleaned up to
reduce level of indentation.

Signed-off-by: Christoph Hellwig 
---
 fs/super.c | 35 ++-
 1 file changed, 18 insertions(+), 17 deletions(-)

diff --git a/fs/super.c b/fs/super.c
index 2d762ce67f6e6c..127a17d958a482 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -476,27 +476,28 @@ static void kill_super_notify(struct super_block *sb)
 void deactivate_locked_super(struct super_block *s)
 {
struct file_system_type *fs = s->s_type;
-   if (atomic_dec_and_test(>s_active)) {
-   unregister_shrinker(>s_shrink);
-   fs->kill_sb(s);
 
-   kill_super_notify(s);
-
-   /*
-* Since list_lru_destroy() may sleep, we cannot call it from
-* put_super(), where we hold the sb_lock. Therefore we destroy
-* the lru lists right now.
-*/
-   list_lru_destroy(>s_dentry_lru);
-   list_lru_destroy(>s_inode_lru);
-
-   put_filesystem(fs);
-   put_super(s);
-   } else {
+   if (!atomic_dec_and_test(>s_active)) {
super_unlock_excl(s);
+   return;
}
-}
 
+   unregister_shrinker(>s_shrink);
+   fs->kill_sb(s);
+
+   kill_super_notify(s);
+
+   /*
+* Since list_lru_destroy() may sleep, we cannot call it from
+* put_super(), where we hold the sb_lock. Therefore we destroy
+* the lru lists right now.
+*/
+   list_lru_destroy(>s_dentry_lru);
+   list_lru_destroy(>s_inode_lru);
+
+   put_filesystem(fs);
+   put_super(s);
+}
 EXPORT_SYMBOL(deactivate_locked_super);
 
 /**
-- 
2.39.2



[PATCH 18/19] fs: simple ->shutdown_sb and ->free_sb conversions

2023-09-13 Thread Christoph Hellwig
Convert all file systems that just called generic_shutdown_super from
->kill_sb without any state kept from before the call to after it to
->shutdown_sb and ->free_sb as needed.

Signed-off-by: Christoph Hellwig 
---
 fs/9p/vfs_super.c | 14 +++---
 fs/afs/super.c| 15 +++
 fs/autofs/autofs_i.h  |  3 ++-
 fs/autofs/init.c  |  3 ++-
 fs/autofs/inode.c | 24 +---
 fs/btrfs/super.c  | 11 ---
 fs/ceph/super.c   | 13 +
 fs/cramfs/inode.c |  6 ++
 fs/ecryptfs/main.c| 11 ++-
 fs/fuse/inode.c   |  7 +++
 fs/fuse/virtio_fs.c   | 19 +++
 fs/hostfs/hostfs_kern.c   |  5 ++---
 fs/nfs/fs_context.c   | 18 ++
 fs/orangefs/orangefs-kernel.h |  2 +-
 fs/orangefs/orangefs-mod.c|  2 +-
 fs/orangefs/super.c   | 11 ---
 fs/proc/root.c| 16 +---
 fs/romfs/super.c  |  6 ++
 fs/smb/client/cifsfs.c| 14 +-
 fs/ubifs/super.c  |  9 +++--
 20 files changed, 103 insertions(+), 106 deletions(-)

diff --git a/fs/9p/vfs_super.c b/fs/9p/vfs_super.c
index e8b3641c98f886..a238065dd8b361 100644
--- a/fs/9p/vfs_super.c
+++ b/fs/9p/vfs_super.c
@@ -205,25 +205,17 @@ static struct dentry *v9fs_mount(struct file_system_type 
*fs_type, int flags,
return ERR_PTR(retval);
 }
 
-/**
- * v9fs_kill_super - Kill Superblock
- * @s: superblock
- *
- */
-
-static void v9fs_kill_super(struct super_block *s)
+static void v9fs_free_sb(struct super_block *s)
 {
struct v9fs_session_info *v9ses = s->s_fs_info;
 
p9_debug(P9_DEBUG_VFS, " %p\n", s);
 
-   generic_shutdown_super(s);
-
v9fs_session_cancel(v9ses);
v9fs_session_close(v9ses);
kfree(v9ses);
s->s_fs_info = NULL;
-   p9_debug(P9_DEBUG_VFS, "exiting kill_super\n");
+   p9_debug(P9_DEBUG_VFS, "exiting free_sb\n");
 }
 
 static void
@@ -340,7 +332,7 @@ static const struct super_operations v9fs_super_ops_dotl = {
 struct file_system_type v9fs_fs_type = {
.name = "9p",
.mount = v9fs_mount,
-   .kill_sb = v9fs_kill_super,
+   .free_sb = v9fs_free_sb,
.owner = THIS_MODULE,
.fs_flags = FS_RENAME_DOES_D_MOVE,
 };
diff --git a/fs/afs/super.c b/fs/afs/super.c
index 84b135ad3496b1..bd85554056415d 100644
--- a/fs/afs/super.c
+++ b/fs/afs/super.c
@@ -30,7 +30,8 @@
 #include "internal.h"
 
 static void afs_i_init_once(void *foo);
-static void afs_kill_super(struct super_block *sb);
+static void afs_shutdown_sb(struct super_block *sb);
+static void afs_free_sb(struct super_block *sb);
 static struct inode *afs_alloc_inode(struct super_block *sb);
 static void afs_destroy_inode(struct inode *inode);
 static void afs_free_inode(struct inode *inode);
@@ -45,7 +46,8 @@ struct file_system_type afs_fs_type = {
.name   = "afs",
.init_fs_context= afs_init_fs_context,
.parameters = afs_fs_parameters,
-   .kill_sb= afs_kill_super,
+   .shutdown_sb= afs_shutdown_sb,
+   .free_sb= afs_free_sb,
.fs_flags   = FS_RENAME_DOES_D_MOVE,
 };
 MODULE_ALIAS_FS("afs");
@@ -527,7 +529,7 @@ static void afs_destroy_sbi(struct afs_super_info *as)
}
 }
 
-static void afs_kill_super(struct super_block *sb)
+static void afs_shutdown_sb(struct super_block *sb)
 {
struct afs_super_info *as = AFS_FS_S(sb);
 
@@ -539,7 +541,12 @@ static void afs_kill_super(struct super_block *sb)
 */
if (as->volume)
rcu_assign_pointer(as->volume->sb, NULL);
-   generic_shutdown_super(sb);
+}
+
+static void afs_free_sb(struct super_block *sb)
+{
+   struct afs_super_info *as = AFS_FS_S(sb);
+
if (as->volume)
afs_deactivate_volume(as->volume);
afs_destroy_sbi(as);
diff --git a/fs/autofs/autofs_i.h b/fs/autofs/autofs_i.h
index d5a44fa88acf9a..f60f425c08b55c 100644
--- a/fs/autofs/autofs_i.h
+++ b/fs/autofs/autofs_i.h
@@ -276,4 +276,5 @@ static inline void autofs_del_expiring(struct dentry 
*dentry)
}
 }
 
-void autofs_kill_sb(struct super_block *);
+void autofs_shutdown_sb(struct super_block *sb);
+void autofs_free_sb(struct super_block *sb);
diff --git a/fs/autofs/init.c b/fs/autofs/init.c
index d3f55e87433890..1f7bed5391f822 100644
--- a/fs/autofs/init.c
+++ b/fs/autofs/init.c
@@ -17,7 +17,8 @@ struct file_system_type autofs_fs_type = {
.owner  = THIS_MODULE,
.name   = "autofs",
.mount  = autofs_mount,
-   .kill_sb= autofs_kill_sb,
+   .shutdown_sb= autofs_shutdown_sb,
+   .free_sb= autofs_free_sb,
 };

split up ->kill_sb

2023-09-13 Thread Christoph Hellwig
Hi Al and Christian,

this series splits ->kill_sb into separate ->shutdown_sb and ->free_sb
methods and then calls generic_shutdown_super from common code to clean
up the file system shutdown interface.

As a first step towards that it moves allocating and freeing the
anonymous block device dev_t into common code. As every super_block must
have a valid s_dev it makes sense to just do that if the file system
didn't set one by itself, and we can also detect if one was assigned
easily when shutting down.

A git tree is available here:

git://git.infradead.org/users/hch/misc.git fs-kill_sb

Gitweb:

http://git.infradead.org/users/hch/misc.git/shortlog/refs/heads/fs-kill_sb

Diffstat:
 Documentation/filesystems/locking.rst |9 -
 Documentation/filesystems/vfs.rst |   15 ++
 arch/powerpc/platforms/cell/spufs/inode.c |   10 -
 arch/s390/hypfs/inode.c   |   43 
 arch/x86/kernel/cpu/resctrl/rdtgroup.c|7 -
 block/bdev.c  |1 
 drivers/android/binderfs.c|   12 --
 drivers/base/devtmpfs.c   |8 -
 drivers/dax/super.c   |1 
 drivers/dma-buf/dma-buf.c |1 
 drivers/gpu/drm/drm_drv.c |1 
 drivers/infiniband/hw/qib/qib.h   |4 
 drivers/infiniband/hw/qib/qib_fs.c|  105 ---
 drivers/infiniband/hw/qib/qib_init.c  |   32 ++
 drivers/misc/cxl/api.c|1 
 drivers/misc/ibmasm/ibmasmfs.c|8 -
 drivers/mtd/mtdsuper.c|   12 --
 drivers/scsi/cxlflash/ocxl_hw.c   |1 
 drivers/usb/gadget/function/f_fs.c|6 -
 drivers/usb/gadget/legacy/inode.c |   18 +--
 drivers/xen/xenfs/super.c |8 -
 fs/9p/vfs_super.c |   16 ---
 fs/adfs/super.c   |2 
 fs/affs/super.c   |7 -
 fs/afs/super.c|   27 ++---
 fs/aio.c  |1 
 fs/anon_inodes.c  |1 
 fs/autofs/autofs_i.h  |3 
 fs/autofs/init.c  |3 
 fs/autofs/inode.c |   24 ++--
 fs/befs/linuxvfs.c|2 
 fs/bfs/inode.c|2 
 fs/binfmt_misc.c  |8 -
 fs/btrfs/super.c  |   16 +--
 fs/btrfs/tests/btrfs-tests.c  |1 
 fs/ceph/super.c   |   20 +--
 fs/coda/inode.c   |1 
 fs/configfs/mount.c   |8 -
 fs/cramfs/inode.c |6 -
 fs/debugfs/inode.c|8 -
 fs/devpts/inode.c |6 -
 fs/ecryptfs/main.c|   14 --
 fs/efivarfs/super.c   |   13 +-
 fs/efs/super.c|7 -
 fs/erofs/super.c  |   25 +---
 fs/exfat/super.c  |6 -
 fs/ext2/super.c   |2 
 fs/ext4/super.c   |   12 +-
 fs/f2fs/super.c   |6 -
 fs/fat/namei_msdos.c  |2 
 fs/fat/namei_vfat.c   |2 
 fs/freevxfs/vxfs_super.c  |2 
 fs/fuse/control.c |   12 +-
 fs/fuse/inode.c   |   19 +--
 fs/fuse/virtio_fs.c   |   21 ++-
 fs/gfs2/ops_fstype.c  |   11 --
 fs/hfs/super.c|2 
 fs/hfsplus/super.c|2 
 fs/hostfs/hostfs_kern.c   |5 
 fs/hpfs/super.c   |2 
 fs/hugetlbfs/inode.c  |2 
 fs/isofs/inode.c  |2 
 fs/jffs2/super.c  |   22 ++--
 fs/jfs/super.c|2 
 fs/kernfs/mount.c |   20 +--
 fs/minix/inode.c  |2 
 fs/nfs/client.c   |2 
 fs/nfs/fs_context.c   |   19 +++
 fs/nfs/internal.h |1 
 fs/nfs/nfs4proc.c |8 -
 fs/nfs/nfs4trace.h|6 -
 fs/nfs/nfs4xdr.c  |2 
 fs/nfs/super.c|   26 
 fs/nfs/sysfs.h|2 
 fs/nfsd/nfsctl.c  |   22 ++--
 fs/nilfs2/super.c |2 
 fs/nsfs.c |1 
 fs/ntfs/super.c   |2 
 fs/ntfs3/super.c  |6 -
 fs/ocfs2/dlmfs/dlmfs.c|2 
 fs/ocfs2/super.c  |2 
 fs/omfs/inode.c   |2 
 

Re: [PATCH v2] arch/cacheflush: Introduce flush_all_caches()

2022-08-22 Thread Christoph Hellwig
On Fri, Aug 19, 2022 at 10:10:24AM -0700, Davidlohr Bueso wrote:
> index b192d917a6d0..ac4d4fd4e508 100644
> --- a/arch/x86/include/asm/cacheflush.h
> +++ b/arch/x86/include/asm/cacheflush.h
> @@ -10,4 +10,8 @@
>  
>  void clflush_cache_range(void *addr, unsigned int size);
>  
> +/* see comments in the stub version */
> +#define flush_all_caches() \
> + do { wbinvd_on_all_cpus(); } while(0)

Yikes.  This is just a horrible, horrible name and placement for a bad
hack that should have no generic relevance.

Please fix up the naming to make it clear that this function is for a
very specific nvdimm use case, and move it to a nvdimm-specific header
file.



Re: [PATCH v2] pmem: fix a name collision

2022-06-30 Thread Christoph Hellwig
Looks good:

Reviewed-by: Christoph Hellwig 



Re: [PATCH] pmem: fix a name collision

2022-06-30 Thread Christoph Hellwig
On Thu, Jun 30, 2022 at 11:51:55AM -0600, Jane Chu wrote:
> -static phys_addr_t to_phys(struct pmem_device *pmem, phys_addr_t offset)
> +static phys_addr_t _to_phys(struct pmem_device *pmem, phys_addr_t offset)

I'd rather call this pmem_to_phys as that is a much nicer name.



Re: [PATCH] xfs: fail dax mount if reflink is enabled on a partition

2022-06-09 Thread Christoph Hellwig
On Thu, Jun 09, 2022 at 10:34:35PM +0800, Shiyang Ruan wrote:
> Failure notification is not supported on partitions.  So, when we mount
> a reflink enabled xfs on a partition with dax option, let it fail with
> -EINVAL code.

Looks good:

Reviewed-by: Christoph Hellwig 



Re: [PATCHSETS] v14 fsdax-rmap + v11 fsdax-reflink

2022-05-10 Thread Christoph Hellwig
The patch numbering due looks odd due to the combination of the
two series.  But otherwise this looks good to me modulo the one
minor nitpick.



Re: [PATCH v11 06/07] xfs: support CoW in fsdax mode

2022-05-09 Thread Christoph Hellwig
> +#ifdef CONFIG_FS_DAX
> +int
> +xfs_dax_fault(
> + struct vm_fault *vmf,
> + enum page_entry_sizepe_size,
> + boolwrite_fault,
> + pfn_t   *pfn)
> +{
> + return dax_iomap_fault(vmf, pe_size, pfn, NULL,
> + (write_fault && !vmf->cow_page) ?
> + _dax_write_iomap_ops :
> + _read_iomap_ops);
> +}
> +#endif

Is there any reason this is in xfs_iomap.c and not xfs_file.c?

Otherwise the patch looks good:


Reviewed-by: Christoph Hellwig 



Re: [PATCH v13 0/7] fsdax: introduce fs query to support reflink

2022-04-20 Thread Christoph Hellwig
On Thu, Apr 21, 2022 at 02:35:02PM +1000, Dave Chinner wrote:
> Sure, I'm not a maintainer and just the stand-in patch shepherd for
> a single release. However, being unable to cleanly merge code we
> need integrated into our local subsystem tree for integration
> testing because a patch dependency with another subsystem won't gain
> a stable commit ID until the next merge window is  distinctly
> suboptimal.

Yes.  Which is why we've taken a lot of mm patchs through other trees,
sometimes specilly crafted for that.  So I guess in this case we'll
just need to take non-trivial dependencies into the XFS tree, and just
deal with small merge conflicts for the trivial ones.



Re: [PATCH v13 7/7] fsdax: set a CoW flag when associate reflink mappings

2022-04-19 Thread Christoph Hellwig
Looks good:

Reviewed-by: Christoph Hellwig 



Re: [PATCH v12 6/7] xfs: Implement ->notify_failure() for XFS

2022-04-13 Thread Christoph Hellwig
On Wed, Apr 13, 2022 at 10:09:40AM -0700, Dan Williams wrote:
> Yes, sounds like we're on the same page. I had mistakenly interpreted
> "Hence these notifications need to be delayed until after the
> filesystem is mounted" as something the producer would need to handle,
> but yes, consumer is free to drop if the notification arrives at an
> inopportune time.

A SB_BORN check might be all that we need.



Re: [RFC PATCH] mm, pmem, xfs: Introduce MF_MEM_REMOVE for unbind

2022-04-11 Thread Christoph Hellwig
On Mon, Apr 11, 2022 at 01:16:23AM +0800, Shiyang Ruan wrote:
> diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
> index bd502957cfdf..72d9e69aea98 100644
> --- a/drivers/nvdimm/pmem.c
> +++ b/drivers/nvdimm/pmem.c
> @@ -359,7 +359,6 @@ static void pmem_release_disk(void *__pmem)
>   struct pmem_device *pmem = __pmem;
>  
>   dax_remove_host(pmem->disk);
> - kill_dax(pmem->dax_dev);
>   put_dax(pmem->dax_dev);
>   del_gendisk(pmem->disk);
>  
> @@ -597,6 +596,8 @@ static void nd_pmem_remove(struct device *dev)
>   pmem->bb_state = NULL;
>   }
>   nvdimm_flush(to_nd_region(dev->parent), NULL);
> +
> + kill_dax(pmem->dax_dev);

I think the put_dax will have to move as well.

This part should probably also be a separate, well-documented
cleanup patch.



Re: [PATCH v12 7/7] fsdax: set a CoW flag when associate reflink mappings

2022-04-11 Thread Christoph Hellwig
> + * Set or Update the page->mapping with FS_DAX_MAPPING_COW flag.
> + * Return true if it is an Update.
> + */
> +static inline bool dax_mapping_set_cow(struct page *page)
> +{
> + if (page->mapping) {
> + /* flag already set */
> + if (dax_mapping_is_cow(page->mapping))
> + return false;
> +
> + /*
> +  * This page has been mapped even before it is shared, just
> +  * need to set this FS_DAX_MAPPING_COW flag.
> +  */
> + dax_mapping_set_cow_flag(>mapping);
> + return true;
> + }
> + /* Newly associate CoW mapping */
> + dax_mapping_set_cow_flag(>mapping);
> + return false;

Given that this is the only place calling dax_mapping_set_cow I wonder
if we should just open code it here, and also lift the page->index logic
from the caller into this helper.

static inline void dax_mapping_set_cow(struct page *page)
{
if ((uintptr_t)page->mapping != PAGE_MAPPING_DAX_COW) {
/*
 * Reset the index if the page was already mapped
 * regularly before.
 */
if (page->mapping)
page->index = 1;
page->mapping = (void *)PAGE_MAPPING_DAX_COW;
}
page->index++;
}

> + if (!dax_mapping_is_cow(page->mapping)) {
> + /* keep the CoW flag if this page is still shared */
> + if (page->index-- > 0)
> + continue;
> + } else
> + WARN_ON_ONCE(page->mapping && page->mapping != mapping);

Isnt the dax_mapping_is_cow check above inverted?



Re: [PATCH v12 6/7] xfs: Implement ->notify_failure() for XFS

2022-04-11 Thread Christoph Hellwig
> --- a/fs/xfs/xfs_super.h
> +++ b/fs/xfs/xfs_super.h
> @@ -93,6 +93,7 @@ extern xfs_agnumber_t xfs_set_inode_alloc(struct xfs_mount 
> *,
>  extern const struct export_operations xfs_export_operations;
>  extern const struct xattr_handler *xfs_xattr_handlers[];
>  extern const struct quotactl_ops xfs_quotactl_operations;


> +extern const struct dax_holder_operations xfs_dax_holder_operations;

This needs to be defined to NULL if at least one of CONFIG_FS_DAX or
CONFIG_MEMORY_FAILURE is not set.

Otherwise looks good:

Reviewed-by: Christoph Hellwig 



Re: [PATCH v12 2/7] mm: factor helpers for memory_failure_dev_pagemap

2022-04-11 Thread Christoph Hellwig
> + unmap_and_kill(_kill, pfn, page->mapping, page->index, flags);
> +unlock:
> + dax_unlock_page(page, cookie);
> + return 0;

As the buildbot points out this should probably be a "return rc".



Re: [PATCH v12 1/7] dax: Introduce holder for dax_device

2022-04-11 Thread Christoph Hellwig
Looks good:

Reviewed-by: Christoph Hellwig 



Re: [PATCH v11 7/8] xfs: Implement ->notify_failure() for XFS

2022-03-30 Thread Christoph Hellwig
On Wed, Mar 30, 2022 at 11:16:10PM +0800, Shiyang Ruan wrote:
> > > +#if IS_ENABLED(CONFIG_MEMORY_FAILURE) && IS_ENABLED(CONFIG_FS_DAX)
> > 
> > No real need for the IS_ENABLED.  Also any reason to even build this
> > file if the options are not set?  It seems like
> > xfs_dax_holder_operations should just be defined to NULL and the
> > whole file not supported if we can't support the functionality.
> 
> Got it.  These two CONFIG seem not related for now.  So, I think I should
> wrap these code with #ifdef CONFIG_MEMORY_FAILURE here, and add
> `xfs-$(CONFIG_FS_DAX) += xfs_notify_failure.o` in the makefile.

I'd do

ifeq ($(CONFIG_MEMORY_FAILURE),y)
xfs-$(CONFIG_FS_DAX) += xfs_notify_failure.o
endif

in the makefile and keep it out of the actual source file entirely.

> > > +
> > > + /* Ignore the range out of filesystem area */
> > > + if ((offset + len) < ddev_start)
> > 
> > No need for the inner braces.
> > 
> > > + if ((offset + len) > ddev_end)
> > 
> > No need for the braces either.
> 
> Really no need?  It is to make sure the range to be handled won't out of the
> filesystem area.  And make sure the @offset and @len are valid and correct
> after subtract the bbdev_start.

Yes, but there is no need for the braces per the precedence rules in
C.  So these could be:

if (offset + len < ddev_start)

and

if (offset + len > ddev_end)



Re: [PATCH v11 1/8] dax: Introduce holder for dax_device

2022-03-30 Thread Christoph Hellwig
On Wed, Mar 30, 2022 at 06:58:21PM +0800, Shiyang Ruan wrote:
> As the code I pasted before, pmem driver will subtract its ->data_offset,
> which is byte-based. And the filesystem who implements ->notify_failure()
> will calculate the offset in unit of byte again.
> 
> So, leave its function signature byte-based, to avoid repeated conversions.

I'm actually fine either way, so I'll wait for Dan to comment.



Re: [PATCH v11 1/8] dax: Introduce holder for dax_device

2022-03-30 Thread Christoph Hellwig
On Wed, Mar 30, 2022 at 06:03:01PM +0800, Shiyang Ruan wrote:
> 
> Because I am not sure if the offset between each layer is page aligned.  For
> example, when pmem dirver handles ->memory_failure(), it should subtract its
> ->data_offset when it calls dax_holder_notify_failure().

If they aren't, none of the DAX machinery would work.



Re: [PATCH v6 3/6] mm: rmap: introduce pfn_mkclean_range() to cleans PTEs

2022-03-30 Thread Christoph Hellwig
On Wed, Mar 30, 2022 at 03:31:37PM +0800, Muchun Song wrote:
> I saw Shiyang is ready to rebase onto this patch.  So should I
> move it to linux/mm.h or let Shiyang does?

Good question.  I think Andrew has this series in -mm and ready to go
to Linus, so maybe it is best if we don't change too much.

Andrew, can you just fold in the trivial comment fix?



Re: [PATCH v11 7/8] xfs: Implement ->notify_failure() for XFS

2022-03-30 Thread Christoph Hellwig
> @@ -1892,6 +1893,8 @@ xfs_free_buftarg(
>   list_lru_destroy(>bt_lru);
>  
>   blkdev_issue_flush(btp->bt_bdev);
> + if (btp->bt_daxdev)
> + dax_unregister_holder(btp->bt_daxdev, btp->bt_mount);
>   fs_put_dax(btp->bt_daxdev);
>  
>   kmem_free(btp);
> @@ -1939,6 +1942,7 @@ xfs_alloc_buftarg(
>   struct block_device *bdev)
>  {
>   xfs_buftarg_t   *btp;
> + int error;
>  
>   btp = kmem_zalloc(sizeof(*btp), KM_NOFS);
>  
> @@ -1946,6 +1950,14 @@ xfs_alloc_buftarg(
>   btp->bt_dev =  bdev->bd_dev;
>   btp->bt_bdev = bdev;
>   btp->bt_daxdev = fs_dax_get_by_bdev(bdev, >bt_dax_part_off);
> + if (btp->bt_daxdev) {
> + error = dax_register_holder(btp->bt_daxdev, mp,
> + _dax_holder_operations);
> + if (error) {
> + xfs_err(mp, "DAX device already in use?!");
> + goto error_free;
> + }
> + }

It seems to me that just passing the holder and holder ops to
fs_dax_get_by_bdev and the holder to dax_unregister_holder would
significantly simply the interface here.

Dan, what do you think?

> +#if IS_ENABLED(CONFIG_MEMORY_FAILURE) && IS_ENABLED(CONFIG_FS_DAX)

No real need for the IS_ENABLED.  Also any reason to even build this
file if the options are not set?  It seems like
xfs_dax_holder_operations should just be defined to NULL and the
whole file not supported if we can't support the functionality.

Dan: not for this series, but is there any reason not to require
MEMORY_FAILURE for DAX to start with?

> +
> + ddev_start = mp->m_ddev_targp->bt_dax_part_off;
> + ddev_end = ddev_start +
> + (mp->m_ddev_targp->bt_bdev->bd_nr_sectors << SECTOR_SHIFT) - 1;

This should use bdev_nr_bytes.

But didn't we say we don't want to support notifications on partitioned
devices and thus don't actually need all this?

> +
> + /* Ignore the range out of filesystem area */
> + if ((offset + len) < ddev_start)

No need for the inner braces.

> + if ((offset + len) > ddev_end)

No need for the braces either.

> diff --git a/fs/xfs/xfs_notify_failure.h b/fs/xfs/xfs_notify_failure.h
> new file mode 100644
> index ..76187b9620f9
> --- /dev/null
> +++ b/fs/xfs/xfs_notify_failure.h
> @@ -0,0 +1,10 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Copyright (c) 2022 Fujitsu.  All Rights Reserved.
> + */
> +#ifndef __XFS_NOTIFY_FAILURE_H__
> +#define __XFS_NOTIFY_FAILURE_H__
> +
> +extern const struct dax_holder_operations xfs_dax_holder_operations;
> +
> +#endif  /* __XFS_NOTIFY_FAILURE_H__ */

Dowe really need a new header for this vs just sequeezing it into
xfs_super.h or something like that?

> diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
> index e8f37bdc8354..b8de6ed2c888 100644
> --- a/fs/xfs/xfs_super.c
> +++ b/fs/xfs/xfs_super.c
> @@ -353,6 +353,12 @@ xfs_setup_dax_always(
>   return -EINVAL;
>   }
>  
> + if (xfs_has_reflink(mp) && !xfs_has_rmapbt(mp)) {
> + xfs_alert(mp,
> + "need rmapbt when both DAX and reflink enabled.");
> + return -EINVAL;
> + }

Right now we can't even enable reflink with DAX yet, so adding this
here seems premature - it should go into the patch allowing DAX+reflink.




Re: [PATCH v11 6/8] mm: Introduce mf_dax_kill_procs() for fsdax case

2022-03-29 Thread Christoph Hellwig
On Sun, Feb 27, 2022 at 08:07:45PM +0800, Shiyang Ruan wrote:
> This function is called at the end of RMAP routine, i.e. filesystem
> recovery function, to collect and kill processes using a shared page of
> DAX file.

I think just throwing RMAP inhere is rather confusing.

> The difference with mf_generic_kill_procs() is, it accepts
> file's (mapping,offset) instead of struct page because different files'
> mappings and offsets may share the same page in fsdax mode.
> It will be called when filesystem's RMAP results are found.

So maybe I'd word the whole log as something like:

This new function is a variant of mf_generic_kill_procs that accepts
a file, offset pair instead o a struct to support multiple files sharing
a DAX mapping.  It is intended to be called by the file systems as
part of the memory_failure handler after the file system performed
a reverse mapping from the storage address to the file and file offset.

Otherwise looks good:

Reviewed-by: Christoph Hellwig 

> index 9b1d56c5c224..0420189e4788 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -3195,6 +3195,10 @@ enum mf_flags {
>   MF_SOFT_OFFLINE = 1 << 3,
>   MF_UNPOISON = 1 << 4,
>  };
> +#if IS_ENABLED(CONFIG_FS_DAX)
> +int mf_dax_kill_procs(struct address_space *mapping, pgoff_t index,
> +   unsigned long count, int mf_flags);
> +#endif /* CONFIG_FS_DAX */

No need for the ifdef here, having the stable declaration around is
just fine.

> +#if IS_ENABLED(CONFIG_FS_DAX)

No need for the IS_ENABLED as CONFIG_FS_DAX can't be modular.
A good old #ifdef will do it.

Otherwise looks good:

Reviewed-by: Christoph Hellwig 



Re: [PATCH v6 3/6] mm: rmap: introduce pfn_mkclean_range() to cleans PTEs

2022-03-29 Thread Christoph Hellwig
On Tue, Mar 29, 2022 at 09:48:50PM +0800, Muchun Song wrote:
> + * * Return the start of user virtual address at the specific offset within

Double "*" here.

Also Shiyang has been wanting a quite similar vma_pgoff_address for use
in dax.c.  Maybe we'll need to look into moving this to linux/mm.h.

>  static inline unsigned long
> -vma_address(struct page *page, struct vm_area_struct *vma)
> +vma_pgoff_address(pgoff_t pgoff, unsigned long nr_pages,
> +   struct vm_area_struct *vma)
>  {
> - pgoff_t pgoff;
>   unsigned long address;
>  
> - VM_BUG_ON_PAGE(PageKsm(page), page);/* KSM page->index unusable */
> - pgoff = page_to_pgoff(page);
>   if (pgoff >= vma->vm_pgoff) {
>   address = vma->vm_start +
>   ((pgoff - vma->vm_pgoff) << PAGE_SHIFT);
>   /* Check for address beyond vma (or wrapped through 0?) */
>   if (address < vma->vm_start || address >= vma->vm_end)
>   address = -EFAULT;
> - } else if (PageHead(page) &&
> -pgoff + compound_nr(page) - 1 >= vma->vm_pgoff) {
> + } else if (pgoff + nr_pages - 1 >= vma->vm_pgoff) {
>   /* Test above avoids possibility of wrap to 0 on 32-bit */
>   address = vma->vm_start;
>   } else {
> @@ -545,6 +541,18 @@ vma_address(struct page *page, struct vm_area_struct 
> *vma)
>  }
>  
>  /*
> + * Return the start of user virtual address of a page within a vma.
> + * Returns -EFAULT if all of the page is outside the range of vma.
> + * If page is a compound head, the entire compound page is considered.
> + */
> +static inline unsigned long
> +vma_address(struct page *page, struct vm_area_struct *vma)
> +{
> + VM_BUG_ON_PAGE(PageKsm(page), page);/* KSM page->index unusable */
> + return vma_pgoff_address(page_to_pgoff(page), compound_nr(page), vma);
> +}
> +
> +/*
>   * Then at what user virtual address will none of the range be found in vma?
>   * Assumes that vma_address() already returned a good starting address.
>   */
> diff --git a/mm/rmap.c b/mm/rmap.c
> index 723682ddb9e8..ad5cf0e45a73 100644
> --- a/mm/rmap.c
> +++ b/mm/rmap.c
> @@ -929,12 +929,12 @@ int folio_referenced(struct folio *folio, int is_locked,
>   return pra.referenced;
>  }
>  
> -static bool page_mkclean_one(struct folio *folio, struct vm_area_struct *vma,
> - unsigned long address, void *arg)
> +static int page_vma_mkclean_one(struct page_vma_mapped_walk *pvmw)
>  {
> - DEFINE_FOLIO_VMA_WALK(pvmw, folio, vma, address, PVMW_SYNC);
> + int cleaned = 0;
> + struct vm_area_struct *vma = pvmw->vma;
>   struct mmu_notifier_range range;
> - int *cleaned = arg;
> + unsigned long address = pvmw->address;
>  
>   /*
>* We have to assume the worse case ie pmd for invalidation. Note that
> @@ -942,16 +942,16 @@ static bool page_mkclean_one(struct folio *folio, 
> struct vm_area_struct *vma,
>*/
>   mmu_notifier_range_init(, MMU_NOTIFY_PROTECTION_PAGE,
>   0, vma, vma->vm_mm, address,
> - vma_address_end());
> + vma_address_end(pvmw));
>   mmu_notifier_invalidate_range_start();
>  
> - while (page_vma_mapped_walk()) {
> + while (page_vma_mapped_walk(pvmw)) {
>   int ret = 0;
>  
> - address = pvmw.address;
> - if (pvmw.pte) {
> + address = pvmw->address;
> + if (pvmw->pte) {
>   pte_t entry;
> - pte_t *pte = pvmw.pte;
> + pte_t *pte = pvmw->pte;
>  
>   if (!pte_dirty(*pte) && !pte_write(*pte))
>   continue;
> @@ -964,7 +964,7 @@ static bool page_mkclean_one(struct folio *folio, struct 
> vm_area_struct *vma,
>   ret = 1;
>   } else {
>  #ifdef CONFIG_TRANSPARENT_HUGEPAGE
> - pmd_t *pmd = pvmw.pmd;
> + pmd_t *pmd = pvmw->pmd;
>   pmd_t entry;
>  
>   if (!pmd_dirty(*pmd) && !pmd_write(*pmd))
> @@ -991,11 +991,22 @@ static bool page_mkclean_one(struct folio *folio, 
> struct vm_area_struct *vma,
>* See Documentation/vm/mmu_notifier.rst
>*/
>   if (ret)
> - (*cleaned)++;
> + cleaned++;
>   }
>  
>   mmu_notifier_invalidate_range_end();
>  
> + return cleaned;
> +}
> +
> +static bool page_mkclean_one(struct folio *folio, struct vm_area_struct *vma,
> +  unsigned long address, void *arg)
> +{
> + DEFINE_FOLIO_VMA_WALK(pvmw, folio, vma, address, PVMW_SYNC);
> + int *cleaned = arg;
> +
> + *cleaned += page_vma_mkclean_one();
> +
>   return true;
>  }
>  
> @@ -1033,6 +1044,38 @@ int folio_mkclean(struct folio *folio)
>  EXPORT_SYMBOL_GPL(folio_mkclean);
>  
>  /**
> 

Re: [PATCH v11 5/8] mm: move pgoff_address() to vma_pgoff_address()

2022-03-29 Thread Christoph Hellwig
On Sun, Feb 27, 2022 at 08:07:44PM +0800, Shiyang Ruan wrote:
> Since it is not a DAX-specific function, move it into mm and rename it
> to be a generic helper.

FYI, there is a patch in -mm and linux-next:

  "mm: rmap: introduce pfn_mkclean_range() to cleans PTEs"

that adds a vma_pgoff_address which seems like a bit of a superset of
the one added in this patch, but only is in mm/internal.h.



Re: [PATCH v11 1/8] dax: Introduce holder for dax_device

2022-03-29 Thread Christoph Hellwig
On Wed, Mar 16, 2022 at 09:46:07PM +0800, Shiyang Ruan wrote:
> > Forgive me if this has been discussed before, but since dax_operations
> > are in terms of pgoff and nr pages and memory_failure() is in terms of
> > pfns what was the rationale for making the function signature byte
> > based?
> 
> Maybe I didn't describe it clearly...  The @offset and @len here are
> byte-based.  And so is ->memory_failure().

Yes, but is there a good reason for that when the rest of the DAX code
tends to work in page chunks?



Re: [PATCH v11 1/8] dax: Introduce holder for dax_device

2022-03-29 Thread Christoph Hellwig
On Fri, Mar 11, 2022 at 03:35:13PM -0800, Dan Williams wrote:
> > +   if (!dax_dev->holder_ops) {
> > +   rc = -EOPNOTSUPP;
> 
> I think it is ok to return success (0) for this case. All the caller
> of dax_holder_notify_failure() wants to know is if the notification
> was successfully delivered to the holder. If there is no holder
> present then there is nothing to report. This is minor enough for me
> to fix up locally if nothing else needs to be changed.

The caller needs to know there are no holder ops to fall back to
different path.

> Isn't this another failure scenario? If kill_dax() is called while a
> holder is still holding the dax_device that seems to be another
> ->notify_failure scenario to tell the holder that the device is going
> away and the holder has not released the device yet.

Yes.



Re: [PATCH v5 5/6] dax: fix missing writeprotect the pte entry

2022-03-22 Thread Christoph Hellwig
> +static void dax_entry_mkclean(struct address_space *mapping, unsigned long 
> pfn,
> +   unsigned long npfn, pgoff_t start)
>  {
>   struct vm_area_struct *vma;
> + pgoff_t end = start + npfn - 1;
>  
>   i_mmap_lock_read(mapping);
> + vma_interval_tree_foreach(vma, >i_mmap, start, end) {
> + pfn_mkclean_range(pfn, npfn, start, vma);
>   cond_resched();
>   }
>   i_mmap_unlock_read(mapping);


Is there any point in even keeping this helper vs just open coding it
in the only caller below?

Otherwise looks good:

Reviewed-by: Christoph Hellwig 



Re: [PATCH v5 6/6] mm: simplify follow_invalidate_pte()

2022-03-22 Thread Christoph Hellwig
Looks good:

Reviewed-by: Christoph Hellwig 



Re: [PATCH v5 2/6] dax: fix cache flush on PMD-mapped pages

2022-03-22 Thread Christoph Hellwig
On Fri, Mar 18, 2022 at 03:45:25PM +0800, Muchun Song wrote:
> The flush_cache_page() only remove a PAGE_SIZE sized range from the cache.
> However, it does not cover the full pages in a THP except a head page.
> Replace it with flush_cache_range() to fix this issue.  This is just a
> documentation issue with the respect to properly documenting the expected
> usage of cache flushing before modifying the pmd.  However, in practice
> this is not a problem due to the fact that DAX is not available on
> architectures with virtually indexed caches per:
> 
>   commit d92576f1167c ("dax: does not work correctly with virtual aliasing 
> caches")
> 
> Fixes: f729c8c9b24f ("dax: wrprotect pmd_t in dax_mapping_entry_mkclean")
> Signed-off-by: Muchun Song 
> Reviewed-by: Dan Williams 

Looks good:

Reviewed-by: Christoph Hellwig 



Re: [PATCH v5 1/6] mm: rmap: fix cache flush on THP pages

2022-03-22 Thread Christoph Hellwig
On Fri, Mar 18, 2022 at 03:45:24PM +0800, Muchun Song wrote:
> The flush_cache_page() only remove a PAGE_SIZE sized range from the cache.
> However, it does not cover the full pages in a THP except a head page.
> Replace it with flush_cache_range() to fix this issue. At least, no
> problems were found due to this. Maybe because the architectures that
> have virtual indexed caches is less.
> 
> Fixes: f27176cfc363 ("mm: convert page_mkclean_one() to use 
> page_vma_mapped_walk()")
> Signed-off-by: Muchun Song 
> Reviewed-by: Yang Shi 
> Reviewed-by: Dan Williams 

Looks good:

Reviewed-by: Christoph Hellwig 



Re: [PATCH] dax: make sure inodes are flushed before destroy cache

2022-02-14 Thread Christoph Hellwig
Looks good,

Reviewed-by: Christoph Hellwig 



Re: [PATCH v10 4/9] fsdax: fix function description

2022-02-02 Thread Christoph Hellwig
Dan, can you send this to Linus for 5.17 to get it out of the queue?



Re: [PATCH v10 1/9] dax: Introduce holder for dax_device

2022-02-02 Thread Christoph Hellwig
On Thu, Jan 27, 2022 at 08:40:50PM +0800, Shiyang Ruan wrote:
> +void dax_register_holder(struct dax_device *dax_dev, void *holder,
> + const struct dax_holder_operations *ops)
> +{
> + if (!dax_alive(dax_dev))
> + return;
> +
> + dax_dev->holder_data = holder;
> + dax_dev->holder_ops = ops;

This needs to return an error if there is another holder already.  And
some kind of locking to prevent concurrent registrations.

Also please add kerneldoc comments for the new exported functions.

> +void *dax_get_holder(struct dax_device *dax_dev)
> +{
> + if (!dax_alive(dax_dev))
> + return NULL;
> +
> + return dax_dev->holder_data;
> +}
> +EXPORT_SYMBOL_GPL(dax_get_holder);

get tends to imply getting a reference.  Maybe just dax_holder()?
That being said I can't see where we'd even want to use the holder
outside of this file.



Re: [PATCH 4/5] dax: fix missing writeprotect the pte entry

2022-01-23 Thread Christoph Hellwig
On Fri, Jan 21, 2022 at 03:55:14PM +0800, Muchun Song wrote:
> Reuse some infrastructure of page_mkclean_one() to let DAX can handle
> similar case to fix this issue.

Can you split out some of the infrastructure changes into proper
well-documented preparation patches?

> + pgoff_t pgoff_end = pgoff_start + npfn - 1;
>  
>   i_mmap_lock_read(mapping);
> - vma_interval_tree_foreach(vma, >i_mmap, index, index) {
> - struct mmu_notifier_range range;
> - unsigned long address;
> -
> + vma_interval_tree_foreach(vma, >i_mmap, pgoff_start, 
> pgoff_end) {

Please avoid the overly long lines here.  Just using start and end
might be an easy option.




Re: [PATCH 3/5] mm: page_vma_mapped: support checking if a pfn is mapped into a vma

2022-01-23 Thread Christoph Hellwig
On Fri, Jan 21, 2022 at 03:55:13PM +0800, Muchun Song wrote:
> + if (pvmw->pte && ((pvmw->flags & PVMW_PFN_WALK) || 
> !PageHuge(pvmw->page)))

Please avoid the overly long line here and in a few other places.

> +/*
> + * Then at what user virtual address will none of the page be found in vma?

Doesn't parse, what is this trying to say?



Re: [PATCH 2/5] dax: fix cache flush on PMD-mapped pages

2022-01-23 Thread Christoph Hellwig
On Fri, Jan 21, 2022 at 03:55:12PM +0800, Muchun Song wrote:
> The flush_cache_page() only remove a PAGE_SIZE sized range from the cache.
> However, it does not cover the full pages in a THP except a head page.
> Replace it with flush_cache_range() to fix this issue.
> 
> Fixes: f729c8c9b24f ("dax: wrprotect pmd_t in dax_mapping_entry_mkclean")
> Signed-off-by: Muchun Song 
> ---
>  fs/dax.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/fs/dax.c b/fs/dax.c
> index 88be1c02a151..2955ec65eb65 100644
> --- a/fs/dax.c
> +++ b/fs/dax.c
> @@ -857,7 +857,7 @@ static void dax_entry_mkclean(struct address_space 
> *mapping, pgoff_t index,
>   if (!pmd_dirty(*pmdp) && !pmd_write(*pmdp))
>   goto unlock_pmd;
>  
> - flush_cache_page(vma, address, pfn);
> + flush_cache_range(vma, address, address + 
> HPAGE_PMD_SIZE);

Same comment as for the previous one.




Re: [PATCH 1/5] mm: rmap: fix cache flush on THP pages

2022-01-23 Thread Christoph Hellwig
On Fri, Jan 21, 2022 at 03:55:11PM +0800, Muchun Song wrote:
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/mm/rmap.c b/mm/rmap.c
> index b0fd9dc19eba..65670cb805d6 100644
> --- a/mm/rmap.c
> +++ b/mm/rmap.c
> @@ -974,7 +974,7 @@ static bool page_mkclean_one(struct page *page, struct 
> vm_area_struct *vma,
>   if (!pmd_dirty(*pmd) && !pmd_write(*pmd))
>   continue;
>  
> - flush_cache_page(vma, address, page_to_pfn(page));
> + flush_cache_range(vma, address, address + 
> HPAGE_PMD_SIZE);

Do we need a flush_cache_folio here given that we must be dealing with
what effectively is a folio here?

Also please avoid the overly long line.




Re: [PATCH v9 02/10] dax: Introduce holder for dax_device

2022-01-20 Thread Christoph Hellwig
On Thu, Jan 20, 2022 at 06:22:00PM -0800, Darrick J. Wong wrote:
> Hm, so that means XFS can only support dax+pmem when there aren't
> partitions in use?  Ew.

Yes.  Or any sensible DAX usage going forward for that matter.

> 
> > >   (2) extent the holder mechanism to cover a rangeo
> 
> I don't think I was around for the part where "hch balked at a notifier
> call chain" -- what were the objections there, specifically?  I would
> hope that pmem problems would be infrequent enough that the locking
> contention (or rcu expiration) wouldn't be an issue...?

notifiers are a nightmare untype API leading to tons of boilerplate
code.  Open coding the notification is almost always a better idea.



Re: [PATCH v9 10/10] fsdax: set a CoW flag when associate reflink mappings

2022-01-20 Thread Christoph Hellwig
On Fri, Jan 21, 2022 at 10:33:58AM +0800, Shiyang Ruan wrote:
> > 
> > But different question, how does this not conflict with:
> > 
> > #define PAGE_MAPPING_ANON   0x1
> > 
> > in page-flags.h?
> 
> Now we are treating dax pages, so I think its flags should be different from
> normal page.  In another word, PAGE_MAPPING_ANON is a flag of rmap mechanism
> for normal page, it doesn't work for dax page.  And now, we have dax rmap
> for dax page.  So, I think this two kinds of flags are supposed to be used
> in different mechanisms and won't conflect.

It just needs someone to use folio_test_anon in a place where a DAX
folio can be passed.  This probably should not happen, but we need to
clearly document that.

> > Either way I think this flag should move to page-flags.h and be
> > integrated with the PAGE_MAPPING_FLAGS infrastucture.
> 
> And that's why I keep them in this dax.c file.

But that does not integrate it with the infrastructure.  For people
to debug things it needs to be next to PAGE_MAPPING_ANON and have
documentation explaining why they are exclusive.



Re: [PATCH v9 10/10] fsdax: set a CoW flag when associate reflink mappings

2022-01-20 Thread Christoph Hellwig
On Sun, Dec 26, 2021 at 10:34:39PM +0800, Shiyang Ruan wrote:
> +#define FS_DAX_MAPPING_COW   1UL
> +
> +#define MAPPING_SET_COW(m)   (m = (struct address_space *)FS_DAX_MAPPING_COW)
> +#define MAPPING_TEST_COW(m)  (((unsigned long)m & FS_DAX_MAPPING_COW) == \
> + FS_DAX_MAPPING_COW)

These really should be inline functions and probably use lower case
names.

But different question, how does this not conflict with:

#define PAGE_MAPPING_ANON   0x1

in page-flags.h?

Either way I think this flag should move to page-flags.h and be
integrated with the PAGE_MAPPING_FLAGS infrastucture.



Re: [PATCH v9 08/10] mm: Introduce mf_dax_kill_procs() for fsdax case

2022-01-20 Thread Christoph Hellwig
Please only build the new DAX code if CONFIG_FS_DAX is set.



Re: [PATCH v9 07/10] mm: move pgoff_address() to vma_pgoff_address()

2022-01-20 Thread Christoph Hellwig
On Sun, Dec 26, 2021 at 10:34:36PM +0800, Shiyang Ruan wrote:
> Since it is not a DAX-specific function, move it into mm and rename it
> to be a generic helper.
> 
> Signed-off-by: Shiyang Ruan 

Looks good,

Reviewed-by: Christoph Hellwig 



Re: [PATCH v9 05/10] fsdax: fix function description

2022-01-20 Thread Christoph Hellwig
On Sun, Dec 26, 2021 at 10:34:34PM +0800, Shiyang Ruan wrote:
> The function name has been changed, so the description should be updated
> too.
> 
> Signed-off-by: Shiyang Ruan 

Looks good,

Reviewed-by: Christoph Hellwig 

Dan, can you send this to Linux for 5.17 so that we can get it out of
the way?



Re: [PATCH v9 02/10] dax: Introduce holder for dax_device

2022-01-20 Thread Christoph Hellwig
On Wed, Jan 05, 2022 at 04:12:04PM -0800, Dan Williams wrote:
> We ended up with explicit callbacks after hch balked at a notifier
> call-chain, but I think we're back to that now. The partition mistake
> might be unfixable, but at least bdev_dax_pgoff() is dead. Notifier
> call chains have their own locking so, Ruan, this still does not need
> to touch dax_read_lock().

I think we have a few options here:

 (1) don't allow error notifications on partitions.  And error return from
 the holder registration with proper error handling in the file
 system would give us that
 (2) extent the holder mechanism to cover a range
 (3) bite the bullet and create a new stacked dax_device for each
 partition

I think (1) is the best option for now.  If people really do need
partitions we'll have to go for (3)



Re: [PATCH v8 7/9] dax: add dax holder helper for filesystems

2021-12-15 Thread Christoph Hellwig
On Wed, Dec 15, 2021 at 10:21:00AM +0800, Shiyang Ruan wrote:
> 
> 
> 在 2021/12/14 23:47, Christoph Hellwig 写道:
> > On Thu, Dec 02, 2021 at 04:48:54PM +0800, Shiyang Ruan wrote:
> > > Add these helper functions, and export them for filesystem use.
> > 
> > What is the point of adding these wrappers vs just calling the
> > underlying functions?
> 
> I added them so that they can be called in a friendly way, even if
> CONFIG_DAX is off.  Otherwise, we need #if IS_ENABLED(CONFIG_DAX) to wrap
> them where they are called.

No need for wrappers, you can stub out the underlying functions as well.



Re: [PATCH v8 1/9] dax: Use percpu rwsem for dax_{read,write}_lock()

2021-12-15 Thread Christoph Hellwig
On Wed, Dec 15, 2021 at 10:06:29AM +0800, Shiyang Ruan wrote:
> 
> 
> 在 2021/12/14 23:40, Christoph Hellwig 写道:
> > On Thu, Dec 02, 2021 at 04:48:48PM +0800, Shiyang Ruan wrote:
> > > In order to introduce dax holder registration, we need a write lock for
> > > dax.  Change the current lock to percpu_rw_semaphore and introduce a
> > > write lock for registration.
> > 
> > Why do we need to change the existing, global locking for that?
> 
> I think we have talked about this in the previous v7 patchset:
> 
> 
> https://lore.kernel.org/nvdimm/20210924130959.2695749-1-ruansy.f...@fujitsu.com/T/#m4031bc3dc49dcbaac6f8d99877f910fa9a6f998a

Any kind of rationale needs to go into the patch description.

> I didn't test in benchmarks for now.  Could you show me which one I should
> test this code on?  I am not familiar with this...

Just normal read/write I/O on a DAX device.



Re: [PATCH v8 8/9] xfs: Implement ->notify_failure() for XFS

2021-12-14 Thread Christoph Hellwig
> + // TODO check and try to fix metadata

Please avoid //-style comments.

> +static u64
> +xfs_dax_ddev_offset(
> + struct xfs_mount*mp,
> + struct dax_device   *dax_dev,
> + u64 disk_offset)
> +{
> + xfs_buftarg_t *targp;
> +
> + if (mp->m_ddev_targp->bt_daxdev == dax_dev)
> + targp = mp->m_ddev_targp;
> + else if (mp->m_logdev_targp->bt_daxdev == dax_dev)
> + targp = mp->m_logdev_targp;
> + else
> + targp = mp->m_rtdev_targp;
> +
> + return disk_offset - targp->bt_dax_part_off;

This is only called for the data device.  So I think we can simplify
this and open code the logic in xfs_dax_notify_ddev_failure.


> +void
> +xfs_notify_failure_register(
> + struct xfs_mount*mp,
> + struct dax_device   *dax_dev)
> +{
> + if (dax_dev && !fs_dax_get_holder(dax_dev))
> + fs_dax_register_holder(dax_dev, mp, _dax_holder_operations);
> +}
> +
> +void
> +xfs_notify_failure_unregister(
> + struct dax_device   *dax_dev)
> +{
> + if (dax_dev)
> + fs_dax_unregister_holder(dax_dev);
> +}

Why do we need these wrappers?  Also instead of the fs_dax_get_holder
here, fs_dax_register_holder needs to return an error if there already
is a holder.



Re: [PATCH v8 7/9] dax: add dax holder helper for filesystems

2021-12-14 Thread Christoph Hellwig
On Thu, Dec 02, 2021 at 04:48:54PM +0800, Shiyang Ruan wrote:
> Add these helper functions, and export them for filesystem use.

What is the point of adding these wrappers vs just calling the
underlying functions?



Re: [PATCH v8 6/9] mm: Introduce mf_dax_kill_procs() for fsdax case

2021-12-14 Thread Christoph Hellwig
On Thu, Dec 02, 2021 at 04:48:53PM +0800, Shiyang Ruan wrote:
> @@ -254,6 +254,15 @@ static inline bool dax_mapping(struct address_space 
> *mapping)
>  {
>   return mapping->host && IS_DAX(mapping->host);
>  }
> +static inline unsigned long pgoff_address(pgoff_t pgoff,
> + struct vm_area_struct *vma)

Empty lines between functions please.

Also this name is a bit generic for something in dax.h, but then again
it does not seem to be DAX-specific, so it might want to move into
a generic MM header with a proper name and kerneldoc comment.

I think a calling conventions that puts the vma before the pgoff would
seem a little more logical as well.

Last but not least such a move should be in a separate patch.

> +extern int mf_dax_kill_procs(struct address_space *mapping, pgoff_t index,
> +  unsigned long count, int mf_flags);

No need for the extern here.

> -static unsigned long dev_pagemap_mapping_shift(struct page *page,
> +static unsigned long dev_pagemap_mapping_shift(unsigned long address,
>   struct vm_area_struct *vma)

Passing the vma first would seem more logical again.

> + if (is_zone_device_page(p)) {
> + /*
> +  * Since page->mapping is no more used for fsdax, we should
> +  * calculate the address in a fsdax way.
> +  */

/*
 * Since page->mapping is not used for fsdax, we need
 * calculate the address based on the vma.
 */

> +static void collect_procs_fsdax(struct page *page, struct address_space 
> *mapping,
> + pgoff_t pgoff, struct list_head *to_kill)

Overly long line here.



Re: [PATCH v8 5/9] fsdax: Introduce dax_lock_mapping_entry()

2021-12-14 Thread Christoph Hellwig
On Thu, Dec 02, 2021 at 04:48:52PM +0800, Shiyang Ruan wrote:
> The current dax_lock_page() locks dax entry by obtaining mapping and
> index in page.  To support 1-to-N RMAP in NVDIMM, we need a new function
> to lock a specific dax entry corresponding to this file's mapping,index.
> And BTW, output the page corresponding to the specific dax entry for
> caller use.

s/BTW, //g

>  /*
> - * dax_lock_mapping_entry - Lock the DAX entry corresponding to a page
> + * dax_lock_page - Lock the DAX entry corresponding to a page
>   * @page: The page whose entry we want to lock
>   *
>   * Context: Process context.

This should probably got into a separate trivial fix.

Otherwise looks good:

Reviewed-by: Christoph Hellwig 



Re: [PATCH v8 1/9] dax: Use percpu rwsem for dax_{read,write}_lock()

2021-12-14 Thread Christoph Hellwig
On Thu, Dec 02, 2021 at 04:48:48PM +0800, Shiyang Ruan wrote:
> In order to introduce dax holder registration, we need a write lock for
> dax.  Change the current lock to percpu_rw_semaphore and introduce a
> write lock for registration.

Why do we need to change the existing, global locking for that?

What is the impact of this to benchmarks?  Also if we stop using srcu
protection, we should be able to get rid of grace periods or RCU frees.



Re: [PATCH 0/4] add ro state control function for nvdimm drivers

2021-11-09 Thread Christoph Hellwig
Looks good:

Reviewed-by: Christoph Hellwig 



Re: [PATCH 3/3] libnvdimm/pmem: Provide pmem_dax_clear_poison for dax operation

2021-11-04 Thread Christoph Hellwig
On Tue, Sep 14, 2021 at 05:31:32PM -0600, Jane Chu wrote:
> +static int pmem_dax_clear_poison(struct dax_device *dax_dev, pgoff_t pgoff,
> + size_t nr_pages)
> +{
> + unsigned int len = PFN_PHYS(nr_pages);
> + sector_t sector = PFN_PHYS(pgoff) >> SECTOR_SHIFT;
> + struct pmem_device *pmem = dax_get_private(dax_dev);
> + phys_addr_t pmem_off = sector * 512 + pmem->data_offset;
> + blk_status_t ret;
> +
> + if (!is_bad_pmem(>bb, sector, len))
> + return 0;
> +
> + ret = pmem_clear_poison(pmem, pmem_off, len);
> + return (ret == BLK_STS_OK) ? 0 : -EIO;

No need for the braces here (and I'd prefer a good old if anyway).

Otherwise looks good:

Reviewed-by: Christoph Hellwig 



Re: [PATCH 2/3] dax: introduce dax_clear_poison to dax pwrite operation

2021-11-04 Thread Christoph Hellwig
On Tue, Sep 14, 2021 at 05:31:30PM -0600, Jane Chu wrote:
> + if ((map_len == -EIO) && (iov_iter_rw(iter) == WRITE)) {

No need for the inner braces.

> + if (dax_clear_poison(dax_dev, pgoff, PHYS_PFN(size)) == 
> 0)

Overly long line.

Otherwise looks good, but it might need a rebase to the iomap_iter
changes.




Re: [PATCH 1/3] dax: introduce dax_operation dax_clear_poison

2021-11-04 Thread Christoph Hellwig
Looks good,

Reviewed-by: Christoph Hellwig 



Re: [PATCH v10 7/8] xfs: support CoW in fsdax mode

2021-10-17 Thread Christoph Hellwig
On Thu, Oct 14, 2021 at 10:50:00AM -0700, Dan Williams wrote:
> The other blocker was enabling mounting dax filesystems on a
> dax-device rather than a block device. I'm actively refactoring the
> nvdimm subsystem side of that equation, but could use help with the
> conversion of the xfs mount path. Christoph, might you have that in
> your queue?

It's in my queue.  I'm about to send your a series of prep patches
and plan to tackle the actual mounting next merge window.



Re: [PATCH v7 7/8] xfs: Implement ->notify_failure() for XFS

2021-10-15 Thread Christoph Hellwig
On Fri, Sep 24, 2021 at 09:09:58PM +0800, Shiyang Ruan wrote:
> +void fs_dax_register_holder(struct dax_device *dax_dev, void *holder,
> + const struct dax_holder_operations *ops)
> +{
> + dax_set_holder(dax_dev, holder, ops);
> +}
> +EXPORT_SYMBOL_GPL(fs_dax_register_holder);
> +
> +void fs_dax_unregister_holder(struct dax_device *dax_dev)
> +{
> + dax_set_holder(dax_dev, NULL, NULL);
> +}
> +EXPORT_SYMBOL_GPL(fs_dax_unregister_holder);
> +
> +void *fs_dax_get_holder(struct dax_device *dax_dev)
> +{
> + return dax_get_holder(dax_dev);
> +}
> +EXPORT_SYMBOL_GPL(fs_dax_get_holder);

These should not be in a XFS patch.  But why do we even need this
wrappers?

> @@ -377,6 +385,8 @@ xfs_close_devices(
>  
>   xfs_free_buftarg(mp->m_logdev_targp);
>   xfs_blkdev_put(logdev);
> + if (dax_logdev)
> + fs_dax_unregister_holder(dax_logdev);
>   fs_put_dax(dax_logdev);

I'd prefer to include the fs_dax_unregister_holder in the fs_put_dax
call to avoid callers failing to unregister it.

> @@ -411,6 +425,9 @@ xfs_open_devices(
>   struct block_device *logdev = NULL, *rtdev = NULL;
>   int error;
>  
> + if (dax_ddev)
> + fs_dax_register_holder(dax_ddev, mp,
> + _dax_holder_operations);

I'd include the holder registration with fs_dax_get_by_bdev as well.



Re: [PATCH v7 8/8] fsdax: add exception for reflinked files

2021-10-15 Thread Christoph Hellwig
On Thu, Oct 14, 2021 at 12:24:50PM -0700, Darrick J. Wong wrote:
> It feels a little dangerous to have page->mapping for shared storage
> point to an actual address_space when there are really multiple
> potential address_spaces out there.  If the mm or dax folks are ok with
> doing this this way then I'll live with it, but it seems like you'd want
> to leave /some/ kind of marker once you know that the page has multiple
> owners and therefore regular mm rmap via page->mapping won't work.

Yes, I thing poisoning page->mapping for the rmap enabled case seems
like a better idea.



Re: [PATCH v7 4/8] pagemap,pmem: Introduce ->memory_failure()

2021-10-15 Thread Christoph Hellwig
Except for the error code inversion noticed by Darrick this looks fine
to me:

Reviewed-by: Christoph Hellwig 



Re: [PATCH v7 3/8] mm: factor helpers for memory_failure_dev_pagemap

2021-10-15 Thread Christoph Hellwig
On Fri, Sep 24, 2021 at 09:09:54PM +0800, Shiyang Ruan wrote:
> memory_failure_dev_pagemap code is a bit complex before introduce RMAP
> feature for fsdax.  So it is needed to factor some helper functions to
> simplify these code.
> 
> Signed-off-by: Shiyang Ruan 
> ---
>  mm/memory-failure.c | 140 
>  1 file changed, 76 insertions(+), 64 deletions(-)
> 
> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> index 54879c339024..8ff9b52823c0 100644
> --- a/mm/memory-failure.c
> +++ b/mm/memory-failure.c
> @@ -1430,6 +1430,79 @@ static int try_to_split_thp_page(struct page *page, 
> const char *msg)
>   return 0;
>  }
>  
> +static void unmap_and_kill(struct list_head *to_kill, unsigned long pfn,
> + struct address_space *mapping, pgoff_t index, int flags)
> +{
> + struct to_kill *tk;
> + unsigned long size = 0;
> +
> + list_for_each_entry(tk, to_kill, nd)
> + if (tk->size_shift)
> + size = max(size, 1UL << tk->size_shift);
> + if (size) {

Nit: an empty line here would be nice for readability.

> + if (pgmap->type == MEMORY_DEVICE_PRIVATE) {
> + /*
> +  * TODO: Handle HMM pages which may need coordination
> +  * with device-side memory.
> +  */
> + return -EBUSY;

We've got rid of the HMM terminology for device private memory, so
I'd reword this update the comment to follow that while you're at it.

Otherwise looks good:

Reviewed-by: Christoph Hellwig 



Re: [PATCH v7 1/8] dax: Use rwsem for dax_{read,write}_lock()

2021-10-15 Thread Christoph Hellwig
On Fri, Sep 24, 2021 at 09:09:52PM +0800, Shiyang Ruan wrote:
> In order to introduce dax holder registration, we need a write lock for
> dax.  Because of the rarity of notification failures and the infrequency
> of registration events, it would be better to be a global lock rather
> than per-device.  So, change the current lock to rwsem and introduce a
> write lock for registration.

I don't think taking the rw_semaphore everywhere will scale, as
basically any DAX based I/O will take it (in read mode).

So at a minimum we'd need a per-device percpu_rw_semaphore if we want
to go there.



Re: [PATCH v9 7/8] xfs: support CoW in fsdax mode

2021-09-21 Thread Christoph Hellwig
On Fri, Sep 17, 2021 at 08:33:04AM -0700, Darrick J. Wong wrote:
> > More importantly before we can merge this series we also need the VM
> > level support for reflink-aware reverse mapping.  So while this series
> > here is no in a good enough shape I don't see how we could merge it
> > without that other series as we'd have to disallow mmap for reflink+dax
> > files otherwise.
> 
> I've forgotten why we need mm level reverse mapping again?  The pmem
> poison stuff can use ->media_failure (or whatever it was called,
> memory_failure?) to find all the owners and notify them.  Was there
> some other accounting reason that fell out of my brain?
> 
> I'm more afraid of 'sharing pages between files needs mm support'
> sparking another multi-year folioesque fight with the mm people.

Because of the way page->mapping is used by DAX.  But I think this is
mostly under control in the other series.



Re: [PATCH 0/3] dax: clear poison on the fly along pwrite

2021-09-17 Thread Christoph Hellwig
On Thu, Sep 16, 2021 at 11:40:28AM -0700, Dan Williams wrote:
> > That was my gut feeling.  If everyone feels 100% comfortable with
> > zeroingas the mechanism to clear poisoning I'll cave in.  The most
> > important bit is that we do that through a dedicated DAX path instead
> > of abusing the block layer even more.
> 
> ...or just rename dax_zero_page_range() to dax_reset_page_range()?
> Where reset == "zero + clear-poison"?

I'd say that naming is more confusing than overloading zero.

> > I'm really worried about both patartitions on DAX and DM passing through
> > DAX because they deeply bind DAX to the block layer, which is just a bad
> > idea.  I think we also need to sort that whole story out before removing
> > the EXPERIMENTAL tags.
> 
> I do think it was a mistake to allow for DAX on partitions of a pmemX
> block-device.
> 
> DAX-reflink support may be the opportunity to start deprecating that
> support. Only enable DAX-reflink for direct mounting on /dev/pmemX
> without partitions (later add dax-device direct mounting),

I think we need to fully or almost fully sort this out.

Here is my bold suggestions:

 1) drop no drop the EXPERMINTAL on the current block layer overload
at all
 2) add direct mounting of the nvdimm namespaces ASAP.  Because all
the filesystem currently also need the /dev/pmem0 device add a way
to open the block device by the dax_device instead of our current
way of doing the reverse
 3) deprecate DAX support through block layer mounts with a say 2 year
deprecation period
 4) add DAX remapping devices as needed

I'll volunteer to write the initial code for 2).  And I think we should
not allow DAX+reflink on the block device shim at all.



Re: [PATCH 0/3] dax: clear poison on the fly along pwrite

2021-09-16 Thread Christoph Hellwig
On Wed, Sep 15, 2021 at 01:27:47PM -0700, Dan Williams wrote:
> > Yeah, Christoph suggested that we make the clearing operation explicit
> > in a related thread a few weeks ago:
> > https://lore.kernel.org/linux-fsdevel/yrtnlperhfmz2...@infradead.org/
> 
> That seemed to be tied to a proposal to plumb it all the way out to an
> explicit fallocate() mode, not make it a silent side effect of
> pwrite().

Yes.

> >
> > Each of the dm drivers has to add their own ->clear_poison operation
> > that remaps the incoming (sector, len) parameters as appropriate for
> > that device and then calls the lower device's ->clear_poison with the
> > translated parameters.
> >
> > This (AFAICT) has already been done for dax_zero_page_range, so I sense
> > that Dan is trying to save you a bunch of code plumbing work by nudging
> > you towards doing s/dax_clear_poison/dax_zero_page_range/ to this series
> > and then you only need patches 2-3.
> 
> Yes, but it sounds like Christoph was saying don't overload
> dax_zero_page_range(). I'd be ok splitting the difference and having a
> new fallocate clear poison mode map to dax_zero_page_range()
> internally.

That was my gut feeling.  If everyone feels 100% comfortable with
zeroingas the mechanism to clear poisoning I'll cave in.  The most
important bit is that we do that through a dedicated DAX path instead
of abusing the block layer even more.

> 
> >
> > > BTW, our customer doesn't care about creating dax volume thru DM, so.
> >
> > They might not care, but anything going upstream should work in the
> > general case.
> 
> Agree.

I'm really worried about both patartitions on DAX and DM passing through
DAX because they deeply bind DAX to the block layer, which is just a bad
idea.  I think we also need to sort that whole story out before removing
the EXPERIMENTAL tags.



Re: [PATCH v9 7/8] xfs: support CoW in fsdax mode

2021-09-16 Thread Christoph Hellwig
On Wed, Sep 15, 2021 at 05:22:27PM -0700, Darrick J. Wong wrote:
> > xfs_ilock(XFS_I(inode), XFS_MMAPLOCK_SHARED);
> > ret = dax_iomap_fault(vmf, pe_size, , NULL,
> > (write_fault && !vmf->cow_page) ?
> > -_direct_write_iomap_ops :
> > -_read_iomap_ops);
> > +   _dax_write_iomap_ops :
> > +   _read_iomap_ops);
> 
> Hmm... I wonder if this should get hoisted to a "xfs_dax_iomap_fault"
> wrapper like you did for xfs_iomap_zero_range?

This has just a single users, so the classic argument won't apply.  That
being said __xfs_filemap_fault is a complete mess to due the calling
conventions of the various VFS methods multiplexed into it.  So yes,
splitting out a xfs_dax_iomap_fault to wrap the above plus the
dax_finish_sync_fault call might not actually be a bad idea nevertheless.

> > +   struct xfs_inode*ip = XFS_I(inode);
> > +   /*
> > +* Usually we use @written to indicate whether the operation was
> > +* successful.  But it is always positive or zero.  The CoW needs the
> > +* actual error code from actor().  So, get it from
> > +* iomap_iter->processed.
> 
> Hm.  All six arguments are derived from the struct iomap_iter, so maybe
> it makes more sense to pass that in?  I'll poke around with this more
> tomorrow.

I'd argue against just changing the calling conventions for ->iomap_end
now.  The original iter patches from willy allowed passing a single
next callback combinging iomap_begin and iomap_end in a way that with
a little magic we can avoid the indirect calls entirely.  I think we'll
need to experiment with that that a bit and see if is worth the effort
first.  I plan to do that but I might not get to it immediate.  If some
else wants to take over I'm fine with that.

> >  static int
> >  xfs_buffered_write_iomap_begin(
> 
> Also, we have an related request to drop the EXPERIMENTAL tag for
> non-DAX reflink.  Whichever patch enables dax+reflink for xfs needs to
> make it clear that reflink + any possibility of DAX emits an
> EXPERIMENTAL warning.

More importantly before we can merge this series we also need the VM
level support for reflink-aware reverse mapping.  So while this series
here is no in a good enough shape I don't see how we could merge it
without that other series as we'd have to disallow mmap for reflink+dax
files otherwise.



Re: [PATCH v9 7/8] xfs: support CoW in fsdax mode

2021-09-16 Thread Christoph Hellwig
On Wed, Sep 15, 2021 at 06:45:00PM +0800, Shiyang Ruan wrote:
> +static int
> +xfs_dax_write_iomap_end(
> + struct inode*inode,
> + loff_t  pos,
> + loff_t  length,
> + ssize_t written,
> + unsignedflags,
> + struct iomap*iomap)
> +{
> + struct xfs_inode*ip = XFS_I(inode);
> + /*
> +  * Usually we use @written to indicate whether the operation was
> +  * successful.  But it is always positive or zero.  The CoW needs the
> +  * actual error code from actor().  So, get it from
> +  * iomap_iter->processed.
> +  */
> + const struct iomap_iter *iter =
> + container_of(iomap, typeof(*iter), iomap);
> +
> + if (!xfs_is_cow_inode(ip))
> + return 0;
> +
> + if (iter->processed <= 0) {
> + xfs_reflink_cancel_cow_range(ip, pos, length, true);
> + return 0;
> + }
> +
> + return xfs_reflink_end_cow(ip, pos, iter->processed);

Didn't we come to the conflusion last time that we don't actually
need to poke into the iomap_iter here as the written argument is equal
to iter->processed if it is > 0:

if (iter->iomap.length && ops->iomap_end) {
ret = ops->iomap_end(iter->inode, iter->pos, iomap_length(iter),
iter->processed > 0 ? iter->processed : 0,
iter->flags, >iomap);
..

So should be able to just do:

static int
xfs_dax_write_iomap_end(
struct inode*inode,
loff_t  pos,
loff_t  length,
ssize_t written,
unsignedflags,
struct iomap*iomap)
{
struct xfs_inode*ip = XFS_I(inode);

if (!xfs_is_cow_inode(ip))
return 0;

if (!written) {
xfs_reflink_cancel_cow_range(ip, pos, length, true);
return 0;
}

return xfs_reflink_end_cow(ip, pos, written);
}



Re: [PATCH v9 4/8] fsdax: Convert dax_iomap_zero to iter model

2021-09-16 Thread Christoph Hellwig
> -s64 dax_iomap_zero(loff_t pos, u64 length, struct iomap *iomap)
> +s64 dax_iomap_zero(struct iomap_iter *iter, loff_t pos, u64 length)

I think we can also mark the iter const.

Otherwise looks good:

Reviewed-by: Christoph Hellwig 



Re: [PATCH v9 6/8] fsdax: Dedup file range to use a compare function

2021-09-16 Thread Christoph Hellwig
Looks good,

Reviewed-by: Christoph Hellwig 



Re: [PATCH v9 5/8] fsdax: Add dax_iomap_cow_copy() for dax_iomap_zero

2021-09-16 Thread Christoph Hellwig
On Wed, Sep 15, 2021 at 06:44:58PM +0800, Shiyang Ruan wrote:
> + rc = dax_direct_access(iomap->dax_dev, pgoff, 1, , NULL);
> + if (rc < 0)
> + goto out;
> + memset(kaddr + offset, 0, size);
> + if (srcmap->addr != IOMAP_HOLE && srcmap->addr != iomap->addr) {

Should we also check that ->dax_dev for iomap and srcmap are different
first to deal with case of file system with multiple devices?

Otherwise looks good:

Reviewed-by: Christoph Hellwig 



Re: [PATCH v8 7/7] xfs: Add dax dedupe support

2021-09-02 Thread Christoph Hellwig
On Sun, Aug 29, 2021 at 08:25:17PM +0800, Shiyang Ruan wrote:
> Introduce xfs_mmaplock_two_inodes_and_break_dax_layout() for dax files
> who are going to be deduped.  After that, call compare range function
> only when files are both DAX or not.
> 
> Signed-off-by: Shiyang Ruan 
> Reviewed-by: Darrick J. Wong 

Looks good,

Reviewed-by: Christoph Hellwig 



Re: [PATCH v8 6/7] xfs: support CoW in fsdax mode

2021-09-02 Thread Christoph Hellwig
On Sun, Aug 29, 2021 at 08:25:16PM +0800, Shiyang Ruan wrote:
> In fsdax mode, WRITE and ZERO on a shared extent need CoW performed.
> After that, new allocated extents needs to be remapped to the file.  Add
> an implementation of ->iomap_end() for dax write ops to do the remapping
> work.

Please split the new dax infrastructure from the XFS changes.

>  static vm_fault_t dax_iomap_pte_fault(struct vm_fault *vmf, pfn_t *pfnp,
> -int *iomap_errp, const struct iomap_ops *ops)
> + int *iomap_errp, const struct iomap_ops *ops)
>  {
>   struct address_space *mapping = vmf->vma->vm_file->f_mapping;
>   XA_STATE(xas, >i_pages, vmf->pgoff);
> @@ -1631,7 +1664,7 @@ static bool dax_fault_check_fallback(struct vm_fault 
> *vmf, struct xa_state *xas,
>  }
>  
>  static vm_fault_t dax_iomap_pmd_fault(struct vm_fault *vmf, pfn_t *pfnp,
> -const struct iomap_ops *ops)
> + const struct iomap_ops *ops)

These looks like unrelated whitespace changes.

> -static loff_t iomap_zero_iter(struct iomap_iter *iter, bool *did_zero)
> +loff_t iomap_zero_iter(struct iomap_iter *iter, bool *did_zero)
>  {
>   const struct iomap *iomap = >iomap;
>   const struct iomap *srcmap = iomap_iter_srcmap(iter);
> @@ -918,6 +918,7 @@ static loff_t iomap_zero_iter(struct iomap_iter *iter, 
> bool *did_zero)
>  
>   return written;
>  }
> +EXPORT_SYMBOL_GPL(iomap_zero_iter);

I don't see why this would have to be exported.

> + unsignedflags,
> + struct iomap*iomap)
> +{
> + int error = 0;
> + struct xfs_inode*ip = XFS_I(inode);
> + boolcow = xfs_is_cow_inode(ip);

The cow variable is only used once, so I think we can drop it.

> + const struct iomap_iter *iter =
> + container_of(iomap, typeof(*iter), iomap);

Please comment this as it is a little unusual.

> +
> + if (cow) {
> + if (iter->processed <= 0)
> + xfs_reflink_cancel_cow_range(ip, pos, length, true);
> + else
> + error = xfs_reflink_end_cow(ip, pos, iter->processed);
> + }
> + return error ?: iter->processed;

The ->iomap_end convention is to return 0 or a negative error code.
Also i'd much prefer to just spell this out in a normal sequential way:

if (!xfs_is_cow_inode(ip))
return 0;

if (iter->processed <= 0) {
xfs_reflink_cancel_cow_range(ip, pos, length, true);
return 0;
}

return xfs_reflink_end_cow(ip, pos, iter->processed);

> +static inline int
> +xfs_iomap_zero_range(
> + struct xfs_inode*ip,
> + loff_t  pos,
> + loff_t  len,
> + bool*did_zero)
> +{
> + struct inode*inode = VFS_I(ip);
> +
> + return IS_DAX(inode)
> + ? dax_iomap_zero_range(inode, pos, len, did_zero,
> +_dax_write_iomap_ops)
> + : iomap_zero_range(inode, pos, len, did_zero,
> +_buffered_write_iomap_ops);
> +}

if (IS_DAX(inode))
return dax_iomap_zero_range(inode, pos, len, did_zero,
_dax_write_iomap_ops);
return iomap_zero_range(inode, pos, len, did_zero,
_buffered_write_iomap_ops);

> +static inline int
> +xfs_iomap_truncate_page(
> + struct xfs_inode*ip,
> + loff_t  pos,
> + bool*did_zero)
> +{
> + struct inode*inode = VFS_I(ip);
> +
> + return IS_DAX(inode)
> + ? dax_iomap_truncate_page(inode, pos, did_zero,
> +_dax_write_iomap_ops)
> + : iomap_truncate_page(inode, pos, did_zero,
> +_buffered_write_iomap_ops);
> +}

Same here.



Re: [PATCH v8 5/7] fsdax: Dedup file range to use a compare function

2021-09-02 Thread Christoph Hellwig
> +EXPORT_SYMBOL(vfs_dedupe_file_range_compare);

I don't see why this would need to be exported.

> @@ -370,6 +384,15 @@ int generic_remap_file_range_prep(struct file *file_in, 
> loff_t pos_in,
>  
>   return ret;
>  }
> +EXPORT_SYMBOL(__generic_remap_file_range_prep);

Same here.



  1   2   3   4   5   6   7   8   9   10   >