Re: [RFC, PATCH, RESEND] fs: push rcu_barrier() from deactivate_locked_super() to filesystems
On Sat, Jun 09, 2012 at 12:25:57AM -0700, Andrew Morton wrote:
> And... it seems that I misread what's going on. The individual
> filesystems are doing the rcu freeing of their inodes, so it is
> appropriate that they also call rcu_barrier() prior to running
> kmem_cache_destroy(). Which is what Kirill's patch does. oops.

Ack? ;)

-- 
Kirill A. Shutemov
Re: [RFC, PATCH, RESEND] fs: push rcu_barrier() from deactivate_locked_super() to filesystems
On 09/06/2012 02:28, Andrew Morton wrote:
> On Fri, 8 Jun 2012 16:46:47 -0700 Linus Torvalds torva...@linux-foundation.org wrote:
>> Of course, if you just mean having a VFS wrapper that does
>>
>>     static void vfs_inode_kmem_cache_destroy(struct kmem_cache *cachep)
>>     {
>>         rcu_barrier();
>>         kmem_cache_destroy(cachep);
>>     }
>>
>> then we could do that. Not much better than what Kirill's patch did,
>> but at least we could have that comment in just one single place.
>
> That's conceptually what I meant. But it has the problem that new and
> out-of-tree filesystems might forget to do it. Which is why I suggest
> adding a kmem_cache* argument to unregister_filesystem() for this.
> It's a bit awkward, and the fs can pass in NULL if it knows what it's
> doing. But it's reliable.

The call to rcu_barrier() should be mandatory for the
unload-a-fs-module problem, right? If the fs is compiled in statically,
maybe we could avoid it; but in the end that kind of decision is
per-fs, so this could be a clue that the rcu_barrier() call belongs
inside each fs, not in the VFS.

Marco
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC, PATCH, RESEND] fs: push rcu_barrier() from deactivate_locked_super() to filesystems
On Sat, 09 Jun 2012 09:06:28 +0200 Marco Stornelli marco.storne...@gmail.com wrote:
> On 09/06/2012 02:28, Andrew Morton wrote:
>> On Fri, 8 Jun 2012 16:46:47 -0700 Linus Torvalds torva...@linux-foundation.org wrote:
>>> Of course, if you just mean having a VFS wrapper that does
>>>
>>>     static void vfs_inode_kmem_cache_destroy(struct kmem_cache *cachep)
>>>     {
>>>         rcu_barrier();
>>>         kmem_cache_destroy(cachep);
>>>     }
>>>
>>> then we could do that. Not much better than what Kirill's patch did,
>>> but at least we could have that comment in just one single place.
>>
>> That's conceptually what I meant. But it has the problem that new and
>> out-of-tree filesystems might forget to do it. Which is why I suggest
>> adding a kmem_cache* argument to unregister_filesystem() for this.
>> It's a bit awkward, and the fs can pass in NULL if it knows what it's
>> doing. But it's reliable.
>
> The call to rcu_barrier() should be mandatory for the
> unload-a-fs-module problem, right? If the fs is compiled in
> statically, maybe we could avoid it; but in the end that kind of
> decision is per-fs, so this could be a clue that the rcu_barrier()
> call belongs inside each fs, not in the VFS.

No, this is unrelated to module unloading, and the problem affects
statically linked filesystems also. The requirement is that all inodes
which are pending rcu freeing be flushed (and freed) before their cache
is destroyed in kmem_cache_destroy().

And... it seems that I misread what's going on. The individual
filesystems are doing the rcu freeing of their inodes, so it is
appropriate that they also call rcu_barrier() prior to running
kmem_cache_destroy(). Which is what Kirill's patch does. oops.
Re: [RFC, PATCH, RESEND] fs: push rcu_barrier() from deactivate_locked_super() to filesystems
On Sat, 9 Jun 2012 00:41:03 +0300 Kirill A. Shutemov kirill.shute...@linux.intel.com wrote:
> There's no reason to call rcu_barrier() on every
> deactivate_locked_super(). We only need to make sure that all delayed
> rcu free inodes are flushed before we destroy the related cache.
>
> Removing rcu_barrier() from deactivate_locked_super() affects some
> fast paths. E.g. on my machine exit_group() of a last process in IPC
> namespace takes 0.07538s. rcu_barrier() takes 0.05188s of that time.

What an unpleasant patch. Is final-process-exiting-ipc-namespace a
sufficiently high-frequency operation to justify the change?

I don't really understand what's going on here. Are you saying that
there is some filesystem against which we run deactivate_locked_super()
during exit_group(), and that this filesystem doesn't use rcu-freeing
of inodes? The description needs this level of detail, please.

The implementation would be less unpleasant if we could do the
rcu_barrier() in kmem_cache_destroy(). I can't see a way of doing that
without adding a dedicated slab flag, which would require editing all
the filesystems anyway.

(kmem_cache_destroy() already has an rcu_barrier(). Can we do away with
the private rcu games in the vfs and switch to SLAB_DESTROY_BY_RCU?)
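[For readers joining the thread here: the ordering constraint under
discussion comes from the way filesystems defer inode freeing through
call_rcu(). A kernel-style sketch of the ext4-like pattern — not
standalone-compilable, and the "foo" identifiers are placeholders for a
real filesystem's names:]

    /* ->destroy_inode() does not free the object directly; it defers
     * the free until after an RCU grace period has elapsed. */
    static void foo_i_callback(struct rcu_head *head)
    {
            struct inode *inode = container_of(head, struct inode, i_rcu);

            kmem_cache_free(foo_inode_cachep, inode);
    }

    static void foo_destroy_inode(struct inode *inode)
    {
            call_rcu(&inode->i_rcu, foo_i_callback);
    }

    /* Consequence: kmem_cache_destroy(foo_inode_cachep) is only safe
     * once every pending foo_i_callback() has run -- which is exactly
     * what rcu_barrier() waits for.  Destroying the cache with
     * callbacks still queued would free objects into a dead cache. */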
Re: [RFC, PATCH, RESEND] fs: push rcu_barrier() from deactivate_locked_super() to filesystems
On Fri, Jun 08, 2012 at 03:02:53PM -0700, Andrew Morton wrote:
> On Sat, 9 Jun 2012 00:41:03 +0300 Kirill A. Shutemov kirill.shute...@linux.intel.com wrote:
>> There's no reason to call rcu_barrier() on every
>> deactivate_locked_super(). We only need to make sure that all delayed
>> rcu free inodes are flushed before we destroy the related cache.
>>
>> Removing rcu_barrier() from deactivate_locked_super() affects some
>> fast paths. E.g. on my machine exit_group() of a last process in IPC
>> namespace takes 0.07538s. rcu_barrier() takes 0.05188s of that time.
>
> What an unpleasant patch. Is final-process-exiting-ipc-namespace a
> sufficiently high-frequency operation to justify the change?
>
> I don't really understand what's going on here. Are you saying that
> there is some filesystem against which we run
> deactivate_locked_super() during exit_group(), and that this
> filesystem doesn't use rcu-freeing of inodes? The description needs
> this level of detail, please.

I think the rcu_barrier() is in the wrong place. We need it to safely
destroy the inode cache. deactivate_locked_super() is part of the
umount() path, but all filesystems I've checked have an inode cache for
the whole filesystem, not per-mount.

> The implementation would be less unpleasant if we could do the
> rcu_barrier() in kmem_cache_destroy(). I can't see a way of doing that
> without adding a dedicated slab flag, which would require editing all
> the filesystems anyway.

I think rcu_barrier() for all kmem_cache_destroy() calls would be too
expensive.

> (kmem_cache_destroy() already has an rcu_barrier(). Can we do away
> with the private rcu games in the vfs and switch to
> SLAB_DESTROY_BY_RCU?)

-- 
Kirill A. Shutemov
Re: [RFC, PATCH, RESEND] fs: push rcu_barrier() from deactivate_locked_super() to filesystems
On Sat, Jun 09, 2012 at 01:14:46AM +0300, Kirill A. Shutemov wrote:
>> The implementation would be less unpleasant if we could do the
>> rcu_barrier() in kmem_cache_destroy(). I can't see a way of doing
>> that without adding a dedicated slab flag, which would require
>> editing all the filesystems anyway.
>
> I think rcu_barrier() for all kmem_cache_destroy() calls would be too
> expensive.

You've got to be kidding. Please, show us the codepath that would be
hot enough to make that too expensive and would contain
kmem_cache_destroy(). Note that module unload is *not* a hot path - not
on any even remotely sane use.
Re: [RFC, PATCH, RESEND] fs: push rcu_barrier() from deactivate_locked_super() to filesystems
On Sat, 9 Jun 2012 01:14:46 +0300 Kirill A. Shutemov kirill.shute...@linux.intel.com wrote:
> On Fri, Jun 08, 2012 at 03:02:53PM -0700, Andrew Morton wrote:
>> On Sat, 9 Jun 2012 00:41:03 +0300 Kirill A. Shutemov kirill.shute...@linux.intel.com wrote:
>>> There's no reason to call rcu_barrier() on every
>>> deactivate_locked_super(). We only need to make sure that all
>>> delayed rcu free inodes are flushed before we destroy the related
>>> cache.
>>>
>>> Removing rcu_barrier() from deactivate_locked_super() affects some
>>> fast paths. E.g. on my machine exit_group() of a last process in
>>> IPC namespace takes 0.07538s. rcu_barrier() takes 0.05188s of that
>>> time.
>>
>> What an unpleasant patch. Is final-process-exiting-ipc-namespace a
>> sufficiently high-frequency operation to justify the change?

This, please.

>> I don't really understand what's going on here. Are you saying that
>> there is some filesystem against which we run
>> deactivate_locked_super() during exit_group(), and that this
>> filesystem doesn't use rcu-freeing of inodes? The description needs
>> this level of detail, please.

You still haven't explained where this deactivate_locked_super() call
is coming from. Oh well.

> I think the rcu_barrier() is in the wrong place. We need it to safely
> destroy the inode cache. deactivate_locked_super() is part of the
> umount() path, but all filesystems I've checked have an inode cache
> for the whole filesystem, not per-mount.

Well, from a design perspective, putting the rcu_barrier() in the vfs
is the *correct* place. Individual filesystems shouldn't be hard-coding
knowledge about vfs internal machinery.

A neater implementation might be to add a kmem_cache* argument to
unregister_filesystem(). If that is non-NULL, unregister_filesystem()
does the rcu_barrier() and destroys the cache. That way we get to
delete (rather than add) a bunch of code from all filesystems, and new
and out-of-tree filesystems cannot forget to perform the rcu_barrier().

>> The implementation would be less unpleasant if we could do the
>> rcu_barrier() in kmem_cache_destroy(). I can't see a way of doing
>> that without adding a dedicated slab flag, which would require
>> editing all the filesystems anyway.
>
> I think rcu_barrier() for all kmem_cache_destroy() calls would be too
> expensive.

That is not what I proposed.
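[For illustration, Andrew's suggestion could look something like the
helper below. This is a hypothetical sketch, not the real API — the
actual unregister_filesystem() takes only a struct file_system_type *,
and the extra cache argument is exactly what is being proposed:]

    /* Hypothetical variant of unregister_filesystem() that also tears
     * down the filesystem's inode cache.  Passing NULL for
     * inode_cachep means "the fs manages its caches itself". */
    int unregister_filesystem_and_cache(struct file_system_type *fs,
                                        struct kmem_cache *inode_cachep)
    {
            int err = unregister_filesystem(fs);

            if (!err && inode_cachep) {
                    /* wait for every pending call_rcu() inode free... */
                    rcu_barrier();
                    /* ...then it is safe to drop the cache */
                    kmem_cache_destroy(inode_cachep);
            }
            return err;
    }

[Al's objection below is that this only covers one cache per
filesystem, while e.g. btrfs and gfs2 have several rcu-freed caches.]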
Re: [RFC, PATCH, RESEND] fs: push rcu_barrier() from deactivate_locked_super() to filesystems
On Fri, Jun 08, 2012 at 03:25:50PM -0700, Andrew Morton wrote:
> A neater implementation might be to add a kmem_cache* argument to
> unregister_filesystem(). If that is non-NULL, unregister_filesystem()
> does the rcu_barrier() and destroys the cache. That way we get to
> delete (rather than add) a bunch of code from all filesystems, and
> new and out-of-tree filesystems cannot forget to perform the
> rcu_barrier().

There's often enough more than one cache, so that one is a no-go.
Re: [RFC, PATCH, RESEND] fs: push rcu_barrier() from deactivate_locked_super() to filesystems
On Fri, Jun 8, 2012 at 3:23 PM, Al Viro v...@zeniv.linux.org.uk wrote:
> Note that module unload is *not* a hot path - not on any even remotely
> sane use.

Actually, I think we've had distributions that basically did a "load
pretty much everything, and let God sort it out" approach to modules.

I know some people *have* actually worried about module load/unload
performance. Whether it is remotely sane I'm not going to argue for,
but ..

Linus
Re: [RFC, PATCH, RESEND] fs: push rcu_barrier() from deactivate_locked_super() to filesystems
On Fri, Jun 08, 2012 at 03:25:50PM -0700, Andrew Morton wrote:
> On Sat, 9 Jun 2012 01:14:46 +0300 Kirill A. Shutemov kirill.shute...@linux.intel.com wrote:
>> On Fri, Jun 08, 2012 at 03:02:53PM -0700, Andrew Morton wrote:
>>> On Sat, 9 Jun 2012 00:41:03 +0300 Kirill A. Shutemov kirill.shute...@linux.intel.com wrote:
>>>> There's no reason to call rcu_barrier() on every
>>>> deactivate_locked_super(). We only need to make sure that all
>>>> delayed rcu free inodes are flushed before we destroy the related
>>>> cache.
>>>>
>>>> Removing rcu_barrier() from deactivate_locked_super() affects some
>>>> fast paths. E.g. on my machine exit_group() of a last process in
>>>> IPC namespace takes 0.07538s. rcu_barrier() takes 0.05188s of that
>>>> time.
>>>
>>> What an unpleasant patch. Is final-process-exiting-ipc-namespace a
>>> sufficiently high-frequency operation to justify the change?
>
> This, please.

The ALT Linux guys use namespaces (including the IPC namespace) to
create a sandbox[1] for their build system and other use cases. The
build system calls the sandboxing wrapper frequently while setting up
the build chroot and preparing the build. This kind of delay affects
timings significantly.

[1] http://git.altlinux.org/people/ldv/packages/hasher-priv.git

>>> I don't really understand what's going on here. Are you saying that
>>> there is some filesystem against which we run
>>> deactivate_locked_super() during exit_group(), and that this
>>> filesystem doesn't use rcu-freeing of inodes? The description needs
>>> this level of detail, please.
>
> You still haven't explained where this deactivate_locked_super() call
> is coming from. Oh well.

Call Trace:
 [81443a2a] schedule+0x3a/0x50
 [81441e7d] schedule_timeout+0x1cd/0x2c0
 [811f8f87] ? mqueue_destroy_inode+0x17/0x20
 [81443044] wait_for_common+0xc4/0x160
 [8107af50] ? try_to_wake_up+0x2a0/0x2a0
 [810d63b0] ? call_rcu_sched+0x10/0x20
 [810d63a0] ? call_rcu_bh+0x20/0x20
 [81443188] wait_for_completion+0x18/0x20
 [810d5a9b] _rcu_barrier.clone.31+0x9b/0xb0
 [810d5ac0] rcu_barrier_sched+0x10/0x20
 [810d5ad9] rcu_barrier+0x9/0x10
 [811602c9] deactivate_locked_super+0x49/0x90
 [81160d35] deactivate_super+0x45/0x60
 [8117ad74] mntput_no_expire+0x104/0x150
 [8117addc] mntput+0x1c/0x30
 [8117cda7] kern_unmount+0x27/0x30
 [811faeb0] mq_put_mnt+0x10/0x20
 [811fb57f] put_ipc_ns+0x3f/0xb0
 [81071f5c] free_nsproxy+0x3c/0xa0
 [81072143] switch_task_namespaces+0x33/0x40
 [8107215b] exit_task_namespaces+0xb/0x10
 [8104f154] do_exit+0x4b4/0x8a0
 [8104f7e3] do_group_exit+0x53/0xc0
 [8104f862] sys_exit_group+0x12/0x20
 [8144c939] system_call_fastpath+0x16/0x1b

-- 
Kirill A. Shutemov
Re: [RFC, PATCH, RESEND] fs: push rcu_barrier() from deactivate_locked_super() to filesystems
On Sat, 9 Jun 2012 02:31:27 +0300 Kirill A. Shutemov kirill.shute...@linux.intel.com wrote:
> On Fri, Jun 08, 2012 at 03:31:20PM -0700, Andrew Morton wrote:
>> On Fri, 8 Jun 2012 23:27:34 +0100 Al Viro v...@zeniv.linux.org.uk wrote:
>>> On Fri, Jun 08, 2012 at 03:25:50PM -0700, Andrew Morton wrote:
>>>> A neater implementation might be to add a kmem_cache* argument to
>>>> unregister_filesystem(). If that is non-NULL,
>>>> unregister_filesystem() does the rcu_barrier() and destroys the
>>>> cache. That way we get to delete (rather than add) a bunch of code
>>>> from all filesystems, and new and out-of-tree filesystems cannot
>>>> forget to perform the rcu_barrier().
>>>
>>> There's often enough more than one cache, so that one is a no-go.
>>
>> kmem_cache** ;) Which filesystems have multiple inode caches?
>
> Multiple inode caches? No. Multiple caches with call_rcu() free? See
> btrfs or gfs2.

OK. But for those non-inode caches, the rcu treatment is private to the
filesystem. Hence it is appropriate that the filesystem call
rcu_barrier() for those caches. But in the case of the inode caches,
the rcu treatment is a vfs thing, so it is the vfs which should perform
the rcu_barrier(). This is a red herring - those non-inode caches have
nothing to do with the issue we're discussing.

So how about open-coding the rcu_barrier() in btrfs and gfs2 for the
non-inode caches (which is the appropriate place), and handing the
inode cache over to the vfs for treatment (which is the appropriate
place)?

The downside is that btrfs and gfs2 will do an extra rcu_barrier() at
umount time. Shrug. If they really want to super-optimise that, they
can skip the private rcu_barrier() call and assume that the vfs will be
doing it. Not a good idea, IMO.
Re: [RFC, PATCH, RESEND] fs: push rcu_barrier() from deactivate_locked_super() to filesystems
On Fri, Jun 8, 2012 at 4:37 PM, Andrew Morton a...@linux-foundation.org wrote:
> So how about open-coding the rcu_barrier() in btrfs and gfs2 for the
> non-inode caches (which is the appropriate place), and handing the
> inode cache over to the vfs for treatment (which is the appropriate
> place)?

The thing is, none of the inode caches are really up to the VFS. They
are per-filesystem caches that just *embed* an inode as part of the
bigger ext4_inode or whatever.

But apart from the fact that the embedded inode requires them to then
use the proper call_rcu() stuff to do the delayed free, they really are
pretty much filesystem data structures. The VFS layer can neither free
them nor allocate them, since the VFS layer doesn't even know how big
the structures are, or where the inodes are embedded, or how to
initialize them (or even when to allocate them).

Of course, if you just mean having a VFS wrapper that does

    static void vfs_inode_kmem_cache_destroy(struct kmem_cache *cachep)
    {
        rcu_barrier();
        kmem_cache_destroy(cachep);
    }

then we could do that. Not much better than what Kirill's patch did,
but at least we could have that comment in just one single place.

Linus
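[For illustration, with a wrapper like that a filesystem's module-exit
path could shrink to something like the sketch below. Kernel-style
pseudocode: the "foo" names are placeholders, and
vfs_inode_kmem_cache_destroy() is the wrapper proposed in the mail
above, not an existing API:]

    static void __exit exit_foo_fs(void)
    {
            unregister_filesystem(&foo_fs_type);

            /* Does rcu_barrier() + kmem_cache_destroy(), with the
             * explanatory comment living in one place in the vfs
             * instead of being copy-pasted into every filesystem. */
            vfs_inode_kmem_cache_destroy(foo_inode_cachep);
    }

[The ordering matters either way: the barrier must come after the last
inode can be queued for rcu freeing and before the cache goes away.]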
Re: [RFC, PATCH, RESEND] fs: push rcu_barrier() from deactivate_locked_super() to filesystems
On Fri, 8 Jun 2012 16:46:47 -0700 Linus Torvalds torva...@linux-foundation.org wrote:
> Of course, if you just mean having a VFS wrapper that does
>
>     static void vfs_inode_kmem_cache_destroy(struct kmem_cache *cachep)
>     {
>         rcu_barrier();
>         kmem_cache_destroy(cachep);
>     }
>
> then we could do that. Not much better than what Kirill's patch did,
> but at least we could have that comment in just one single place.

That's conceptually what I meant. But it has the problem that new and
out-of-tree filesystems might forget to do it. Which is why I suggest
adding a kmem_cache* argument to unregister_filesystem() for this.
It's a bit awkward, and the fs can pass in NULL if it knows what it's
doing. But it's reliable.