Hi Bob,

Thanks for your help. Edvin and I have spent some time looking at the git log for fs/gfs2, and we came up with two lists of potential backports to stable 4.4.y.

LIST 1: Critical bugs, in our opinion. I have tried to cherry-pick some of them onto v4.4.103, but there are conflicts and I am not sure I would know how to resolve them safely.

LIST 2: Would be good to have, but we are not sure they are all important (LIST 2 also contains LIST 1).

It would be very good if you could have a look at LIST 1, and at LIST 2 as well, and let us know what you think. If you agree that LIST 1 (or part of it) is critical, would you be able to provide the backports for it? If you do not have time, we could try to backport the patches ourselves; would you be OK to review our backports?

Thanks a lot,

Stefano

LIST 1
----------------------------------------------------------------------

commit cc1dfa8b7571ea16dec9a29e0f4c4cad90b2a761
Author: Thomas Tai <[email protected]>
Date:   Tue Aug 15 11:54:09 2017 -0500

    gfs2: fix slab corruption during mounting and umounting gfs file system

    When using cman-3.0.12.1 and gfs2-utils-3.0.12.1, mounting and
    unmounting GFS2 file system would cause kernel to hang. The slab
    allocator suggests that it is likely a double free memory corruption.
    The issue is traced back to v3.9-rc6 where a patch is submitted to
    use kzalloc() for storing a bitmap instead of using a local variable.
    The intention is to allocate memory during mount and to free memory
    during unmount. The original patch misses a code path which has
    already freed the memory and caused memory corruption. This patch sets
    the memory pointer to NULL after the memory is freed, so that double
    free memory corruption will not happen.

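For readers skimming the list, the pattern this commit message describes can be sketched in user-space C. This is only an illustration under stated assumptions, not the actual patch: `demo_lockstruct`, `demo_set_recover_size()` and `demo_free_recover_size()` are hypothetical names, and `calloc()`/`free()` stand in for the kernel's `kzalloc()`/`kfree()`.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the state that owns the recovery bitmap. */
struct demo_lockstruct {
    unsigned char *recover_bitmap; /* allocated at mount, freed at unmount */
};

/* Mount-time allocation; returns 0 on success, -1 on failure. */
static int demo_set_recover_size(struct demo_lockstruct *ls, size_t nbytes)
{
    assert(ls != NULL);
    ls->recover_bitmap = calloc(1, nbytes);
    return ls->recover_bitmap ? 0 : -1;
}

/*
 * Release the bitmap. Clearing the pointer afterwards makes a second
 * call harmless, because free(NULL) is defined to do nothing.
 */
static void demo_free_recover_size(struct demo_lockstruct *ls)
{
    assert(ls != NULL);
    free(ls->recover_bitmap);
    ls->recover_bitmap = NULL;
}
```

In this sketch both the mount error path and the unmount path can call `demo_free_recover_size()` without freeing the same block twice, which is the property the commit message says the fix restores.
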
    gdlm_mount()
      '-- set_recover_size() which use kzalloc()
      '-- if dlm does not support ops callbacks then
              '--- free_recover_size() which use kfree()

    gldm_unmount()
      '-- free_recover_size() which use kfree()

    Previous patch which introduced the double free issue is
    commit 57c7310b8eb9 ("GFS2: use kmalloc for lvb bitmap")

    Signed-off-by: Thomas Tai <[email protected]>
    Signed-off-by: Bob Peterson <[email protected]>
    Reviewed-by: Liam R. Howlett <[email protected]>

commit b066a4eebd4f5ea77f7e5c7d13104d38e1a1d4bf
Author: Abhi Das <[email protected]>
Date:   Fri Aug 4 12:15:32 2017 -0500

    gfs2: forcibly flush ail to relieve memory pressure

    On systems with low memory, it is possible for gfs2 to infinitely
    loop in balance_dirty_pages() under heavy IO (creating sparse files).

    balance_dirty_pages() attempts to write out the dirty pages via
    gfs2_writepages() but none are found because these dirty pages are
    being used by the journaling code in the ail. Normally, the journal
    has an upper threshold which when hit triggers an automatic flush
    of the ail. But this threshold can be higher than the number of
    allowable dirty pages and result in the ail never being flushed.

    This patch forces an ail flush when gfs2_writepages() fails to write
    anything. This is a good indication that the ail might be holding
    some dirty pages.

    Signed-off-by: Abhi Das <[email protected]>
    Signed-off-by: Bob Peterson <[email protected]>

commit a91323e255fa8bc84b0acf63376b395c534a38fa
Author: Andreas Gruenbacher <[email protected]>
Date:   Fri Aug 4 07:40:45 2017 -0500

    gfs2: Clean up waiting on glocks

    The prepare_to_wait_on_glock and finish_wait_on_glock functions
    introduced in commit 56a365be "gfs2: gfs2_glock_get: Wait on freeing
    glocks" are better removed, resulting in cleaner code.

    Signed-off-by: Andreas Gruenbacher <[email protected]>
    Signed-off-by: Bob Peterson <[email protected]>

commit 6a1c8f6dcf815d96197a2723781cf700925d17ed
Author: Andreas Gruenbacher <[email protected]>
Date:   Tue Aug 1 11:49:42 2017 -0500

    gfs2: Defer deleting inodes under memory pressure

    When under memory pressure and an inode's link count has dropped to
    zero, defer deleting the inode to the delete workqueue. This avoids
    calling into DLM under memory pressure, which can deadlock.

    Signed-off-by: Andreas Gruenbacher <[email protected]>
    Signed-off-by: Bob Peterson <[email protected]>

commit 71c1b2136835c88c231f7a5e3dc618f7568f84f7
Author: Andreas Gruenbacher <[email protected]>
Date:   Tue Aug 1 11:45:23 2017 -0500

    gfs2: gfs2_evict_inode: Put glocks asynchronously

    gfs2_evict_inode is called to free inodes under memory pressure. The
    function calls into DLM when an inode's last cluster-wide reference
    goes away (remote unlink) and to release the glock and associated DLM
    lock before finally destroying the inode. However, if DLM is blocked
    on memory to become available, calling into DLM again will deadlock.

    Avoid that by decoupling releasing glocks from destroying inodes in
    that case: with gfs2_glock_queue_put, glocks will be dequeued
    asynchronously in work queue context, when the associated inodes have
    likely already been destroyed.

commit 2df6f47150b6afbb258ed1d5c9ed78c23df05053
Author: Bob Peterson <[email protected]>
Date:   Wed Jan 27 16:00:38 2016 -0500

    GFS2: Fix direct IO write rounding error

    The fsx test in xfstests was failing because it was using direct IO
    writes which were using a bad calculation. It was using

        loff_t lstart = offset & (PAGE_CACHE_SIZE - 1);

    when it should be

        loff_t lstart = offset & ~(PAGE_CACHE_SIZE - 1);

    Thus, the write at offset 0x67e00 was calculating lstart to be 0xe00,
    the address of our corruption. Instead, it should have been 0x67000.
    This patch fixes the calculation.

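The arithmetic in that last commit message can be checked with a small user-space sketch. It assumes a 4096-byte page; `DEMO_PAGE_SIZE` and the two helper names are made up for illustration and are not taken from the kernel source.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed page size for this illustration only. */
#define DEMO_PAGE_SIZE 4096ULL

/* The masks below only work when the page size is a power of two. */
static_assert((DEMO_PAGE_SIZE & (DEMO_PAGE_SIZE - 1)) == 0,
              "page size must be a power of two");

/* Buggy form: keeps only the low bits, i.e. the offset within the page. */
static uint64_t demo_offset_in_page(uint64_t offset)
{
    return offset & (DEMO_PAGE_SIZE - 1);
}

/* Fixed form: clears the low bits, i.e. rounds down to the page start. */
static uint64_t demo_page_start(uint64_t offset)
{
    return offset & ~(DEMO_PAGE_SIZE - 1);
}
```

For offset 0x67e00, the first helper returns 0xe00 and the second returns 0x67000, which are the two values quoted in the commit message.
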
    Signed-off-by: Bob Peterson <[email protected]>
    Acked-by: Steven Whitehouse <[email protected]>

LIST 2
----------------------------------------------------------------------

commit cc1dfa8b7571ea16dec9a29e0f4c4cad90b2a761
Author: Thomas Tai <[email protected]>
Date:   Tue Aug 15 11:54:09 2017 -0500

    gfs2: fix slab corruption during mounting and umounting gfs file system

commit b066a4eebd4f5ea77f7e5c7d13104d38e1a1d4bf
Author: Abhi Das <[email protected]>
Date:   Fri Aug 4 12:15:32 2017 -0500

    gfs2: forcibly flush ail to relieve memory pressure

commit a91323e255fa8bc84b0acf63376b395c534a38fa
Author: Andreas Gruenbacher <[email protected]>
Date:   Fri Aug 4 07:40:45 2017 -0500

    gfs2: Clean up waiting on glocks

commit 6a1c8f6dcf815d96197a2723781cf700925d17ed
Author: Andreas Gruenbacher <[email protected]>
Date:   Tue Aug 1 11:49:42 2017 -0500

    gfs2: Defer deleting inodes under memory pressure

commit 71c1b2136835c88c231f7a5e3dc618f7568f84f7
Author: Andreas Gruenbacher <[email protected]>
Date:   Tue Aug 1 11:45:23 2017 -0500

    gfs2: gfs2_evict_inode: Put glocks asynchronously

commit 4d7c18c7df89ef549f2de79b0faf873b49dea57a
Author: Bob Peterson <[email protected]>
Date:   Tue Jul 18 12:15:01 2017 -0500

    GFS2: Set gl_object in inode lookup only after block type check

commit d4d7fc12b642a16732adeacefdaebe684bcb2218
Author: Andrew Price <[email protected]>
Date:   Wed Apr 5 11:45:26 2017 -0400

    gfs2: Re-enable fallocate for the rindex

commit cc963a11b67b796c25c5b827b25d2bcc92ce1779
Author: Bob Peterson <[email protected]>
Date:   Thu Mar 16 15:29:13 2017 -0400

    GFS2: Temporarily zero i_no_addr when creating a dinode

commit 2fcf5cc3be06126f9aa2430ca6d739c8b3c5aaf5
Author: Bob Peterson <[email protected]>
Date:   Fri Dec 16 08:01:28 2016 -0600

    GFS2: Limit number of transaction blocks requested for truncates

commit 14d37564fa3dc4e5d4c6828afcd26ac14e6796c5
Author: Dan Carpenter <[email protected]>
Date:   Wed Dec 14 08:02:03 2016 -0600

    GFS2: Fix reference to ERR_PTR in gfs2_glock_iter_next

commit 3ce37b2cb4917674fa5b776e857dcea94c0e0835
Author: Andreas Gruenbacher <[email protected]>
Date:   Tue Jun 14 12:22:27 2016 -0500

    gfs2: Fix gfs2_lookup_by_inum lock inversion

commit 1e875f5a95a28b5286165db9fa832b0773657ddb
Author: Andreas Gruenbacher <[email protected]>
Date:   Fri Jun 17 07:22:15 2016 -0500

    gfs2: Initialize iopen glock holder for new inodes

commit 36e4ad0316c017d5b271378ed9a1c9a4b77fab5f
Author: Bob Peterson <[email protected]>
Date:   Thu Jun 9 14:24:07 2016 -0500

    GFS2: don't set rgrp gl_object until it's inserted into rgrp tree

commit e97321fa095f1ea7110d4d2ba446bd6141ed9a03
Author: Bob Peterson <[email protected]>
Date:   Tue Apr 12 16:14:26 2016 -0400

    GFS2: Don't dereference inode in gfs2_inode_lookup until it's valid

commit 3e11e530415027a57936545957126aff49267b76
Author: Benjamin Marzinski <[email protected]>
Date:   Wed Mar 23 14:29:59 2016 -0400

    GFS2: ignore unlock failures after withdraw

commit 2df6f47150b6afbb258ed1d5c9ed78c23df05053
Author: Bob Peterson <[email protected]>
Date:   Wed Jan 27 16:00:38 2016 -0500

    GFS2: Fix direct IO write rounding error

commit a93a99838248bdab49db2eaac00236847670bc7f
Author: Junxiao Bi <[email protected]>
Date:   Tue Dec 22 08:06:08 2015 -0600

    gfs2: fix flock panic issue

commit 6cc4b6e801c725321e9f63ca7c2d00af8df24699
Author: Bob Peterson <[email protected]>
Date:   Fri Dec 4 13:04:34 2015 -0600

    GFS2: Don't do glock put on when inode creation fails

commit 5ea31bc0a6524b4fee8dc9ae8005d4a114a79812
Author: Bob Peterson <[email protected]>
Date:   Fri Dec 4 12:57:00 2015 -0600

    GFS2: Always use iopen glock for gl_deletes

commit 783013c0f5c7263a31703b15aeebbac279b4d4fe
Author: Bob Peterson <[email protected]>
Date:   Fri Dec 4 10:19:14 2015 -0600

    GFS2: Release iopen glock in gfs2_create_inode error cases

commit 400ac52e805bb6852e743817bc05a136e85042a9
Author: Benjamin Marzinski <[email protected]>
Date:   Wed Dec 9 07:46:33 2015 -0600

    gfs2: clear journal live bit in gfs2_log_flush

________________________________________
From: Bob Peterson <[email protected]>
Sent: Monday, November 13, 2017 2:56 PM
To: Edvin Torok
Cc: Stefano Panella; Jonathan Davies; Mark Syms; cluster-devel
Subject: Re: GFS2 backports

----- Original Message -----
| Hi,
|
| It was nice meeting you at the Cluster Summit, good to see such an
| active community around corosync and GFS2.
|
| I have just seen your GFS2 pull request for 4.14, which contains fixes
| for some data/memory corruption bugs.
| It would appear that the corruption bugfixes are small enough to meet
| the -stable criteria [1], do you intend to send them to
| [email protected] for the benefit of LTS kernel users (e.g. 4.4.x)?
|
| [1]
| https://www.kernel.org/doc/html/v4.14/process/stable-kernel-rules.html#stable-kernel-rules
|
| Thanks,
| --Edwin
|

Hi Edwin,

It was nice to meet you at Cluster Summit 2017 in Nuremberg and see the
level of interest in GFS2 in the real world.

The 4.14 fixes you wrote about were for the last merge window, back in
September, and it was a large one with more GFS2 patches than usual.
Some of them were actually put into stable branches, if I remember correctly.

The current merge window for 4.15 is already open, and I've already posted
our list of proposed patches we have for it, so this is a rapidly moving
environment. :)

I'm redirecting your email to the public cluster-devel mailing list so that
other developers can see your email and make comments. If you're not
already subscribed, perhaps you might want to subscribe to that mailing
list for the latest GFS2 discussions and patches. You can subscribe here:

https://www.redhat.com/mailman/listinfo/cluster-devel

At this point, we'd have to go back and see which patches were in that
merge window and which ones got rolled into which stable kernel.

If you have specific patches you want to see ported to stable branches,
feel free to ask. It shouldn't take much effort to get that to happen.

Regards,

Bob Peterson
Red Hat File Systems
