Re: [PATCH 07/10] dax: export functions for use with btrfs
If you want to export these at all they have to be EXPORT_SYMBOL_GPL. But I'd really like to avoid seeing another duplicate DAX I/O path. Please try to adopt the existing iomap-based infrastructure for your needs first.
Re: [RFC PATCH] btrfs: drop file privileges in btrfs_clone_files
On Wed, Nov 28, 2018 at 09:44:59AM +0200, Nikolay Borisov wrote: > > > On 28.11.18 г. 5:07 ч., Lu Fengqi wrote: > > The generic/513 tell that cloning into a file did not strip security > > privileges (suid, capabilities) like a regular write would. > > > > Signed-off-by: Lu Fengqi > > --- > > The xfs and ocfs2 call generic_remap_file_range_prep to drop file > > privileges, I'm not sure whether btrfs should do the same thing. > > Why do you think btrfs shouldn't do the same thing. Looking at > remap_file_range_prep it seems that btrfs is missing a ton of checks > that are useful i.e immutable files/aligned offsets etc. Any chance we could move btrfs over to use remap_file_range_prep so that all file systems share the exact same checks?
Re: [PATCH V10 08/19] btrfs: move bio_pages_all() to btrfs
On Mon, Nov 19, 2018 at 04:19:24PM +0800, Ming Lei wrote: > On Fri, Nov 16, 2018 at 02:38:45PM +0100, Christoph Hellwig wrote: > > On Thu, Nov 15, 2018 at 04:52:55PM +0800, Ming Lei wrote: > > > BTRFS is the only user of this helper, so move this helper into > > > BTRFS, and implement it via bio_for_each_segment_all(), since > > > bio->bi_vcnt may not equal to number of pages after multipage bvec > > > is enabled. > > > > btrfs only uses the value to check if it is larger than 1. No amount > > of multipage bio merging should ever make bi_vcnt go from 0 to 1 or > > vice versa. > > Could you explain a bit why? > > Suppose 2 physically continuous pages are added to this bio, .bi_vcnt > can be 1 in case of multi-page bvec, but it is 2 in case of single-page > bvec. True, I did think of 0 vs 1. The magic here in btrfs still doesn't make much sense. The comments down in btrfs_check_repairable talk about sectors, so it might be a leftover from the blocksize == PAGE_SIZE days (assuming those patches actually got merged). I'd like to have the btrfs folks chime in, but in the end we should probably check if the bio was larger than a single sector here.
Re: [PATCH 04/25] vfs: strengthen checking of file range inputs to generic_remap_checks
On Wed, Oct 10, 2018 at 09:12:46PM -0700, Darrick J. Wong wrote: > From: Darrick J. Wong > > File range remapping, if allowed to run past the destination file's EOF, > is an optimization on a regular file write. Regular file writes that > extend the file length are subject to various constraints which are not > checked by range cloning. > > This is a correctness problem because we're never allowed to touch > ranges that the page cache can't support (s_maxbytes); we're not > supposed to deal with large offsets (MAX_NON_LFS) if O_LARGEFILE isn't > set; and we must obey resource limits (RLIMIT_FSIZE). > > Therefore, add these checks to the new generic_remap_checks function so > that we curtail unexpected behavior. > > Signed-off-by: Darrick J. Wong > Reviewed-by: Amir Goldstein Looks good, Reviewed-by: Christoph Hellwig
Re: [PATCH 03/25] vfs: check file ranges before cloning files
> -EXPORT_SYMBOL(vfs_clone_file_prep_inodes); > +EXPORT_SYMBOL(vfs_clone_file_prep); Btw, why isn't this EXPORT_SYMBOL_GPL? It is rather Linux internal code, including some that I wrote which you lifted into the core in "vfs: refactor clone/dedupe_file_range common functions".
Re: [PATCH 02/25] vfs: vfs_clone_file_prep_inodes should return EINVAL for a clone from beyond EOF
On Wed, Oct 10, 2018 at 09:12:30PM -0700, Darrick J. Wong wrote: > From: Darrick J. Wong > > vfs_clone_file_prep_inodes cannot return 0 if it is asked to remap from > a zero byte file because that's what btrfs does. > > Signed-off-by: Darrick J. Wong > --- Maybe it would be a good time to switch btrfs to use vfs_clone_file_prep_inodes so that we don't have any discrepancies? Otherwise looks good: Reviewed-by: Christoph Hellwig
Re: [PATCH 01/25] xfs: add a per-xfs trace_printk macro
On Wed, Oct 10, 2018 at 09:12:23PM -0700, Darrick J. Wong wrote: > From: Darrick J. Wong > > Add a "xfs_tprintk" macro so that developers can use trace_printk to > print out arbitrary debugging information with the XFS device name > attached to the trace output. I can't say I'm a fan of this. trace_printk is a debugging aid, and opencoding the file system name really isn't much of a burden.
Re: exporting uuid_t from uapi/linux/uuid.h
On Tue, Aug 21, 2018 at 10:25:50PM +0300, Nikolay Borisov wrote: > Hello, > > I'm doing some refactoring on btrfs and would like to convert some of > the "naked" definition of UUID in on-disk object to uuid_t. Since those > structs are defined in include/uapi/linux/btrfs_tree.h I naturally would > need to expose the kernel uuid_t to the uapi header via > include/uapi/linux/uuid.h. The uuid was originally unexported in > 60927bc31436 ("uuid: remove uuid_be defintions from the uapi header") > but at that time it was called uuid_be. The thing is that btrfs-progs > for example uses both btrfs_tree.h uapi header as well as libuuid. THe > latter also defines uuid_t. Both the kernel one and libuuid are binary > compatible. > > So what's the correct way (if any) to expose a kernel type that might > have also been defined by a userspace library? Please just use an raw array of __u8 for your APIs.
Re: [PATCH 01/10] mm: pagecache add lock
On Sun, May 20, 2018 at 06:45:24PM -0400, Kent Overstreet wrote: > > > > Honestly I think this probably should be in the core. But IFF we move > > it to the core the existing users of per-fs locks need to be moved > > over first. E.g. XFS as the very first one, and at least ext4 and f2fs > > that copied the approach, and probably more if you audit deep enough. > > I'm not going to go and redo locking in XFS and ext4 as a prerequisite to > merging bcachefs. Sorry, but that's a bit crazy. It isn't crazy at all. In general we expect people to do their fair share of core work to get their pet feature in. How much is acceptable is a difficult question and not black and white. But if you want to grow a critical core structure you better take a stab at converting existing users. Without that the tradeoff of growing that core structure is simply not given. Or to put it in words for this exact feature: unless your new field is also used by mainstream file systems it is not going to be added. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Btrfs: implement unlocked buffered write
On Tue, May 22, 2018 at 02:31:36PM -0400, Chris Mason wrote: > > And what protects two writes from interleaving their results now? > > page locks...ish, we at least won't have results interleaved in a single > page. For btrfs it'll actually be multiple pages since we try to do more > than one at a time. I think you are going to break just about every assumption people have that any single write is going to be atomic vs another write. E.g. this comes from the posix read definition reference by the write definition: "I/O is intended to be atomic to ordinary files and pipes and FIFOs. Atomic means that all the bytes from a single operation that started out together end up together, without interleaving from other I/O operations. It is a known attribute of terminals that this is not honored, and terminals are explicitly (and implicitly permanently) excepted, making the behavior unspecified. The behavior for other device types is also left unspecified, but the wording is intended to imply that future standards might choose to specify atomicity (or not). " Without taking the inode lock (or some sort of range lock) you can easily interleave data from two write requests. > But we're not avoiding the inode lock completely, we're just dropping it for > the expensive parts of writing to the file. A quick guess about what the > expensive parts are: The way I read the patch it basically 'avoids' the inode lock for almost the whole write call, just minus some setup. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Btrfs: implement unlocked buffered write
On Wed, May 16, 2018 at 11:52:37AM +0800, robbieko wrote: > From: Robbie Ko> > This idea is from direct io. By this patch, we can make the buffered > write parallel, and improve the performance and latency. But because we > can not update isize without i_mutex, the unlocked buffered write just > can be done in front of the EOF. > > We needn't worry about the race between buffered write and truncate, > because the truncate need wait until all the buffered write end. > > And we also needn't worry about the race between dio write and punch hole, > because we have extent lock to protect our operation. > > I ran fio to test the performance of this feature. > > == Hardware == > CPU: Intel® Xeon® D-1531 > SSD: Intel S3710 200G > Volume : RAID 5 , SSD * 6 > > == config file == > [global] > group_reporting > time_based > thread=1 > norandommap > ioengine=libaio > bs=4k > iodepth=32 > size=16G > runtime=180 > numjobs=8 > rw=randwrite > > [file1] > filename=/mnt/btrfs/nocow/testfile > > == result (iops) == > lock = 68470 > unlocked = 94242 > > == result (clat) == > lock > lat (usec): min=184, max=1209.9K, avg=3738.35, stdev=20869.49 > clat percentiles (usec): > | 1.00th=[ 322], 5.00th=[ 330], 10.00th=[ 334], 20.00th=[ 346], > | 30.00th=[ 370], 40.00th=[ 386], 50.00th=[ 406], 60.00th=[ 446], > | 70.00th=[ 516], 80.00th=[ 612], 90.00th=[ 828], 95.00th=[10432], > | 99.00th=[84480], 99.50th=[117248], 99.90th=[226304], 99.95th=[333824], > | 99.99th=[692224] > > unlocked > lat (usec): min=10, max=218208, avg=2691.44, stdev=5380.82 > clat percentiles (usec): > | 1.00th=[ 302], 5.00th=[ 390], 10.00th=[ 442], 20.00th=[ 502], > | 30.00th=[ 548], 40.00th=[ 596], 50.00th=[ 652], 60.00th=[ 724], > | 70.00th=[ 916], 80.00th=[ 5024], 90.00th=[ 5664], 95.00th=[10048], > | 99.00th=[29568], 99.50th=[39168], 99.90th=[54016], 99.95th=[59648], > | 99.99th=[78336] > > Signed-off-by: Robbie Ko > --- > fs/btrfs/file.c | 17 + > 1 file changed, 17 insertions(+) > > diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c > index 41ab907..8eac540 100644 > --- a/fs/btrfs/file.c > +++ b/fs/btrfs/file.c > @@ -1600,6 +1600,7 @@ static noinline ssize_t __btrfs_buffered_write(struct > file *file, > int ret = 0; > bool only_release_metadata = false; > bool force_page_uptodate = false; > + bool relock = false; > > nrptrs = min(DIV_ROUND_UP(iov_iter_count(i), PAGE_SIZE), > PAGE_SIZE / (sizeof(struct page *))); > @@ -1609,6 +1610,18 @@ static noinline ssize_t __btrfs_buffered_write(struct > file *file, > if (!pages) > return -ENOMEM; > > + inode_dio_begin(inode); > + > + /* > + * If the write is beyond the EOF, we need update > + * the isize, but it is protected by i_mutex. So we can > + * not unlock the i_mutex at this case. > + */ > + if (pos + iov_iter_count(i) <= i_size_read(inode)) { > + inode_unlock(inode); And what protects two writes from interleaving their results now? -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 01/12] block: convert bounce, q->bio_split to bioset_init()/mempool_init()
Looks good, Reviewed-by: Christoph Hellwig <h...@lst.de> -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 08/12] target: convert to bioset_init()/mempool_init()
Looks good, Reviewed-by: Christoph Hellwig <h...@lst.de> -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 09/12] fs: convert block_dev.c to bioset_init()
Looks good, Reviewed-by: Christoph Hellwig <h...@lst.de> -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 11/12] xfs: convert to bioset_init()/mempool_init()
This creates a minor conflict with my iomap series, but that should be easy to merge if needed. Otherwise looks good: Reviewed-by: Christoph Hellwig <h...@lst.de> -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 12/12] block: Drop bioset_create()
Looks good, Reviewed-by: Christoph Hellwig <h...@lst.de> -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH v2 00/12] get rid of GFP_ZONE_TABLE/BAD
This seems to be missing patch 1 and generally be in somewhat odd format. Can you try to resend it with git-send-email and against current Linus' tree? Also I'd suggest you do cleanups like adding and using __GFP_ZONE_MASK at the beginning of the series before doing any real changes. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH v2 02/12] arch/x86/kernel/amd_gart_64: update usage of address zone modifiers
This code doesn't exist in current mainline. What kernel version is your patch against? On Mon, May 21, 2018 at 11:20:23PM +0800, Huaisheng Ye wrote: > From: Huaisheng Ye> > Use __GFP_ZONE_MASK to replace (__GFP_DMA | __GFP_HIGHMEM | __GFP_DMA32). > > ___GFP_DMA, ___GFP_HIGHMEM and ___GFP_DMA32 have been deleted from GFP > bitmasks, the bottom three bits of GFP mask is reserved for storing > encoded zone number. > __GFP_DMA, __GFP_HIGHMEM and __GFP_DMA32 should not be operated by OR. If they have already been deleted the identifier should not exist anymore, so either your patch has issues, or at least the description. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 00/13] convert block layer to bioset_init()/mempool_init()
On Mon, May 21, 2018 at 07:38:55PM -0400, Kent Overstreet wrote: > On Mon, May 21, 2018 at 02:24:32PM -0400, Mike Snitzer wrote: > > Every single data structure change in this series should be reviewed for > > unforeseen alignment consequences. Jens seemed to say that is > > worthwhile. Not sure if he'll do it or we divide it up. If we divide > > it up a temp topic branch should be published for others to inspect. > > > > Could be alignment hasn't been a historic concern for a bunch of the > > data structures changed in this series.. if so then all we can do is fix > > up any obvious potential for false sharing. > > Honestly, I almost never worry about alignment... the very few times I do > care, > I use __cacheline_aligned_in_smp. And that generally is the right stratgey. If Mike really doesn't want a change we can just open code the kmalloc for the bio set there, the important point is that we should not keep the old API around for no good reason. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH v2 05/12] include/linux/dma-mapping: update usage of address zone modifiers
On Mon, May 21, 2018 at 11:20:26PM +0800, Huaisheng Ye wrote: > From: Huaisheng Ye> > Use __GFP_ZONE_MASK to replace (__GFP_DMA | __GFP_HIGHMEM | __GFP_DMA32). > > ___GFP_DMA, ___GFP_HIGHMEM and ___GFP_DMA32 have been deleted from GFP > bitmasks, the bottom three bits of GFP mask is reserved for storing > encoded zone number. > __GFP_DMA, __GFP_HIGHMEM and __GFP_DMA32 should not be operated with >each others by OR. You have to include me for the whole series, otherwise I have absolutely no way to properly review your patch. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 08/10] bcache: move closures to lib/
On Fri, May 18, 2018 at 03:49:13AM -0400, Kent Overstreet wrote: > Prep work for bcachefs - being a fork of bcache it also uses closures Hell no. This code needs to go away and not actually be promoted to lib/. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 10/10] Dynamic fault injection
On Fri, May 18, 2018 at 03:49:18AM -0400, Kent Overstreet wrote: > Signed-off-by: Kent OverstreetCompletely lacks any explanation or argument why it would be useful. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 05/10] don't use spin_lock_irqsave() unnecessarily
On Fri, May 18, 2018 at 03:49:08AM -0400, Kent Overstreet wrote: > Signed-off-by: Kent OverstreetLooks generally fine. A little changelog with an explanation of how we obviously never could get here with irqs disabled would be nice, though. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 06/10] Generic radix trees
Completely lacks any explanation, including why this should be in lib/. Also should come in the same series with actual users of the infrastructure. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 02/10] mm: export find_get_pages()
On Fri, May 18, 2018 at 03:49:02AM -0400, Kent Overstreet wrote: > Signed-off-by: Kent Overstreet> --- > mm/filemap.c | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/mm/filemap.c b/mm/filemap.c > index 31dd888785..78b99019bf 100644 > --- a/mm/filemap.c > +++ b/mm/filemap.c > @@ -1845,6 +1845,7 @@ unsigned find_get_pages_range(struct address_space > *mapping, pgoff_t *start, > > return ret; > } > +EXPORT_SYMBOL(find_get_pages_range); EXPORT_SYMBOL_GPL and only together with the actual submission of a user of the interface. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 01/10] mm: pagecache add lock
On Fri, May 18, 2018 at 06:13:06AM -0700, Matthew Wilcox wrote: > > Historically, the only problematic case has been direct IO, and people > > have been willing to say "well, if you mix buffered and direct IO you > > get what you deserve", and that's probably not unreasonable. But now we > > have fallocate insert range and collapse range, and those are broken in > > ways I frankly don't want to think about if they can't ensure consistency > > with the page cache. > > ext4 manages collapse-vs-pagefault with the ext4-specific i_mmap_sem. > You may get pushback on the grounds that this ought to be a > filesystem-specific lock rather than one embedded in the generic inode. Honestly I think this probably should be in the core. But IFF we move it to the core the existing users of per-fs locks need to be moved over first. E.g. XFS as the very first one, and at least ext4 and f2fs that copied the approach, and probably more if you audit deep enough. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 14/14] mm: turn on vm_fault_t type checking
On Wed, May 16, 2018 at 08:08:29AM -0700, Darrick J. Wong wrote: > Uh, we're changing function signatures /and/ redefinining vm_fault_t? > All in the same 90K patch? > > I /was/ expecting a series of "convert X and all callers/users" > patches followed by a trivial one to switch the definition, not a giant > pile of change. FWIW I don't mind so much if you make a patch > containing a change for some super-common primitive and a hojillion > little diff hunks tree-wide, but only one logical change at a time for a > big patch, please... > > I quite prefer seeing the whole series from start to finish all packaged > up in one series, but wow this was overwhelming. :/ Another vote to split the change of the typedef, ok I get the message.. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: vm_fault_t conversion, for real
On Wed, May 16, 2018 at 06:22:56AM -0700, Matthew Wilcox wrote: > Perhaps you should try being less of an arsehole if you don't want to > get yelled at? I don't mind when you're an arsehole towards me, but I > do mind when you're an arsehole towards newcomers. How are we supposed > to attract and retain new maintainers when you're so rude? *plonk* The only one I'm seeing being extremely rude here is you. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 14/14] mm: turn on vm_fault_t type checking
On Wed, May 16, 2018 at 04:28:13AM -0700, Matthew Wilcox wrote: > On Wed, May 16, 2018 at 07:43:48AM +0200, Christoph Hellwig wrote: > > Switch vm_fault_t to point to an unsigned int with __bіtwise annotations. > > This both catches any old ->fault or ->page_mkwrite instance with plain > > compiler type checking, as well as finding more intricate problems with > > sparse. > > Come on, Christoph; you know better than this. This patch is completely > unreviewable. Split it into one patch per maintainer tree, and in any > event, the patch to convert vm_fault_t to an unsigned int should be > separated from all the trivial conversions. The whole point is that tiny split patches for mechnical translations are totally pointless. Switching the typedef might be worth splitting if people really insist. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: vm_fault_t conversion, for real
On Wed, May 16, 2018 at 04:23:47AM -0700, Matthew Wilcox wrote: > On Wed, May 16, 2018 at 07:43:34AM +0200, Christoph Hellwig wrote: > > this series tries to actually turn vm_fault_t into a type that can be > > typechecked and checks the fallout instead of sprinkling random > > annotations without context. > > Yes, why should we have small tasks that newcomers can do when the mighty > Christoph Hellwig can swoop in and take over from them? Seriously, > can't your talents find a better use than this? I've spent less time on this than trying to argue to you and Souptick that these changes are only to get ignored and yelled at as an "asshole maintainer". So yes, I could have done more productive things if you hadn't forced this escalation. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 10/14] vgem: separate errno from VM_FAULT_* values
On Wed, May 16, 2018 at 11:53:03AM +0200, Daniel Vetter wrote: > Reviewed-by: Daniel Vetter> > Want me to merge this through drm-misc or plan to pick it up yourself? For now I just want a honest discussion if people really actually want the vm_fault_t change with the whole picture in place. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
vm_fault_t conversion, for real
Hi all, this series tries to actually turn vm_fault_t into a type that can be typechecked and checks the fallout instead of sprinkling random annotations without context. The first one fixes a real bug in orangefs, the second and third fix mismatched existing vm_fault_t annotations on the same function, the fourth removes an unused export that was in the chain. The remainder until the last one do some not quite trivial conversions, and the last one does the trivial mass annotation and flips vm_fault_t to a __bitwise unsigned int - the unsigned means we also get plain compiler type checking for the new ->fault signature even without sparse. This has survived an x86 allyesconfig build, and got a SUCCESS from the buildbot that I don't really trust - I'm pretty sure there are bits and pieces hiding in other architectures that it hasn't caught up to. The sparse annotations are manuall verified for the core MM code and a few other interesting bits (e.g. DAX and the x86 fault code) The series is against linux-next as of 2018/05/15 to make sure any annotations in subsystem trees are picked up. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 02/14] fs: make the filemap_page_mkwrite prototype consistent
!CONFIG_MMU version didn't agree with the rest of the kernel.. Signed-off-by: Christoph Hellwig <h...@lst.de> --- mm/filemap.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/mm/filemap.c b/mm/filemap.c index 52517f28e6f4..cf21ced98eff 100644 --- a/mm/filemap.c +++ b/mm/filemap.c @@ -2748,7 +2748,7 @@ int generic_file_readonly_mmap(struct file *file, struct vm_area_struct *vma) return generic_file_mmap(file, vma); } #else -int filemap_page_mkwrite(struct vm_fault *vmf) +vm_fault_t filemap_page_mkwrite(struct vm_fault *vmf) { return -ENOSYS; } -- 2.17.0 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 01/14] orangefs: don't return errno values from ->fault
Signed-off-by: Christoph Hellwig <h...@lst.de> --- fs/orangefs/file.c | 10 -- 1 file changed, 4 insertions(+), 6 deletions(-) diff --git a/fs/orangefs/file.c b/fs/orangefs/file.c index 26358efbf794..b4a25cd4f3fa 100644 --- a/fs/orangefs/file.c +++ b/fs/orangefs/file.c @@ -528,18 +528,16 @@ static long orangefs_ioctl(struct file *file, unsigned int cmd, unsigned long ar return ret; } -static int orangefs_fault(struct vm_fault *vmf) +static vm_fault_t orangefs_fault(struct vm_fault *vmf) { struct file *file = vmf->vma->vm_file; int rc; - rc = orangefs_inode_getattr(file->f_mapping->host, 0, 1, - STATX_SIZE); - if (rc == -ESTALE) - rc = -EIO; + + rc = orangefs_inode_getattr(file->f_mapping->host, 0, 1, STATX_SIZE); if (rc) { gossip_err("%s: orangefs_inode_getattr failed, " "rc:%d:.\n", __func__, rc); - return rc; + return VM_FAULT_SIGBUS; } return filemap_fault(vmf); } -- 2.17.0 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 03/14] dax: make the dax_iomap_fault prototype consistent
Signed-off-by: Christoph Hellwig <h...@lst.de> --- include/linux/dax.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/include/linux/dax.h b/include/linux/dax.h index dc65ece825ee..a292bccdc274 100644 --- a/include/linux/dax.h +++ b/include/linux/dax.h @@ -183,7 +183,7 @@ void dax_flush(struct dax_device *dax_dev, void *addr, size_t size); ssize_t dax_iomap_rw(struct kiocb *iocb, struct iov_iter *iter, const struct iomap_ops *ops); -int dax_iomap_fault(struct vm_fault *vmf, enum page_entry_size pe_size, +vm_fault_t dax_iomap_fault(struct vm_fault *vmf, enum page_entry_size pe_size, pfn_t *pfnp, int *errp, const struct iomap_ops *ops); vm_fault_t dax_finish_sync_fault(struct vm_fault *vmf, enum page_entry_size pe_size, pfn_t pfn); -- 2.17.0 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 04/14] mm: remove the unused device_private_entry_fault export
Signed-off-by: Christoph Hellwig <h...@lst.de> --- kernel/memremap.c | 1 - 1 file changed, 1 deletion(-) diff --git a/kernel/memremap.c b/kernel/memremap.c index db4e1a373e5f..59ee3b604b39 100644 --- a/kernel/memremap.c +++ b/kernel/memremap.c @@ -65,7 +65,6 @@ int device_private_entry_fault(struct vm_area_struct *vma, */ return page->pgmap->page_fault(vma, addr, page, flags, pmdp); } -EXPORT_SYMBOL(device_private_entry_fault); #endif /* CONFIG_DEVICE_PRIVATE */ static void pgmap_radix_release(struct resource *res, unsigned long end_pgoff) -- 2.17.0 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 05/14] ceph: untangle ceph_filemap_fault
Streamline the code to have a somewhat natural flow, and separate the errno values from the VM_FAULT_* values. Signed-off-by: Christoph Hellwig <h...@lst.de> --- fs/ceph/addr.c | 100 + 1 file changed, 51 insertions(+), 49 deletions(-) diff --git a/fs/ceph/addr.c b/fs/ceph/addr.c index 5f7ad3d0df2e..6e80894ca073 100644 --- a/fs/ceph/addr.c +++ b/fs/ceph/addr.c @@ -1428,15 +1428,18 @@ static void ceph_restore_sigs(sigset_t *oldset) /* * vm ops */ -static int ceph_filemap_fault(struct vm_fault *vmf) +static vm_fault_t ceph_filemap_fault(struct vm_fault *vmf) { struct vm_area_struct *vma = vmf->vma; struct inode *inode = file_inode(vma->vm_file); + struct address_space *mapping = inode->i_mapping; struct ceph_inode_info *ci = ceph_inode(inode); struct ceph_file_info *fi = vma->vm_file->private_data; - struct page *pinned_page = NULL; + struct page *pinned_page = NULL, *page; loff_t off = vmf->pgoff << PAGE_SHIFT; - int want, got, ret; + int want, got, err = 0; + vm_fault_t ret = 0; + bool did_fault = false; sigset_t oldset; ceph_block_sigs(); @@ -1449,9 +1452,9 @@ static int ceph_filemap_fault(struct vm_fault *vmf) want = CEPH_CAP_FILE_CACHE; got = 0; - ret = ceph_get_caps(ci, CEPH_CAP_FILE_RD, want, -1, , _page); - if (ret < 0) - goto out_restore; + err = ceph_get_caps(ci, CEPH_CAP_FILE_RD, want, -1, , _page); + if (err < 0) + goto out_errno; dout("filemap_fault %p %llu~%zd got cap refs on %s\n", inode, off, (size_t)PAGE_SIZE, ceph_cap_string(got)); @@ -1462,8 +1465,8 @@ static int ceph_filemap_fault(struct vm_fault *vmf) ceph_add_rw_context(fi, _ctx); ret = filemap_fault(vmf); ceph_del_rw_context(fi, _ctx); - } else - ret = -EAGAIN; + did_fault = true; + } dout("filemap_fault %p %llu~%zd dropping cap refs on %s ret %d\n", inode, off, (size_t)PAGE_SIZE, ceph_cap_string(got), ret); @@ -1471,57 +1474,55 @@ static int ceph_filemap_fault(struct vm_fault *vmf) put_page(pinned_page); ceph_put_cap_refs(ci, got); - if (ret != -EAGAIN) + if (did_fault) goto out_restore; /* read inline data */ if (off >= PAGE_SIZE) { /* does not support inline data > PAGE_SIZE */ ret = VM_FAULT_SIGBUS; + goto out_restore; + } + + page = find_or_create_page(mapping, 0, + mapping_gfp_constraint(mapping, ~__GFP_FS)); + if (!page) { + ret = VM_FAULT_OOM; + goto out_inline; + } + + err = __ceph_do_getattr(inode, page, CEPH_STAT_CAP_INLINE_DATA, true); + if (err < 0 || off >= i_size_read(inode)) { + unlock_page(page); + put_page(page); + if (err < 0) + goto out_errno; + ret = VM_FAULT_SIGBUS; } else { - int ret1; - struct address_space *mapping = inode->i_mapping; - struct page *page = find_or_create_page(mapping, 0, - mapping_gfp_constraint(mapping, - ~__GFP_FS)); - if (!page) { - ret = VM_FAULT_OOM; - goto out_inline; - } - ret1 = __ceph_do_getattr(inode, page, -CEPH_STAT_CAP_INLINE_DATA, true); - if (ret1 < 0 || off >= i_size_read(inode)) { - unlock_page(page); - put_page(page); - if (ret1 < 0) - ret = ret1; - else - ret = VM_FAULT_SIGBUS; - goto out_inline; - } - if (ret1 < PAGE_SIZE) - zero_user_segment(page, ret1, PAGE_SIZE); + if (err < PAGE_SIZE) + zero_user_segment(page, err, PAGE_SIZE); else flush_dcache_page(page); SetPageUptodate(page); vmf->page = page; ret = VM_FAULT_MAJOR | VM_FAULT_LOCKED; -out_inline: - dout("filemap_fault %p %llu~%zd read inline data ret %d\n", -inode, off, (size_t)PAGE_SIZE, ret); } + +out_inline: + dout("filemap_fault %p %llu~%zd read inline data ret %d\n", +inode, off, (size_t)PAGE_SIZE, ret); out_restore: ceph_restore_sigs(); - if (ret < 0) - ret =
[PATCH 06/14] btrfs: separate errno from VM_FAULT_* values
Signed-off-by: Christoph Hellwig <h...@lst.de> --- fs/btrfs/ctree.h | 2 +- fs/btrfs/inode.c | 19 ++- 2 files changed, 11 insertions(+), 10 deletions(-) diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h index 1485cd130e2b..02a0de73c1d1 100644 --- a/fs/btrfs/ctree.h +++ b/fs/btrfs/ctree.h @@ -3203,7 +3203,7 @@ int btrfs_merge_bio_hook(struct page *page, unsigned long offset, size_t size, struct bio *bio, unsigned long bio_flags); void btrfs_set_range_writeback(void *private_data, u64 start, u64 end); -int btrfs_page_mkwrite(struct vm_fault *vmf); +vm_fault_t btrfs_page_mkwrite(struct vm_fault *vmf); int btrfs_readpage(struct file *file, struct page *page); void btrfs_evict_inode(struct inode *inode); int btrfs_write_inode(struct inode *inode, struct writeback_control *wbc); diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index ec9db248c499..f4f03f0f4556 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -8824,7 +8824,7 @@ static void btrfs_invalidatepage(struct page *page, unsigned int offset, * beyond EOF, then the page is guaranteed safe against truncation until we * unlock the page. */ -int btrfs_page_mkwrite(struct vm_fault *vmf) +vm_fault_t btrfs_page_mkwrite(struct vm_fault *vmf) { struct page *page = vmf->page; struct inode *inode = file_inode(vmf->vma->vm_file); @@ -8836,7 +8836,8 @@ int btrfs_page_mkwrite(struct vm_fault *vmf) char *kaddr; unsigned long zero_start; loff_t size; - int ret; + vm_fault_t ret; + int err; int reserved = 0; u64 reserved_space; u64 page_start; @@ -8858,14 +8859,14 @@ int btrfs_page_mkwrite(struct vm_fault *vmf) * end up waiting indefinitely to get a lock on the page currently * being processed by btrfs_page_mkwrite() function. */ - ret = btrfs_delalloc_reserve_space(inode, _reserved, page_start, + err = btrfs_delalloc_reserve_space(inode, _reserved, page_start, reserved_space); - if (!ret) { - ret = file_update_time(vmf->vma->vm_file); + if (!err) { + err = file_update_time(vmf->vma->vm_file); reserved = 1; } - if (ret) { - if (ret == -ENOMEM) + if (err) { + if (err == -ENOMEM) ret = VM_FAULT_OOM; else /* -ENOSPC, -EIO, etc */ ret = VM_FAULT_SIGBUS; @@ -8927,9 +8928,9 @@ int btrfs_page_mkwrite(struct vm_fault *vmf) EXTENT_DO_ACCOUNTING | EXTENT_DEFRAG, 0, 0, _state); - ret = btrfs_set_extent_delalloc(inode, page_start, end, 0, + err = btrfs_set_extent_delalloc(inode, page_start, end, 0, _state, 0); - if (ret) { + if (err) { unlock_extent_cached(io_tree, page_start, page_end, _state); ret = VM_FAULT_SIGBUS; -- 2.17.0 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 07/14] ext4: separate errno from VM_FAULT_* values
Signed-off-by: Christoph Hellwig <h...@lst.de> --- fs/ext4/ext4.h | 4 ++-- fs/ext4/inode.c | 30 +++--- 2 files changed, 17 insertions(+), 17 deletions(-) diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h index fa52b7dd4542..48592d0edf3e 100644 --- a/fs/ext4/ext4.h +++ b/fs/ext4/ext4.h @@ -2463,8 +2463,8 @@ extern int ext4_writepage_trans_blocks(struct inode *); extern int ext4_chunk_trans_blocks(struct inode *, int nrblocks); extern int ext4_zero_partial_blocks(handle_t *handle, struct inode *inode, loff_t lstart, loff_t lend); -extern int ext4_page_mkwrite(struct vm_fault *vmf); -extern int ext4_filemap_fault(struct vm_fault *vmf); +extern vm_fault_t ext4_page_mkwrite(struct vm_fault *vmf); +extern vm_fault_t ext4_filemap_fault(struct vm_fault *vmf); extern qsize_t *ext4_get_reserved_space(struct inode *inode); extern int ext4_get_projid(struct inode *inode, kprojid_t *projid); extern void ext4_da_update_reserve_space(struct inode *inode, diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c index 95bc48f5c88b..fe49045a2832 100644 --- a/fs/ext4/inode.c +++ b/fs/ext4/inode.c @@ -6106,27 +6106,27 @@ static int ext4_bh_unmapped(handle_t *handle, struct buffer_head *bh) return !buffer_mapped(bh); } -int ext4_page_mkwrite(struct vm_fault *vmf) +vm_fault_t ext4_page_mkwrite(struct vm_fault *vmf) { struct vm_area_struct *vma = vmf->vma; struct page *page = vmf->page; loff_t size; unsigned long len; - int ret; + vm_fault_t ret; struct file *file = vma->vm_file; struct inode *inode = file_inode(file); struct address_space *mapping = inode->i_mapping; handle_t *handle; get_block_t *get_block; - int retries = 0; + int retries = 0, err; sb_start_pagefault(inode->i_sb); file_update_time(vma->vm_file); down_read(_I(inode)->i_mmap_sem); - ret = ext4_convert_inline_data(inode); - if (ret) + err = ext4_convert_inline_data(inode); + if (err) goto out_ret; /* Delalloc case is easy... */ @@ -6134,9 +6134,9 @@ int ext4_page_mkwrite(struct vm_fault *vmf) !ext4_should_journal_data(inode) && !ext4_nonda_switch(inode->i_sb)) { do { - ret = block_page_mkwrite(vma, vmf, + err = block_page_mkwrite(vma, vmf, ext4_da_get_block_prep); - } while (ret == -ENOSPC && + } while (err == -ENOSPC && ext4_should_retry_alloc(inode->i_sb, )); goto out_ret; } @@ -6181,8 +6181,8 @@ int ext4_page_mkwrite(struct vm_fault *vmf) ret = VM_FAULT_SIGBUS; goto out; } - ret = block_page_mkwrite(vma, vmf, get_block); - if (!ret && ext4_should_journal_data(inode)) { + err = block_page_mkwrite(vma, vmf, get_block); + if (!err && ext4_should_journal_data(inode)) { if (ext4_walk_page_buffers(handle, page_buffers(page), 0, PAGE_SIZE, NULL, do_journal_get_write_access)) { unlock_page(page); @@ -6193,24 +6193,24 @@ int ext4_page_mkwrite(struct vm_fault *vmf) ext4_set_inode_state(inode, EXT4_STATE_JDATA); } ext4_journal_stop(handle); - if (ret == -ENOSPC && ext4_should_retry_alloc(inode->i_sb, )) + if (err == -ENOSPC && ext4_should_retry_alloc(inode->i_sb, )) goto retry_alloc; out_ret: - ret = block_page_mkwrite_return(ret); + ret = block_page_mkwrite_return(err); out: up_read(_I(inode)->i_mmap_sem); sb_end_pagefault(inode->i_sb); return ret; } -int ext4_filemap_fault(struct vm_fault *vmf) +vm_fault_t ext4_filemap_fault(struct vm_fault *vmf) { struct inode *inode = file_inode(vmf->vma->vm_file); - int err; + vm_fault_t ret; down_read(_I(inode)->i_mmap_sem); - err = filemap_fault(vmf); + ret = filemap_fault(vmf); up_read(_I(inode)->i_mmap_sem); - return err; + return ret; } -- 2.17.0 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 10/14] vgem: separate errno from VM_FAULT_* values
And streamline the code in vgem_fault with early returns so that it is a little bit more readable. Signed-off-by: Christoph Hellwig <h...@lst.de> --- drivers/gpu/drm/vgem/vgem_drv.c | 51 +++-- 1 file changed, 23 insertions(+), 28 deletions(-) diff --git a/drivers/gpu/drm/vgem/vgem_drv.c b/drivers/gpu/drm/vgem/vgem_drv.c index 2524ff116f00..a261e0aab83a 100644 --- a/drivers/gpu/drm/vgem/vgem_drv.c +++ b/drivers/gpu/drm/vgem/vgem_drv.c @@ -61,12 +61,13 @@ static void vgem_gem_free_object(struct drm_gem_object *obj) kfree(vgem_obj); } -static int vgem_gem_fault(struct vm_fault *vmf) +static vm_fault_t vgem_gem_fault(struct vm_fault *vmf) { struct vm_area_struct *vma = vmf->vma; struct drm_vgem_gem_object *obj = vma->vm_private_data; /* We don't use vmf->pgoff since that has the fake offset */ unsigned long vaddr = vmf->address; + struct page *page; int ret; loff_t num_pages; pgoff_t page_offset; @@ -85,35 +86,29 @@ static int vgem_gem_fault(struct vm_fault *vmf) ret = 0; } mutex_unlock(>pages_lock); - if (ret) { - struct page *page; - - page = shmem_read_mapping_page( - file_inode(obj->base.filp)->i_mapping, - page_offset); - if (!IS_ERR(page)) { - vmf->page = page; - ret = 0; - } else switch (PTR_ERR(page)) { - case -ENOSPC: - case -ENOMEM: - ret = VM_FAULT_OOM; - break; - case -EBUSY: - ret = VM_FAULT_RETRY; - break; - case -EFAULT: - case -EINVAL: - ret = VM_FAULT_SIGBUS; - break; - default: - WARN_ON(PTR_ERR(page)); - ret = VM_FAULT_SIGBUS; - break; - } + if (!ret) + return 0; + + page = shmem_read_mapping_page(file_inode(obj->base.filp)->i_mapping, + page_offset); + if (!IS_ERR(page)) { + vmf->page = page; + return 0; + } + switch (PTR_ERR(page)) { + case -ENOSPC: + case -ENOMEM: + return VM_FAULT_OOM; + case -EBUSY: + return VM_FAULT_RETRY; + case -EFAULT: + case -EINVAL: + return VM_FAULT_SIGBUS; + default: + WARN_ON(PTR_ERR(page)); + return VM_FAULT_SIGBUS; } - return ret; } static const struct vm_operations_struct vgem_gem_vm_ops = { -- 2.17.0 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 09/14] ubifs: separate errno from VM_FAULT_* values
Signed-off-by: Christoph Hellwig <h...@lst.de> --- fs/ubifs/file.c | 7 --- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/fs/ubifs/file.c b/fs/ubifs/file.c index 1acb2ff505e6..7c1a2e1c3de5 100644 --- a/fs/ubifs/file.c +++ b/fs/ubifs/file.c @@ -1513,7 +1513,7 @@ static int ubifs_releasepage(struct page *page, gfp_t unused_gfp_flags) * mmap()d file has taken write protection fault and is being made writable. * UBIFS must ensure page is budgeted for. */ -static int ubifs_vm_page_mkwrite(struct vm_fault *vmf) +static vm_fault_t ubifs_vm_page_mkwrite(struct vm_fault *vmf) { struct page *page = vmf->page; struct inode *inode = file_inode(vmf->vma->vm_file); @@ -1521,6 +1521,7 @@ static int ubifs_vm_page_mkwrite(struct vm_fault *vmf) struct timespec now = current_time(inode); struct ubifs_budget_req req = { .new_page = 1 }; int err, update_time; + vm_fault_t ret = 0; dbg_gen("ino %lu, pg %lu, i_size %lld", inode->i_ino, page->index, i_size_read(inode)); @@ -1601,8 +1602,8 @@ static int ubifs_vm_page_mkwrite(struct vm_fault *vmf) unlock_page(page); ubifs_release_budget(c, ); if (err) - err = VM_FAULT_SIGBUS; - return err; + ret = VM_FAULT_SIGBUS; + return ret; } static const struct vm_operations_struct ubifs_file_vm_ops = { -- 2.17.0 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 11/14] ttm: separate errno from VM_FAULT_* values
Signed-off-by: Christoph Hellwig <h...@lst.de> --- drivers/gpu/drm/ttm/ttm_bo_vm.c | 42 + 1 file changed, 22 insertions(+), 20 deletions(-) diff --git a/drivers/gpu/drm/ttm/ttm_bo_vm.c b/drivers/gpu/drm/ttm/ttm_bo_vm.c index 8eba95b3c737..255e7801f62c 100644 --- a/drivers/gpu/drm/ttm/ttm_bo_vm.c +++ b/drivers/gpu/drm/ttm/ttm_bo_vm.c @@ -43,10 +43,11 @@ #define TTM_BO_VM_NUM_PREFAULT 16 -static int ttm_bo_vm_fault_idle(struct ttm_buffer_object *bo, +static vm_fault_t ttm_bo_vm_fault_idle(struct ttm_buffer_object *bo, struct vm_fault *vmf) { - int ret = 0; + vm_fault_t ret = 0; + int err = 0; if (likely(!bo->moving)) goto out_unlock; @@ -77,8 +78,8 @@ static int ttm_bo_vm_fault_idle(struct ttm_buffer_object *bo, /* * Ordinary wait. */ - ret = dma_fence_wait(bo->moving, true); - if (unlikely(ret != 0)) { + err = dma_fence_wait(bo->moving, true); + if (unlikely(err != 0)) { ret = (ret != -ERESTARTSYS) ? VM_FAULT_SIGBUS : VM_FAULT_NOPAGE; goto out_unlock; @@ -104,7 +105,7 @@ static unsigned long ttm_bo_io_mem_pfn(struct ttm_buffer_object *bo, + page_offset; } -static int ttm_bo_vm_fault(struct vm_fault *vmf) +static vm_fault_t ttm_bo_vm_fault(struct vm_fault *vmf) { struct vm_area_struct *vma = vmf->vma; struct ttm_buffer_object *bo = (struct ttm_buffer_object *) @@ -115,7 +116,8 @@ static int ttm_bo_vm_fault(struct vm_fault *vmf) unsigned long pfn; struct ttm_tt *ttm = NULL; struct page *page; - int ret; + vm_fault_t ret; + int err; int i; unsigned long address = vmf->address; struct ttm_mem_type_manager *man = @@ -128,9 +130,9 @@ static int ttm_bo_vm_fault(struct vm_fault *vmf) * for reserve, and if it fails, retry the fault after waiting * for the buffer to become unreserved. */ - ret = ttm_bo_reserve(bo, true, true, NULL); - if (unlikely(ret != 0)) { - if (ret != -EBUSY) + err = ttm_bo_reserve(bo, true, true, NULL); + if (unlikely(err != 0)) { + if (err != -EBUSY) return VM_FAULT_NOPAGE; if (vmf->flags & FAULT_FLAG_ALLOW_RETRY) { @@ -162,8 +164,8 @@ static int ttm_bo_vm_fault(struct vm_fault *vmf) } if (bdev->driver->fault_reserve_notify) { - ret = bdev->driver->fault_reserve_notify(bo); - switch (ret) { + err = bdev->driver->fault_reserve_notify(bo); + switch (err) { case 0: break; case -EBUSY: @@ -191,13 +193,13 @@ static int ttm_bo_vm_fault(struct vm_fault *vmf) goto out_unlock; } - ret = ttm_mem_io_lock(man, true); - if (unlikely(ret != 0)) { + err = ttm_mem_io_lock(man, true); + if (unlikely(err != 0)) { ret = VM_FAULT_NOPAGE; goto out_unlock; } - ret = ttm_mem_io_reserve_vm(bo); - if (unlikely(ret != 0)) { + err = ttm_mem_io_reserve_vm(bo); + if (unlikely(err != 0)) { ret = VM_FAULT_SIGBUS; goto out_io_unlock; } @@ -265,21 +267,21 @@ static int ttm_bo_vm_fault(struct vm_fault *vmf) } if (vma->vm_flags & VM_MIXEDMAP) - ret = vm_insert_mixed(, address, + err = vm_insert_mixed(, address, __pfn_to_pfn_t(pfn, PFN_DEV)); else - ret = vm_insert_pfn(, address, pfn); + err = vm_insert_pfn(, address, pfn); /* * Somebody beat us to this PTE or prefaulting to * an already populated PTE, or prefaulting error. */ - if (unlikely((ret == -EBUSY) || (ret != 0 && i > 0))) + if (unlikely((err == -EBUSY) || (err != 0 && i > 0))) break; - else if (unlikely(ret != 0)) { + else if (unlikely(err != 0)) { ret = - (ret == -ENOMEM) ? VM_FAULT_OOM : VM_FAULT_SIGBUS; + (err == -ENOMEM) ? VM_FAULT_OOM : VM_FAULT_SIGBUS; goto out_io_unlock; } -- 2.17.0 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 12/14] lustre: separate errno from VM_FAULT_* values
Signed-off-by: Christoph Hellwig <h...@lst.de> --- .../staging/lustre/lustre/llite/llite_mmap.c | 37 +++ .../lustre/lustre/llite/vvp_internal.h| 2 +- 2 files changed, 14 insertions(+), 25 deletions(-) diff --git a/drivers/staging/lustre/lustre/llite/llite_mmap.c b/drivers/staging/lustre/lustre/llite/llite_mmap.c index 214b07554e62..061d98871959 100644 --- a/drivers/staging/lustre/lustre/llite/llite_mmap.c +++ b/drivers/staging/lustre/lustre/llite/llite_mmap.c @@ -231,23 +231,18 @@ static int ll_page_mkwrite0(struct vm_area_struct *vma, struct page *vmpage, return result; } -static inline int to_fault_error(int result) +static inline vm_fault_t to_fault_error(int result) { switch (result) { case 0: - result = VM_FAULT_LOCKED; - break; + return VM_FAULT_LOCKED; case -EFAULT: - result = VM_FAULT_NOPAGE; - break; + return VM_FAULT_NOPAGE; case -ENOMEM: - result = VM_FAULT_OOM; - break; + return VM_FAULT_OOM; default: - result = VM_FAULT_SIGBUS; - break; + return VM_FAULT_SIGBUS; } - return result; } /** @@ -261,7 +256,7 @@ static inline int to_fault_error(int result) * \retval VM_FAULT_ERROR on general error * \retval NOPAGE_OOM not have memory for allocate new page */ -static int ll_fault0(struct vm_area_struct *vma, struct vm_fault *vmf) +static vm_fault_t ll_fault0(struct vm_area_struct *vma, struct vm_fault *vmf) { struct lu_env *env; struct cl_io*io; @@ -269,7 +264,7 @@ static int ll_fault0(struct vm_area_struct *vma, struct vm_fault *vmf) struct page *vmpage; unsigned long ra_flags; int result = 0; - int fault_ret = 0; + vm_fault_tfault_ret = 0; u16 refcheck; env = cl_env_get(); @@ -323,7 +318,7 @@ static int ll_fault0(struct vm_area_struct *vma, struct vm_fault *vmf) return fault_ret; } -static int ll_fault(struct vm_fault *vmf) +static vm_fault_t ll_fault(struct vm_fault *vmf) { int count = 0; bool printed = false; @@ -364,7 +359,7 @@ static int ll_fault(struct vm_fault *vmf) return result; } -static int ll_page_mkwrite(struct vm_fault *vmf) +static vm_fault_t ll_page_mkwrite(struct vm_fault *vmf) { struct vm_area_struct *vma = vmf->vma; int count = 0; @@ -390,22 +385,16 @@ static int ll_page_mkwrite(struct vm_fault *vmf) switch (result) { case 0: LASSERT(PageLocked(vmf->page)); - result = VM_FAULT_LOCKED; - break; + return VM_FAULT_LOCKED; case -ENODATA: case -EAGAIN: case -EFAULT: - result = VM_FAULT_NOPAGE; - break; + return VM_FAULT_NOPAGE; case -ENOMEM: - result = VM_FAULT_OOM; - break; + return VM_FAULT_OOM; default: - result = VM_FAULT_SIGBUS; - break; + return VM_FAULT_SIGBUS; } - - return result; } /** diff --git a/drivers/staging/lustre/lustre/llite/vvp_internal.h b/drivers/staging/lustre/lustre/llite/vvp_internal.h index 7d3abb43584a..c194966a3d82 100644 --- a/drivers/staging/lustre/lustre/llite/vvp_internal.h +++ b/drivers/staging/lustre/lustre/llite/vvp_internal.h @@ -83,7 +83,7 @@ struct vvp_io { /** * fault API used bitflags for return code. */ - unsigned intft_flags; + vm_fault_tft_flags; /** * check that flags are from filemap_fault */ -- 2.17.0 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 13/14] mm: move arch specific VM_FAULT_* flags to mm.h
Various architectures define their own internal flags. Not sure a public header like mm.h is a good place, but keeping them inside the arch code with possible conflicts also seems like a bad idea. Maybe we just need to stop overloading the value instead. Signed-off-by: Christoph Hellwig <h...@lst.de> --- arch/arm/mm/fault.c | 3 --- arch/arm64/mm/fault.c | 3 --- arch/s390/mm/fault.c | 6 -- arch/unicore32/mm/fault.c | 3 --- include/linux/mm.h| 7 +++ 5 files changed, 7 insertions(+), 15 deletions(-) diff --git a/arch/arm/mm/fault.c b/arch/arm/mm/fault.c index 32034543f49c..b696eabccf60 100644 --- a/arch/arm/mm/fault.c +++ b/arch/arm/mm/fault.c @@ -201,9 +201,6 @@ void do_bad_area(unsigned long addr, unsigned int fsr, struct pt_regs *regs) } #ifdef CONFIG_MMU -#define VM_FAULT_BADMAP0x01 -#define VM_FAULT_BADACCESS 0x02 - /* * Check that the permissions on the VMA allow for the fault which occurred. * If we encountered a write fault, we must have write permission, otherwise diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c index 91c53a7d2575..3d0b1f8eacce 100644 --- a/arch/arm64/mm/fault.c +++ b/arch/arm64/mm/fault.c @@ -318,9 +318,6 @@ static void do_bad_area(unsigned long addr, unsigned int esr, struct pt_regs *re } } -#define VM_FAULT_BADMAP0x01 -#define VM_FAULT_BADACCESS 0x02 - static int __do_page_fault(struct mm_struct *mm, unsigned long addr, unsigned int mm_flags, unsigned long vm_flags, struct task_struct *tsk) diff --git a/arch/s390/mm/fault.c b/arch/s390/mm/fault.c index e074480d3598..48c781ae25d0 100644 --- a/arch/s390/mm/fault.c +++ b/arch/s390/mm/fault.c @@ -44,12 +44,6 @@ #define __SUBCODE_MASK 0x0600 #define __PF_RES_FIELD 0x8000ULL -#define VM_FAULT_BADCONTEXT0x01 -#define VM_FAULT_BADMAP0x02 -#define VM_FAULT_BADACCESS 0x04 -#define VM_FAULT_SIGNAL0x08 -#define VM_FAULT_PFAULT0x10 - enum fault_type { KERNEL_FAULT, USER_FAULT, diff --git a/arch/unicore32/mm/fault.c b/arch/unicore32/mm/fault.c index 381473412937..6c3c1a82925f 100644 --- a/arch/unicore32/mm/fault.c +++ b/arch/unicore32/mm/fault.c @@ -148,9 +148,6 @@ void do_bad_area(unsigned long addr, unsigned int fsr, struct pt_regs *regs) __do_kernel_fault(mm, addr, fsr, regs); } -#define VM_FAULT_BADMAP0x01 -#define VM_FAULT_BADACCESS 0x02 - /* * Check that the permissions on the VMA allow for the fault which occurred. * If we encountered a write fault, we must have write permission, otherwise diff --git a/include/linux/mm.h b/include/linux/mm.h index 338b8a1afb02..64d09e3afc24 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -1250,6 +1250,13 @@ static inline void clear_page_pfmemalloc(struct page *page) * and needs fsync() to complete (for * synchronous page faults in DAX) */ +/* Only for use in architecture specific page fault handling: */ +#define VM_FAULT_BADMAP0x01 +#define VM_FAULT_BADACCESS 0x02 +#define VM_FAULT_BADCONTEXT0x04 +#define VM_FAULT_SIGNAL0x08 +#define VM_FAULT_PFAULT0x10 + #define VM_FAULT_ERROR (VM_FAULT_OOM | VM_FAULT_SIGBUS | VM_FAULT_SIGSEGV | \ VM_FAULT_HWPOISON | VM_FAULT_HWPOISON_LARGE | \ VM_FAULT_FALLBACK) -- 2.17.0 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 08/14] ocfs2: separate errno from VM_FAULT_* values
Signed-off-by: Christoph Hellwig <h...@lst.de> --- fs/ocfs2/mmap.c | 36 +++- 1 file changed, 19 insertions(+), 17 deletions(-) diff --git a/fs/ocfs2/mmap.c b/fs/ocfs2/mmap.c index fb9a20e3d608..e75c1fc5333e 100644 --- a/fs/ocfs2/mmap.c +++ b/fs/ocfs2/mmap.c @@ -44,11 +44,11 @@ #include "ocfs2_trace.h" -static int ocfs2_fault(struct vm_fault *vmf) +static vm_fault_t ocfs2_fault(struct vm_fault *vmf) { struct vm_area_struct *vma = vmf->vma; sigset_t oldset; - int ret; + vm_fault_t ret; ocfs2_block_signals(); ret = filemap_fault(vmf); @@ -59,10 +59,10 @@ static int ocfs2_fault(struct vm_fault *vmf) return ret; } -static int __ocfs2_page_mkwrite(struct file *file, struct buffer_head *di_bh, - struct page *page) +static vm_fault_t __ocfs2_page_mkwrite(struct file *file, + struct buffer_head *di_bh, struct page *page) { - int ret = VM_FAULT_NOPAGE; + vm_fault_t ret = VM_FAULT_NOPAGE; struct inode *inode = file_inode(file); struct address_space *mapping = inode->i_mapping; loff_t pos = page_offset(page); @@ -71,6 +71,7 @@ static int __ocfs2_page_mkwrite(struct file *file, struct buffer_head *di_bh, struct page *locked_page = NULL; void *fsdata; loff_t size = i_size_read(inode); + int err; last_index = (size - 1) >> PAGE_SHIFT; @@ -105,12 +106,12 @@ static int __ocfs2_page_mkwrite(struct file *file, struct buffer_head *di_bh, if (page->index == last_index) len = ((size - 1) & ~PAGE_MASK) + 1; - ret = ocfs2_write_begin_nolock(mapping, pos, len, OCFS2_WRITE_MMAP, + err = ocfs2_write_begin_nolock(mapping, pos, len, OCFS2_WRITE_MMAP, _page, , di_bh, page); - if (ret) { - if (ret != -ENOSPC) - mlog_errno(ret); - if (ret == -ENOMEM) + if (err) { + if (err != -ENOSPC) + mlog_errno(err); + if (err == -ENOMEM) ret = VM_FAULT_OOM; else ret = VM_FAULT_SIGBUS; @@ -121,20 +122,21 @@ static int __ocfs2_page_mkwrite(struct file *file, struct buffer_head *di_bh, ret = VM_FAULT_NOPAGE; goto out; } - ret = ocfs2_write_end_nolock(mapping, pos, len, len, fsdata); - BUG_ON(ret != len); + err = ocfs2_write_end_nolock(mapping, pos, len, len, fsdata); + BUG_ON(err != len); ret = VM_FAULT_LOCKED; out: return ret; } -static int ocfs2_page_mkwrite(struct vm_fault *vmf) +static vm_fault_t ocfs2_page_mkwrite(struct vm_fault *vmf) { struct page *page = vmf->page; struct inode *inode = file_inode(vmf->vma->vm_file); struct buffer_head *di_bh = NULL; sigset_t oldset; - int ret; + vm_fault_t ret = 0; + int err; sb_start_pagefault(inode->i_sb); ocfs2_block_signals(); @@ -144,10 +146,10 @@ static int ocfs2_page_mkwrite(struct vm_fault *vmf) * node. Taking the data lock will also ensure that we don't * attempt page truncation as part of a downconvert. */ - ret = ocfs2_inode_lock(inode, _bh, 1); - if (ret < 0) { + err = ocfs2_inode_lock(inode, _bh, 1); + if (err < 0) { mlog_errno(ret); - if (ret == -ENOMEM) + if (err == -ENOMEM) ret = VM_FAULT_OOM; else ret = VM_FAULT_SIGBUS; -- 2.17.0 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 14/14] mm: turn on vm_fault_t type checking
Switch vm_fault_t to point to an unsigned int with __bіtwise annotations. This both catches any old ->fault or ->page_mkwrite instance with plain compiler type checking, as well as finding more intricate problems with sparse. Signed-off-by: Christoph Hellwig <h...@lst.de> --- arch/alpha/mm/fault.c | 2 +- arch/arc/mm/fault.c | 3 +- arch/arm/mm/fault.c | 5 +- arch/arm64/mm/fault.c | 7 +- arch/hexagon/mm/vm_fault.c| 2 +- arch/ia64/mm/fault.c | 2 +- arch/m68k/mm/fault.c | 2 +- arch/microblaze/mm/fault.c| 2 +- arch/mips/mm/fault.c | 2 +- arch/nds32/mm/fault.c | 2 +- arch/nios2/mm/fault.c | 2 +- arch/openrisc/mm/fault.c | 2 +- arch/parisc/mm/fault.c| 2 +- arch/powerpc/include/asm/copro.h | 2 +- arch/powerpc/mm/copro_fault.c | 2 +- arch/powerpc/mm/fault.c | 10 +-- arch/powerpc/platforms/cell/spufs/fault.c | 2 +- arch/riscv/mm/fault.c | 3 +- arch/s390/kernel/vdso.c | 2 +- arch/s390/mm/fault.c | 2 +- arch/sh/mm/fault.c| 2 +- arch/sparc/mm/fault_32.c | 4 +- arch/sparc/mm/fault_64.c | 3 +- arch/um/kernel/trap.c | 2 +- arch/unicore32/mm/fault.c | 10 +-- arch/x86/entry/vdso/vma.c | 4 +- arch/x86/mm/fault.c | 11 +-- arch/xtensa/mm/fault.c| 2 +- drivers/dax/device.c | 21 +++--- drivers/gpu/drm/drm_vm.c | 10 +-- drivers/gpu/drm/etnaviv/etnaviv_drv.h | 2 +- drivers/gpu/drm/etnaviv/etnaviv_gem.c | 2 +- drivers/gpu/drm/exynos/exynos_drm_gem.c | 2 +- drivers/gpu/drm/exynos/exynos_drm_gem.h | 2 +- drivers/gpu/drm/gma500/framebuffer.c | 6 +- drivers/gpu/drm/gma500/gem.c | 2 +- drivers/gpu/drm/gma500/psb_drv.h | 2 +- drivers/gpu/drm/i915/i915_drv.h | 2 +- drivers/gpu/drm/i915/i915_gem.c | 21 ++ drivers/gpu/drm/msm/msm_drv.h | 2 +- drivers/gpu/drm/msm/msm_gem.c | 2 +- drivers/gpu/drm/qxl/qxl_ttm.c | 4 +- drivers/gpu/drm/radeon/radeon_ttm.c | 2 +- drivers/gpu/drm/udl/udl_drv.h | 2 +- drivers/gpu/drm/udl/udl_gem.c | 2 +- drivers/gpu/drm/vc4/vc4_bo.c | 2 +- drivers/gpu/drm/vc4/vc4_drv.h | 2 +- drivers/hwtracing/intel_th/msu.c | 2 +- drivers/iommu/amd_iommu_v2.c | 2 +- drivers/iommu/intel-svm.c | 3 +- drivers/misc/cxl/fault.c | 2 +- drivers/misc/ocxl/context.c | 6 +- drivers/misc/ocxl/link.c | 2 +- drivers/misc/ocxl/sysfs.c | 2 +- drivers/scsi/cxlflash/superpipe.c | 4 +- drivers/staging/ncpfs/mmap.c | 2 +- drivers/xen/privcmd.c | 2 +- fs/9p/vfs_file.c | 2 +- fs/afs/internal.h | 2 +- fs/afs/write.c| 2 +- fs/f2fs/file.c| 10 +-- fs/fuse/file.c| 2 +- fs/gfs2/file.c| 2 +- fs/iomap.c| 2 +- fs/nfs/file.c | 4 +- fs/nilfs2/file.c | 2 +- fs/proc/vmcore.c | 2 +- fs/userfaultfd.c | 4 +- fs/xfs/xfs_file.c | 12 ++-- include/linux/huge_mm.h | 13 ++-- include/linux/hugetlb.h | 2 +- include/linux/iomap.h | 4 +- include/linux/mm.h| 67 + include/linux/mm_types.h | 5 +- include/linux/oom.h | 2 +- include/linux/swapops.h | 4 +- include/linux/userfaultfd_k.h | 5 +- ipc/shm.c | 2 +- kernel/events/core.c | 4 +- mm/gup.c | 7 +- mm/hmm.c | 2 +- mm/huge_memory.c | 29 mm/hugetlb.c | 25 +++ mm/internal.h | 2 +- mm/khugepaged.c | 3 +- mm/ksm.c | 2 +- mm/memory.c | 88 --- mm/mmap.c | 4 +- mm/shmem.c| 9 +-- samples/vfio-mdev/mbochs.c| 4 +- virt/kvm/kvm_main.c | 2
Re: btrfs space used issue
On Sat, Mar 03, 2018 at 06:59:26AM +, Duncan wrote: > Indeed. Preallocation with COW doesn't make the sense it does on an > overwrite-in-place filesystem. It makes a whole lot of sense, it just is a little harder to implement. There is no reason not to preallocate specific space, or if you aren't forced to be fully log structured by the medium, specific blocks to COW into. It just isn't quite as trivial as for a rewrite in place file system to implement. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 4/7] blk-mq: make blk_abort_request() trigger timeout path
On Sat, Dec 16, 2017 at 04:07:23AM -0800, Tejun Heo wrote: > Note that this makes blk_abort_request() asynchronous - it initiates > abortion but the actual termination will happen after a short while, > even when the caller owns the request. AFAICS, SCSI and ATA should be > fine with that and I think mtip32xx and dasd should be safe but not > completely sure. It'd be great if people who know the drivers take a > look. For that you'll need to CC linux-ide and linux-scsi, and for the SAS drivers some of the usual suspects that touch the SAS code. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/7] blk-mq: protect completion path with RCU
Why do you need the srcu protection? The completion path can never sleep. If there is a good reason to keep it please add commment, and make the srcu variant a separate function only used by drivers that need it instead of adding the conditional. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCHSET v3] blk-mq: reimplement timeout handling
This seems to miss the linux-block list once again. Please include it in the next resend. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: qemu-kvm VM died during partial raid1 problems of btrfs
On Tue, Sep 12, 2017 at 08:43:59PM +0200, Adam Borowski wrote: > For now, though, I wonder -- should we send fine folks at util-linux a patch > to make fallocate -d restore mtime, either always or on an option? Don't do that. Please just add a new ioctl or fallocate command that punches a hole if the range is zeroed, similar to what dedup does. It can probably even reuse a few helpers. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Btrfs: do not reset bio->bi_ops while writing bio
On Thu, Aug 24, 2017 at 11:04:40AM -0700, Liu Bo wrote: > On Thu, Aug 24, 2017 at 07:10:41AM -0700, Christoph Hellwig wrote: > > On Wed, Aug 23, 2017 at 12:15:09PM -0600, Liu Bo wrote: > > > flush_epd_write_bio() sets bio->bi_ops by itself to honor REQ_SYNC, > > > but it's not needed at all since bio->bi_ops has set up properly in > > > both __extent_writepage() and write_one_eb(), and in the case of > > > write_one_eb(), it also sets REQ_META, which we will lose in > > > flush_epd_write_bio(). > > > > > > This remove this unnecessary bio->bi_ops setting. > > > > Nitpick: it's bi_op, not bi_ops. > > Ah thanks, it's bio->bi_opf ;) Heh. Of course.. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Btrfs: do not reset bio->bi_ops while writing bio
On Wed, Aug 23, 2017 at 12:15:09PM -0600, Liu Bo wrote: > flush_epd_write_bio() sets bio->bi_ops by itself to honor REQ_SYNC, > but it's not needed at all since bio->bi_ops has set up properly in > both __extent_writepage() and write_one_eb(), and in the case of > write_one_eb(), it also sets REQ_META, which we will lose in > flush_epd_write_bio(). > > This remove this unnecessary bio->bi_ops setting. Nitpick: it's bi_op, not bi_ops. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/6] raid5: remove a call to get_start_sect
On Wed, Aug 23, 2017 at 11:23:38AM -0700, Shaohua Li wrote: > On Wed, Aug 23, 2017 at 07:10:28PM +0200, Christoph Hellwig wrote: > > The block layer always remaps partitions before calling into the > > ->make_request methods of drivers. Thus the call to get_start_sect in > > in_chunk_boundary will always return 0 and can be removed. > > > > Signed-off-by: Christoph Hellwig <h...@lst.de> > > --- > > drivers/md/raid5.c | 4 +++- > > 1 file changed, 3 insertions(+), 1 deletion(-) > > > > diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c > > index 0fc2748aaf95..d687aeb1b538 100644 > > --- a/drivers/md/raid5.c > > +++ b/drivers/md/raid5.c > > @@ -5092,10 +5092,12 @@ static int raid5_congested(struct mddev *mddev, int > > bits) > > static int in_chunk_boundary(struct mddev *mddev, struct bio *bio) > > { > > struct r5conf *conf = mddev->private; > > - sector_t sector = bio->bi_iter.bi_sector + get_start_sect(bio->bi_bdev); > > + sector_t sector = bio->bi_iter.bi_sector; > > unsigned int chunk_sectors; > > unsigned int bio_sectors = bio_sectors(bio); > > > > + WARN_ON_ONCE(bio->bi_partno); > > + Meh, of course bi_partno is only added a few patches later, so this breaks bisectability. But given that the patch is already in there's probably nothing I can do here. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: don't require a struct block_device to submit a bio
Patch 6 didn't seem to make it through to the list, so here is a link to the git tree: http://git.infradead.org/users/hch/block.git/shortlog/refs/heads/bi_bdev -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 4/6] block: add a __disk_get_part helper
This helper allows looking up a partion under RCU protection without grabbing a reference to it. Signed-off-by: Christoph Hellwig <h...@lst.de> --- block/blk.h | 2 ++ block/genhd.c | 26 +- 2 files changed, 15 insertions(+), 13 deletions(-) diff --git a/block/blk.h b/block/blk.h index 3a3d715bd725..fde8b351c166 100644 --- a/block/blk.h +++ b/block/blk.h @@ -204,6 +204,8 @@ static inline void elv_deactivate_rq(struct request_queue *q, struct request *rq e->type->ops.sq.elevator_deactivate_req_fn(q, rq); } +struct hd_struct *__disk_get_part(struct gendisk *disk, int partno); + #ifdef CONFIG_FAIL_IO_TIMEOUT int blk_should_fake_timeout(struct request_queue *); ssize_t part_timeout_show(struct device *, struct device_attribute *, char *); diff --git a/block/genhd.c b/block/genhd.c index 3380a1e73ea0..713b7d4fe7a1 100644 --- a/block/genhd.c +++ b/block/genhd.c @@ -82,6 +82,15 @@ void part_in_flight(struct request_queue *q, struct hd_struct *part, } } +struct hd_struct *__disk_get_part(struct gendisk *disk, int partno) +{ + struct disk_part_tbl *ptbl = rcu_dereference(disk->part_tbl); + + if (unlikely(partno < 0 || partno >= ptbl->len)) + return NULL; + return rcu_dereference(ptbl->part[partno]); +} + /** * disk_get_part - get partition * @disk: disk to look partition from @@ -98,21 +107,12 @@ void part_in_flight(struct request_queue *q, struct hd_struct *part, */ struct hd_struct *disk_get_part(struct gendisk *disk, int partno) { - struct hd_struct *part = NULL; - struct disk_part_tbl *ptbl; - - if (unlikely(partno < 0)) - return NULL; + struct hd_struct *part; rcu_read_lock(); - - ptbl = rcu_dereference(disk->part_tbl); - if (likely(partno < ptbl->len)) { - part = rcu_dereference(ptbl->part[partno]); - if (part) - get_device(part_to_dev(part)); - } - + part = __disk_get_part(disk, partno); + if (part) + get_device(part_to_dev(part)); rcu_read_unlock(); return part; -- 2.11.0 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 2/6] raid5: remove a call to get_start_sect
The block layer always remaps partitions before calling into the ->make_request methods of drivers. Thus the call to get_start_sect in in_chunk_boundary will always return 0 and can be removed. Signed-off-by: Christoph Hellwig <h...@lst.de> --- drivers/md/raid5.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c index 0fc2748aaf95..d687aeb1b538 100644 --- a/drivers/md/raid5.c +++ b/drivers/md/raid5.c @@ -5092,10 +5092,12 @@ static int raid5_congested(struct mddev *mddev, int bits) static int in_chunk_boundary(struct mddev *mddev, struct bio *bio) { struct r5conf *conf = mddev->private; - sector_t sector = bio->bi_iter.bi_sector + get_start_sect(bio->bi_bdev); + sector_t sector = bio->bi_iter.bi_sector; unsigned int chunk_sectors; unsigned int bio_sectors = bio_sectors(bio); + WARN_ON_ONCE(bio->bi_partno); + chunk_sectors = min(conf->chunk_sectors, conf->prev_chunk_sectors); return chunk_sectors >= ((sector & (chunk_sectors - 1)) + bio_sectors); -- 2.11.0 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 5/6] block: cache the partition index in struct block_device
Signed-off-by: Christoph Hellwig <h...@lst.de> --- fs/block_dev.c | 1 + include/linux/fs.h | 1 + 2 files changed, 2 insertions(+) diff --git a/fs/block_dev.c b/fs/block_dev.c index 9941dc8342df..d29d1c70f833 100644 --- a/fs/block_dev.c +++ b/fs/block_dev.c @@ -1451,6 +1451,7 @@ static int __blkdev_get(struct block_device *bdev, fmode_t mode, int for_part) bdev->bd_disk = disk; bdev->bd_queue = disk->queue; bdev->bd_contains = bdev; + bdev->bd_partno = partno; if (!partno) { ret = -ENXIO; diff --git a/include/linux/fs.h b/include/linux/fs.h index 6e1fd5d21248..706dd3a972d2 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -427,6 +427,7 @@ struct block_device { #endif struct block_device * bd_contains; unsignedbd_block_size; + u8 bd_partno; struct hd_struct * bd_part; /* number of times partitions within this device have been opened. */ unsignedbd_part_count; -- 2.11.0 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
don't require a struct block_device to submit a bio
Hi Jens, this series removes the need to have a struct block_device ready to submit a bio. We basically don't use it anywhere in the block stack anyway - we always go for the gendisk or request_queue. The only exception is partition remapping for which we'll now need an additional partition index. This helps with cases where we submit I/O from a character device (nvme or lightnvm passthrough) or a different block device (upcoming nvme multipath support). -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 1/6] btrfs: index check-integrity state hash by a dev_t
We won't have the struct block_device available in the bio soon, so switch to the numerical dev_t instead of the block_device pointer for looking up the check-integrity state. Signed-off-by: Christoph Hellwig <h...@lst.de> --- fs/btrfs/check-integrity.c | 31 +-- 1 file changed, 13 insertions(+), 18 deletions(-) diff --git a/fs/btrfs/check-integrity.c b/fs/btrfs/check-integrity.c index 11d37c94ce05..9d3854839038 100644 --- a/fs/btrfs/check-integrity.c +++ b/fs/btrfs/check-integrity.c @@ -296,8 +296,7 @@ static void btrfsic_dev_state_hashtable_add( struct btrfsic_dev_state *ds, struct btrfsic_dev_state_hashtable *h); static void btrfsic_dev_state_hashtable_remove(struct btrfsic_dev_state *ds); -static struct btrfsic_dev_state *btrfsic_dev_state_hashtable_lookup( - struct block_device *bdev, +static struct btrfsic_dev_state *btrfsic_dev_state_hashtable_lookup(dev_t dev, struct btrfsic_dev_state_hashtable *h); static struct btrfsic_stack_frame *btrfsic_stack_frame_alloc(void); static void btrfsic_stack_frame_free(struct btrfsic_stack_frame *sf); @@ -385,8 +384,7 @@ static int btrfsic_process_superblock_dev_mirror( int superblock_mirror_num, struct btrfsic_dev_state **selected_dev_state, struct btrfs_super_block *selected_super); -static struct btrfsic_dev_state *btrfsic_dev_state_lookup( - struct block_device *bdev); +static struct btrfsic_dev_state *btrfsic_dev_state_lookup(dev_t dev); static void btrfsic_cmp_log_and_dev_bytenr(struct btrfsic_state *state, u64 bytenr, struct btrfsic_dev_state *dev_state, @@ -626,17 +624,15 @@ static void btrfsic_dev_state_hashtable_remove(struct btrfsic_dev_state *ds) list_del(>collision_resolving_node); } -static struct btrfsic_dev_state *btrfsic_dev_state_hashtable_lookup( - struct block_device *bdev, +static struct btrfsic_dev_state *btrfsic_dev_state_hashtable_lookup(dev_t dev, struct btrfsic_dev_state_hashtable *h) { const unsigned int hashval = - (((unsigned int)((uintptr_t)bdev)) & -(BTRFSIC_DEV2STATE_HASHTABLE_SIZE - 1)); + dev & (BTRFSIC_DEV2STATE_HASHTABLE_SIZE - 1); struct btrfsic_dev_state *ds; list_for_each_entry(ds, h->table + hashval, collision_resolving_node) { - if (ds->bdev == bdev) + if (ds->bdev->bd_dev == dev) return ds; } @@ -668,7 +664,7 @@ static int btrfsic_process_superblock(struct btrfsic_state *state, if (!device->bdev || !device->name) continue; - dev_state = btrfsic_dev_state_lookup(device->bdev); + dev_state = btrfsic_dev_state_lookup(device->bdev->bd_dev); BUG_ON(NULL == dev_state); for (i = 0; i < BTRFS_SUPER_MIRROR_MAX; i++) { ret = btrfsic_process_superblock_dev_mirror( @@ -1556,7 +1552,7 @@ static int btrfsic_map_block(struct btrfsic_state *state, u64 bytenr, u32 len, } device = multi->stripes[0].dev; - block_ctx_out->dev = btrfsic_dev_state_lookup(device->bdev); + block_ctx_out->dev = btrfsic_dev_state_lookup(device->bdev->bd_dev); block_ctx_out->dev_bytenr = multi->stripes[0].physical; block_ctx_out->start = bytenr; block_ctx_out->len = len; @@ -2654,7 +2650,7 @@ static struct btrfsic_block *btrfsic_block_lookup_or_add( pr_info("btrfsic: error, kmalloc failed!\n"); return NULL; } - dev_state = btrfsic_dev_state_lookup(block_ctx->dev->bdev); + dev_state = btrfsic_dev_state_lookup(block_ctx->dev->bdev->bd_dev); if (NULL == dev_state) { pr_info("btrfsic: error, lookup dev_state failed!\n"); btrfsic_block_free(block); @@ -2734,10 +2730,9 @@ static void btrfsic_cmp_log_and_dev_bytenr(struct btrfsic_state *state, } } -static struct btrfsic_dev_state *btrfsic_dev_state_lookup( - struct block_device *bdev) +static struct btrfsic_dev_state *btrfsic_dev_state_lookup(dev_t dev) { - return btrfsic_dev_state_hashtable_lookup(bdev, + return btrfsic_dev_state_hashtable_lookup(dev, _dev_state_hashtable); } @@ -2751,7 +2746,7 @@ int btrfsic_submit_bh(int op, int op_flags, struct buffer_head *bh) mutex_lock(_mutex); /* since btrfsic_submit_bh() might also be called before * btrfsic_mount(), this might return NULL */ - dev_state = btrfsic_dev_state_lookup(bh->b_bdev); + dev_stat
[PATCH 3/6] block: reject attempts to allocate more than DISK_MAX_PARTS partitions
Signed-off-by: Christoph Hellwig <h...@lst.de> --- block/genhd.c | 7 +++ 1 file changed, 7 insertions(+) diff --git a/block/genhd.c b/block/genhd.c index 2367087cdb7c..3380a1e73ea0 100644 --- a/block/genhd.c +++ b/block/genhd.c @@ -1357,6 +1357,13 @@ struct gendisk *alloc_disk_node(int minors, int node_id) struct gendisk *disk; struct disk_part_tbl *ptbl; + if (minors > DISK_MAX_PARTS) { + printk(KERN_ERR + "block: can't allocated more than %d partitions\n", + DISK_MAX_PARTS); + minors = DISK_MAX_PARTS; + } + disk = kzalloc_node(sizeof(struct gendisk), GFP_KERNEL, node_id); if (disk) { if (!init_part_stats(>part0)) { -- 2.11.0 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: RedHat 7.4 Release Notes: "Btrfs has been deprecated" - wut?
On Sat, Aug 12, 2017 at 02:10:18AM +0200, Christoph Anton Mitterer wrote: > Qu Wenruo wrote: > >Although Btrfs can disable data CoW, nodatacow also disables data > >checksum, which is another main feature for btrfs. > > Then decoupling of the two should probably decoupled and support for > notdatacow+checksumming be implemented?! And how are you going to write your data and checksum atomically when doing in-place updates? -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v3 11/49] btrfs: avoid access to .bi_vcnt directly
> +static unsigned int get_bio_pages(struct bio *bio) > +{ > + unsigned i; > + struct bio_vec *bv; > + > + bio_for_each_segment_all(bv, bio, i) > + ; > + > + return i; > +} s/get_bio_pages/bio_nr_pages/ ? Also this seems like a useful helper for bio.h -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] btrfs: Remove extra parentheses from condition in copy_items()
> - if ((i == (nr - 1))) > + if (i == (nr - 1)) Please remove the other set of redundant parentheses as well if you change that line. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [GIT PULL] Btrfs for 4.13
On Wed, Jul 05, 2017 at 04:45:49PM -0700, Linus Torvalds wrote: > Ouch, yeah, that was annoying. And yes, I ended up with several > whitespace differences but other than that it looks the same. Oh well. > > Jens, Christoph, let's not do that stupid thing with status-vs-error > ever again, ok? It should be done now :) Back to serious - this work had to be done as we could not communicate errors properly in the block layer with people inventing their own codes all the time. That being said, in retrospective I should have tried a gradual approach that first defines the new code to the old errors and left the fields in place. I opted for the hard break to make sure we can catch matching conversions by the compiler as very few people run sparse for the __bitwise annotations, and some of the drivers create so many sparse warnings that they would be lost anyway. But I've learned and will try to make something like this gradual if it comes up, and hope we won't have to do something like it again at all. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [proposal] making filesystem tools more machine friendly
Hi Jan, I really like the idea out json output for fs/block tools. nvme-cli has been doing it for a while, and that has been very useful. It comes with it's own little handrolled json output helpers: https://github.com/linux-nvme/nvme-cli/blob/master/json.c so it doesn't introduce any new dependencies. If you want to parse input as well a library might be a better approach, though. I really don't like the shared object evms model mention in the thread, as that creates hard ELF dependencies. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v8 17/18] xfs: minimal conversion to errseq_t writeback error reporting
On Fri, Jun 30, 2017 at 12:45:54PM -0400, Jeff Layton wrote: > Should I aim to do that with an individual patch for each fs, or is it > better to do a swath of them all at once in a single patch here? I'd be perfectly happy with one big patch for all the trivial conversions. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v8 03/18] fs: check for writeback errors after syncing out buffers in generic_file_fsync
On Thu, Jun 29, 2017 at 09:19:39AM -0400, jlay...@kernel.org wrote: > From: Jeff Layton> > ext2 currently does a test+clear of the AS_EIO flag, which is > is problematic for some coming changes. > > What we really need to do instead is call filemap_check_errors > in __generic_file_fsync after syncing out the buffers. That > will be sufficient for this case, and help other callers detect > these errors properly as well. > > With that, we don't need to twiddle it in ext2. Seems like much of this code is getting replaced later in the series.. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v8 14/18] block: convert to errseq_t based writeback error tracking
Looks good, Reviewed-by: Christoph Hellwig <h...@lst.de> -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v8 18/18] btrfs: minimal conversion to errseq_t writeback error reporting on fsync
On Thu, Jun 29, 2017 at 09:19:54AM -0400, jlay...@kernel.org wrote: > From: Jeff Layton> > Just check and advance the errseq_t in the file before returning. > Internal callers of filemap_* functions are left as-is. > > Signed-off-by: Jeff Layton > --- > fs/btrfs/file.c | 7 +-- > 1 file changed, 5 insertions(+), 2 deletions(-) > > diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c > index da1096eb1a40..1f57e1a523d9 100644 > --- a/fs/btrfs/file.c > +++ b/fs/btrfs/file.c > @@ -2011,7 +2011,7 @@ int btrfs_sync_file(struct file *file, loff_t start, > loff_t end, int datasync) > struct btrfs_root *root = BTRFS_I(inode)->root; > struct btrfs_trans_handle *trans; > struct btrfs_log_ctx ctx; > - int ret = 0; > + int ret = 0, err; > bool full_sync = 0; > u64 len; > > @@ -2030,7 +2030,7 @@ int btrfs_sync_file(struct file *file, loff_t start, > loff_t end, int datasync) >*/ > ret = start_ordered_ops(inode, start, end); > if (ret) > - return ret; > + goto out; > > inode_lock(inode); > atomic_inc(>log_batch); > @@ -2227,6 +2227,9 @@ int btrfs_sync_file(struct file *file, loff_t start, > loff_t end, int datasync) > ret = btrfs_end_transaction(trans); > } > out: > + err = file_check_and_advance_wb_err(file); > + if (!ret) > + ret = err; > return ret > 0 ? -EIO : ret; This means that we'll lose the exact error returned from start_ordered_ops. Beyond that I can't really provide good feedback as the btrfs fsync code looks so much different from all the other fs fsync code.. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v8 17/18] xfs: minimal conversion to errseq_t writeback error reporting
Nice and simple, this looks great! Reviewed-by: Christoph Hellwig <h...@lst.de> -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v8 16/18] ext4: use errseq_t based error handling for reporting data writeback errors
> - if (unlikely(ext4_forced_shutdown(EXT4_SB(inode->i_sb > - return -EIO; > + if (unlikely(ext4_forced_shutdown(EXT4_SB(inode->i_sb { > + ret = -EIO; > + goto out; > + } This just seems to add a call to trace_ext4_sync_file_exit for this case, which seems unrelated to the patch. > if (ret) > - return ret; > + goto out; > + Same here. > /* >* data=writeback,ordered: >* The caller's filemap_fdatawrite()/wait will sync the data. > @@ -152,7 +155,7 @@ int ext4_sync_file(struct file *file, loff_t start, > loff_t end, int datasync) > needs_barrier = true; > ret = jbd2_complete_transaction(journal, commit_tid); > if (needs_barrier) { > - issue_flush: > +issue_flush: > err = blkdev_issue_flush(inode->i_sb->s_bdev, GFP_KERNEL, NULL); And while I much prefer your new label placement it also doesn't seem to belong into this patch. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v7 16/22] block: convert to errseq_t based writeback error tracking
On Mon, Jun 26, 2017 at 10:34:18AM -0400, Jeff Layton wrote: > The bigger question is -- what about more complex filesystems like > ext4? There are a couple of cases where we can return -EIO or -EROFS on > fsync before filemap_write_and_wait_range is ever called. Like this one > for instance: > > if (unlikely(ext4_forced_shutdown(EXT4_SB(inode->i_sb > return -EIO; > > ...and the EXT4_MF_FS_ABORTED case. > > Are those conditions ever recoverable, such that a later fsync could > succeed? IOW, could I do a remount or something such that the existing > fds are left open and become usable again? This looks copied from the xfs forced shutdown code, and in that case it's final and permanent - you'll need an unmount to clear it. > If so, then we really ought to advance the errseq_t in the file when we > catch those cases as well. If we have to do that, then it probably makes > sense to leave the ext4 patch as-is. I think it can switch to the new file helper. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v7 16/22] block: convert to errseq_t based writeback error tracking
On Tue, Jun 20, 2017 at 01:44:44PM -0400, Jeff Layton wrote: > In order to query for errors with errseq_t, you need a previously- > sampled point from which to check. When you call > filemap_write_and_wait_range though you don't have a struct file and so > no previously-sampled value. So can we simply introduce variants of them that take a struct file? That would be: a) less churn b) less code c) less chance to get data integrity wrong -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v7 16/22] block: convert to errseq_t based writeback error tracking
> error = filemap_write_and_wait_range(filp->f_mapping, start, end); > if (error) > - return error; > + goto out; > > /* >* There is no need to serialise calls to blkdev_issue_flush with > @@ -640,6 +640,10 @@ int blkdev_fsync(struct file *filp, loff_t start, loff_t > end, int datasync) > if (error == -EOPNOTSUPP) > error = 0; > > +out: > + wberr = filemap_report_wb_err(filp); > + if (!error) > + error = wberr; Just curious: what's the reason filemap_write_and_wait_range couldn't query for the error using filemap_report_wb_err internally? -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v7 11/22] fs: new infrastructure for writeback error handling and reporting
> @@ -393,6 +394,7 @@ struct address_space { > gfp_t gfp_mask; /* implicit gfp mask for > allocations */ > struct list_headprivate_list; /* ditto */ > void*private_data; /* ditto */ > + errseq_twb_err; > } __attribute__((aligned(sizeof(long; > /* >* On most architectures that alignment is already the case; but > @@ -847,6 +849,7 @@ struct file { >* Must not be taken from IRQ context. >*/ > spinlock_t f_lock; > + errseq_tf_wb_err; > atomic_long_t f_count; > unsigned intf_flags; > fmode_t f_mode; Did you check the sizes of the structure before and after? These places don't look like holes in the packing, but there probably are some available. > +static inline int filemap_check_wb_err(struct address_space *mapping, > errseq_t since) Overly long line here (the patch has a few more) -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v7 01/22] fs: remove call_fsync helper function
On Fri, Jun 16, 2017 at 03:34:06PM -0400, Jeff Layton wrote: > Requested-by: Christoph Hellwig <h...@infradead.org> > Signed-off-by: Jeff Layton <jlay...@redhat.com> Looks good, Reviewed-by: Christoph Hellwig <h...@lst.de> -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] btrfs: use new block error code
On Mon, Jun 19, 2017 at 01:55:37PM +0300, Dan Carpenter wrote: > This function is supposed to return blk_status_t error codes now but > there was a stray -ENOMEM left behind. > > Fixes: 4e4cbee93d56 ("block: switch bios to blk_status_t") > Signed-off-by: Dan Carpenter <dan.carpen...@oracle.com> Looks fine, Acked-by: Christoph Hellwig <h...@lst.de> -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v6 12/20] fs: add a new fstype flag to indicate how writeback errors are tracked
On Thu, Jun 15, 2017 at 06:42:12AM -0400, Jeff Layton wrote: > Correct. > > But if there is a data writeback error, should we report an error on all > open fds at that time (like we will for fsync)? We should in theory, but I don't see how to properly do it. In addition sync_file_range just can't be used for data integrity to start with, so I don't think it's worth it. At some point we should add a proper fsync_range syscall, though. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/10 v11] No wait AIO
On Thu, Jun 15, 2017 at 12:11:58PM +0100, Al Viro wrote: > Which flags are you talking about? aio ones? AFAICS, it's the same > kind of thing as "can we lseek?" or "can we read/pread?", etc. > What would that field look like? Note that some of those might depend > upon the flags passed to open(), so shoving them into file_operations > might mean splitting file_operations instances... Details, please. Yeah, I'll take my suggestion back. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v6 12/20] fs: add a new fstype flag to indicate how writeback errors are tracked
On Wed, Jun 14, 2017 at 01:24:43PM -0400, Jeff Layton wrote: > In this smaller set, it's only really used for DAX. DAX only is implemented by three filesystems, please just fix them up in one go. > sync_file_range: ->fsync isn't called directly there, and I think we > probably want similar semantics to fsync() for it sync_file_range is only supposed to sync data, so it should not call ->fsync. > JBD2: will try to re-set the error after clearing it with > filemap_fdatawait. That's problematic with the new infrastructure so we > need some way to avoid it. JBD2 only has two users, please fix them up in one go. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v6 12/20] fs: add a new fstype flag to indicate how writeback errors are tracked
On Tue, Jun 13, 2017 at 06:24:32AM -0400, Jeff Layton wrote: > That's definitely what I want for the endgame here. My plan was to add > this flag for now, and then eventually reverse it (or drop it) once all > or most filesystems are converted. > > We can do it that way from the get-go if you like. It'll mean tossing in > a patch add this flag to all filesystems that have an fsync operation > and that use the pagecache, and then gradually remove it from them as we > convert them. > > Which method do you prefer? Please do it from the get-go. Or in fact figure out if we can get away without it entirely. Moving the error reporting into ->fsync should help greatly with that, so what's missing after that? -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] btrfs: use ino_t for inode in tracer
I thought btrfs supports 64-bit inodes, in which case this will truncate the inode number on 32-bit architectures. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/10 v11] No wait AIO
On Mon, Jun 12, 2017 at 05:38:13PM -0500, Goldwyn Rodrigues wrote: > We had FS_NOWAIT in filesystem type flags (in v3), but retracted it > later in v4. A per-fs flag is wrong as file_operation may have different capabilities. > I will work on adding FMODE_AIO_NOWAIT in the meantime. If Al prefers that let's go with it for now. One other thing I'd love is a supported flags field in struct file_operations that we can compare the flags against. Al, does that sound good as well? > > Thanks, > > -- > Goldwyn ---end quoted text--- -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v6 14/20] dax: set errors in mapping when writeback fails
On Mon, Jun 12, 2017 at 08:23:10AM -0400, Jeff Layton wrote: > For now, only do this when the FS_WB_ERRSEQ flag is set. The > AS_EIO/AS_ENOSPC flags are not currently cleared in the older code when > writeback initiation fails, only when we discover an error after waiting > on writeback to complete, so we only want to do this with errseq_t based > error handling to prevent seeing duplicate errors on fsync. Please make sure this doens't stay conditional by the end of the series. We only have three file systems using dax, and a series should be able to make them agree on a single interface. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v6 12/20] fs: add a new fstype flag to indicate how writeback errors are tracked
On Mon, Jun 12, 2017 at 08:23:06AM -0400, Jeff Layton wrote: > Add a new FS_WB_ERRSEQ flag to the fstype. Later patches will set and > key off of that to decide what behavior should be used. Please invert this so that only file systems that keep the old semantics need a flag. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v6 15/20] fs: have call_fsync call filemap_report_wb_err if FS_WB_ERRSEQ is set
On Mon, Jun 12, 2017 at 08:23:11AM -0400, Jeff Layton wrote: > Allow filesystems to opt-in to a final check of wb_err if FS_WB_ERRSEQ > is set. Technically, we could just plumb these calls into all of the > fsync operations, but I think this means less code, changes and churn. Please add it to every fs, that is a consistent with how we handle everything else related to writeback. Oh, and please kill this idiotic call_fsync helper while you're at it. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/10 v11] No wait AIO
As already indicated this whole series looks fine to me. Al: are you going to pick this up? Or Andrew? On Tue, Jun 06, 2017 at 06:19:29AM -0500, Goldwyn Rodrigues wrote: > This series adds nonblocking feature to asynchronous I/O writes. > io_submit() can be delayed because of a number of reason: > - Block allocation for files > - Data writebacks for direct I/O > - Sleeping because of waiting to acquire i_rwsem > - Congested block device > > The goal of the patch series is to return -EAGAIN/-EWOULDBLOCK if > any of these conditions are met. This way userspace can push most > of the write()s to the kernel to the best of its ability to complete > and if it returns -EAGAIN, can defer it to another thread. > > In order to enable this, IOCB_RW_FLAG_NOWAIT is introduced in > uapi/linux/aio_abi.h. If set for aio_rw_flags, it translates to > IOCB_NOWAIT for struct iocb, REQ_NOWAIT for bio.bi_opf and IOMAP_NOWAIT for > iomap. aio_rw_flags is a new flag replacing aio_reserved1. We could > not use aio_flags because it is not currently checked for invalidity > in the kernel. > > This feature is provided for direct I/O of asynchronous I/O only. I have > tested it against xfs, ext4, and btrfs while I intend to add more filesystems. > The nowait feature is for request based devices. In the future, I intend to > add support to stacked devices such as md. > > Applications will have to check supportability > by sending a async direct write and any other error besides -EAGAIN > would mean it is not supported. > > First two patches are prep patches into nowait I/O. > > Changes since v1: > + changed name from _NONBLOCKING to *_NOWAIT > + filemap_range_has_page call moved to closer to (just before) calling > filemap_write_and_wait_range(). > + BIO_NOWAIT limited to get_request() > + XFS fixes > - included reflink > - use of xfs_ilock_nowait() instead of a XFS_IOLOCK_NONBLOCKING flag > - Translate the flag through IOMAP_NOWAIT (iomap) to check for > block allocation for the file. > + ext4 coding style > > Changes since v2: > + Using aio_reserved1 as aio_rw_flags instead of aio_flags > + blk-mq support > + xfs uptodate with kernel and reflink changes > > Changes since v3: > + Added FS_NOWAIT, which is set if the filesystem supports NOWAIT feature. > + Checks in generic_make_request() to make sure BIO_NOWAIT comes in > for async direct writes only. > + Added QUEUE_FLAG_NOWAIT, which is set if the device supports BIO_NOWAIT. > This is added (rather not set) to block devices such as dm/md currently. > > Changes since v4: > + Ported AIO code to use RWF_* flags. Check for RWF_* flags in > generic_file_write_iter(). > + Changed IOCB_RW_FLAGS_NOWAIT to RWF_NOWAIT. > > Changes since v5: > + BIO_NOWAIT to REQ_NOWAIT > + Common helper for RWF flags. > > Changes since v6: > + REQ_NOWAIT will be ignored for request based devices since they > cannot block. So, removed QUEUE_FLAG_NOWAIT since it is not > required in the current implementation. It will be resurrected > when we program for stacked devices. > + changed kiocb_rw_flags() to kiocb_set_rw_flags() in order to accomodate > for errors. Moved checks in the function. > > Changes since v7: > + split patches into prep so the main patches are smaller and easier > to understand > + All patches are reviewed or acked! > > Changes since v8: > + Err out AIO reads with -EINVAL flagged as RWF_NOWAIT > > Changes since v9: > + Retract - Err out AIO reads with -EINVAL flagged as RWF_NOWAIT > + XFS returns EAGAIN if extent list is not in memory > + Man page updates to io_submit with iocb description and nowait features. > > Changes since v10: > + Corrected comment and subject in "return on congested block device" > > -- > Goldwyn > > ---end quoted text--- -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/8] btrfs: bioset allocations will never fail, adapt our helpers
On Tue, Jun 06, 2017 at 06:21:17PM +0800, Anand Jain wrote: > > > On 06/03/17 00:58, David Sterba wrote: >> Christoph pointed out that bio allocations backed by a bioset will never >> fail. > > David, > > Looks like this feature comes when __GFP_DIRECT_RECLAIM is > set and we aren't, such as [1]. Any idea why? Looks like I am > missing something ? > > > [1] > - > static int submit_extent_page(int op, int op_flags, struct extent_io_tree > *tree, > :: > > bio = btrfs_bio_alloc(bdev, sector, BIO_MAX_PAGES, > GFP_NOFS | __GFP_HIGH); #define __GFP_RECLAIM ((__force gfp_t)(___GFP_DIRECT_RECLAIM|___GFP_KSWAPD_RECLAIM)) #define GFP_NOFS(__GFP_RECLAIM | __GFP_IO) -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 06/13] fs: simplify dio_bio_complete
Only read bio->bi_error once in the common path. Signed-off-by: Christoph Hellwig <h...@lst.de> Reviewed-by: Bart Van Assche <bart.vanass...@sandisk.com> --- fs/direct-io.c | 6 ++ 1 file changed, 2 insertions(+), 4 deletions(-) diff --git a/fs/direct-io.c b/fs/direct-io.c index 04247a6c3f73..bb711e4b86c2 100644 --- a/fs/direct-io.c +++ b/fs/direct-io.c @@ -477,13 +477,12 @@ static int dio_bio_complete(struct dio *dio, struct bio *bio) { struct bio_vec *bvec; unsigned i; - int err; + int err = bio->bi_error; - if (bio->bi_error) + if (err) dio->io_error = -EIO; if (dio->is_async && dio->op == REQ_OP_READ && dio->should_dirty) { - err = bio->bi_error; bio_check_pages_dirty(bio); /* transfers ownership */ } else { bio_for_each_segment_all(bvec, bio, i) { @@ -494,7 +493,6 @@ static int dio_bio_complete(struct dio *dio, struct bio *bio) set_page_dirty_lock(page); put_page(page); } - err = bio->bi_error; bio_put(bio); } return err; -- 2.11.0 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 10/13] dm: change ->end_io calling convention
Turn the error paramter into a pointer so that target drivers can change the value, and make sure only DM_ENDIO_* values are returned from the methods. Signed-off-by: Christoph Hellwig <h...@lst.de> --- drivers/md/dm-cache-target.c | 4 ++-- drivers/md/dm-flakey.c| 8 drivers/md/dm-log-writes.c| 4 ++-- drivers/md/dm-mpath.c | 11 ++- drivers/md/dm-raid1.c | 14 +++--- drivers/md/dm-snap.c | 4 ++-- drivers/md/dm-stripe.c| 14 +++--- drivers/md/dm-thin.c | 4 ++-- drivers/md/dm.c | 36 ++-- include/linux/device-mapper.h | 2 +- 10 files changed, 51 insertions(+), 50 deletions(-) diff --git a/drivers/md/dm-cache-target.c b/drivers/md/dm-cache-target.c index d682a0511381..c48612e6d525 100644 --- a/drivers/md/dm-cache-target.c +++ b/drivers/md/dm-cache-target.c @@ -2820,7 +2820,7 @@ static int cache_map(struct dm_target *ti, struct bio *bio) return r; } -static int cache_end_io(struct dm_target *ti, struct bio *bio, int error) +static int cache_end_io(struct dm_target *ti, struct bio *bio, int *error) { struct cache *cache = ti->private; unsigned long flags; @@ -2838,7 +2838,7 @@ static int cache_end_io(struct dm_target *ti, struct bio *bio, int error) bio_drop_shared_lock(cache, bio); accounted_complete(cache, bio); - return 0; + return DM_ENDIO_DONE; } static int write_dirty_bitset(struct cache *cache) diff --git a/drivers/md/dm-flakey.c b/drivers/md/dm-flakey.c index e8f093b323ce..c9539917a59b 100644 --- a/drivers/md/dm-flakey.c +++ b/drivers/md/dm-flakey.c @@ -358,12 +358,12 @@ static int flakey_map(struct dm_target *ti, struct bio *bio) return DM_MAPIO_REMAPPED; } -static int flakey_end_io(struct dm_target *ti, struct bio *bio, int error) +static int flakey_end_io(struct dm_target *ti, struct bio *bio, int *error) { struct flakey_c *fc = ti->private; struct per_bio_data *pb = dm_per_bio_data(bio, sizeof(struct per_bio_data)); - if (!error && pb->bio_submitted && (bio_data_dir(bio) == READ)) { + if (!*error && pb->bio_submitted && (bio_data_dir(bio) == READ)) { if (fc->corrupt_bio_byte && (fc->corrupt_bio_rw == READ) && all_corrupt_bio_flags_match(bio, fc)) { /* @@ -377,11 +377,11 @@ static int flakey_end_io(struct dm_target *ti, struct bio *bio, int error) * Error read during the down_interval if drop_writes * and error_writes were not configured. */ - return -EIO; + *error = -EIO; } } - return error; + return DM_ENDIO_DONE; } static void flakey_status(struct dm_target *ti, status_type_t type, diff --git a/drivers/md/dm-log-writes.c b/drivers/md/dm-log-writes.c index e42264706c59..cc57c7fa1268 100644 --- a/drivers/md/dm-log-writes.c +++ b/drivers/md/dm-log-writes.c @@ -664,7 +664,7 @@ static int log_writes_map(struct dm_target *ti, struct bio *bio) return DM_MAPIO_REMAPPED; } -static int normal_end_io(struct dm_target *ti, struct bio *bio, int error) +static int normal_end_io(struct dm_target *ti, struct bio *bio, int *error) { struct log_writes_c *lc = ti->private; struct per_bio_data *pb = dm_per_bio_data(bio, sizeof(struct per_bio_data)); @@ -686,7 +686,7 @@ static int normal_end_io(struct dm_target *ti, struct bio *bio, int error) spin_unlock_irqrestore(>blocks_lock, flags); } - return error; + return DM_ENDIO_DONE; } /* diff --git a/drivers/md/dm-mpath.c b/drivers/md/dm-mpath.c index bf6e49c780d5..ceeeb495d01c 100644 --- a/drivers/md/dm-mpath.c +++ b/drivers/md/dm-mpath.c @@ -1517,14 +1517,15 @@ static int multipath_end_io(struct dm_target *ti, struct request *clone, return r; } -static int multipath_end_io_bio(struct dm_target *ti, struct bio *clone, int error) +static int multipath_end_io_bio(struct dm_target *ti, struct bio *clone, int *error) { struct multipath *m = ti->private; struct dm_mpath_io *mpio = get_mpio_from_bio(clone); struct pgpath *pgpath = mpio->pgpath; unsigned long flags; + int r = DM_ENDIO_DONE; - if (!error || noretry_error(error)) + if (!*error || noretry_error(*error)) goto done; if (pgpath) @@ -1533,7 +1534,7 @@ static int multipath_end_io_bio(struct dm_target *ti, struct bio *clone, int err if (atomic_read(>nr_valid_paths) == 0 && !test_bit(MPATHF_QUEUE_IF_NO_PATH, >flags)) { dm_report_EIO(m); - error = -EIO; + *error = -EIO; goto done; } @@ -1
[PATCH 12/13] blk-mq: switch ->queue_rq return value to blk_status_t
Use the same values for use for request completion errors as the return value from ->queue_rq. BLK_STS_RESOURCE is special cased to cause a requeue, and all the others are completed as-is. Signed-off-by: Christoph Hellwig <h...@lst.de> --- block/blk-mq.c| 37 -- drivers/block/loop.c | 6 +++--- drivers/block/mtip32xx/mtip32xx.c | 17 drivers/block/nbd.c | 12 --- drivers/block/null_blk.c | 4 ++-- drivers/block/rbd.c | 4 ++-- drivers/block/virtio_blk.c| 10 +- drivers/block/xen-blkfront.c | 8 drivers/md/dm-rq.c| 8 drivers/mtd/ubi/block.c | 6 +++--- drivers/nvme/host/core.c | 14 ++--- drivers/nvme/host/fc.c| 23 +++-- drivers/nvme/host/nvme.h | 2 +- drivers/nvme/host/pci.c | 42 +++ drivers/nvme/host/rdma.c | 26 +--- drivers/nvme/target/loop.c| 17 drivers/scsi/scsi_lib.c | 30 ++-- include/linux/blk-mq.h| 7 ++- 18 files changed, 131 insertions(+), 142 deletions(-) diff --git a/block/blk-mq.c b/block/blk-mq.c index adcc1c0dce6e..7af78b1e9db9 100644 --- a/block/blk-mq.c +++ b/block/blk-mq.c @@ -924,7 +924,7 @@ bool blk_mq_dispatch_rq_list(struct request_queue *q, struct list_head *list) { struct blk_mq_hw_ctx *hctx; struct request *rq; - int errors, queued, ret = BLK_MQ_RQ_QUEUE_OK; + int errors, queued; if (list_empty(list)) return false; @@ -935,6 +935,7 @@ bool blk_mq_dispatch_rq_list(struct request_queue *q, struct list_head *list) errors = queued = 0; do { struct blk_mq_queue_data bd; + blk_status_t ret; rq = list_first_entry(list, struct request, queuelist); if (!blk_mq_get_driver_tag(rq, , false)) { @@ -975,25 +976,20 @@ bool blk_mq_dispatch_rq_list(struct request_queue *q, struct list_head *list) } ret = q->mq_ops->queue_rq(hctx, ); - switch (ret) { - case BLK_MQ_RQ_QUEUE_OK: - queued++; - break; - case BLK_MQ_RQ_QUEUE_BUSY: + if (ret == BLK_STS_RESOURCE) { blk_mq_put_driver_tag_hctx(hctx, rq); list_add(>queuelist, list); __blk_mq_requeue_request(rq); break; - default: - pr_err("blk-mq: bad return on queue: %d\n", ret); - case BLK_MQ_RQ_QUEUE_ERROR: + } + + if (unlikely(ret != BLK_STS_OK)) { errors++; blk_mq_end_request(rq, BLK_STS_IOERR); - break; + continue; } - if (ret == BLK_MQ_RQ_QUEUE_BUSY) - break; + queued++; } while (!list_empty(list)); hctx->dispatched[queued_to_index(queued)]++; @@ -1031,7 +1027,7 @@ bool blk_mq_dispatch_rq_list(struct request_queue *q, struct list_head *list) * - blk_mq_run_hw_queue() checks whether or not a queue has * been stopped before rerunning a queue. * - Some but not all block drivers stop a queue before -* returning BLK_MQ_RQ_QUEUE_BUSY. Two exceptions are scsi-mq +* returning BLK_STS_RESOURCE. Two exceptions are scsi-mq * and dm-rq. */ if (!blk_mq_sched_needs_restart(hctx) && @@ -1410,7 +1406,7 @@ static void __blk_mq_try_issue_directly(struct request *rq, blk_qc_t *cookie, }; struct blk_mq_hw_ctx *hctx; blk_qc_t new_cookie; - int ret; + blk_status_t ret; if (q->elevator) goto insert; @@ -1426,18 +1422,19 @@ static void __blk_mq_try_issue_directly(struct request *rq, blk_qc_t *cookie, * would have done */ ret = q->mq_ops->queue_rq(hctx, ); - if (ret == BLK_MQ_RQ_QUEUE_OK) { + switch (ret) { + case BLK_STS_OK: *cookie = new_cookie; return; - } - - if (ret == BLK_MQ_RQ_QUEUE_ERROR) { + case BLK_STS_RESOURCE: + __blk_mq_requeue_request(rq); + goto insert; + default: *cookie = BLK_QC_T_NONE; - blk_mq_end_request(rq, BLK_STS_IOERR); + blk_mq_end_request(rq, ret); return; } - __blk_mq_requeue_request(rq); insert: blk_mq_sched_insert_request(rq, false, true, false, may_sleep); } diff --g
[PATCH 08/13] dm mpath: merge do_end_io_bio into multipath_end_io_bio
This simplifies the code and especially the error passing a bit and will help with the next patch. Signed-off-by: Christoph Hellwig <h...@lst.de> --- drivers/md/dm-mpath.c | 42 +++--- 1 file changed, 15 insertions(+), 27 deletions(-) diff --git a/drivers/md/dm-mpath.c b/drivers/md/dm-mpath.c index 3df056b73b66..6d5ebb76149d 100644 --- a/drivers/md/dm-mpath.c +++ b/drivers/md/dm-mpath.c @@ -1510,24 +1510,24 @@ static int multipath_end_io(struct dm_target *ti, struct request *clone, return r; } -static int do_end_io_bio(struct multipath *m, struct bio *clone, -int error, struct dm_mpath_io *mpio) +static int multipath_end_io_bio(struct dm_target *ti, struct bio *clone, int error) { + struct multipath *m = ti->private; + struct dm_mpath_io *mpio = get_mpio_from_bio(clone); + struct pgpath *pgpath = mpio->pgpath; unsigned long flags; - if (!error) - return 0; /* I/O complete */ - - if (noretry_error(error)) - return error; + if (!error || noretry_error(error)) + goto done; - if (mpio->pgpath) - fail_path(mpio->pgpath); + if (pgpath) + fail_path(pgpath); if (atomic_read(>nr_valid_paths) == 0 && !test_bit(MPATHF_QUEUE_IF_NO_PATH, >flags)) { dm_report_EIO(m); - return -EIO; + error = -EIO; + goto done; } /* Queue for the daemon to resubmit */ @@ -1539,28 +1539,16 @@ static int do_end_io_bio(struct multipath *m, struct bio *clone, if (!test_bit(MPATHF_QUEUE_IO, >flags)) queue_work(kmultipathd, >process_queued_bios); - return DM_ENDIO_INCOMPLETE; -} - -static int multipath_end_io_bio(struct dm_target *ti, struct bio *clone, int error) -{ - struct multipath *m = ti->private; - struct dm_mpath_io *mpio = get_mpio_from_bio(clone); - struct pgpath *pgpath; - struct path_selector *ps; - int r; - - BUG_ON(!mpio); - - r = do_end_io_bio(m, clone, error, mpio); - pgpath = mpio->pgpath; + error = DM_ENDIO_INCOMPLETE; +done: if (pgpath) { - ps = >pg->ps; + struct path_selector *ps = >pg->ps; + if (ps->type->end_io) ps->type->end_io(ps, >path, mpio->nr_bytes); } - return r; + return error; } /* -- 2.11.0 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 01/13] nvme-lightnvm: use blk_execute_rq in nvme_nvm_submit_user_cmd
Instead of reinventing it poorly. Signed-off-by: Christoph Hellwig <h...@lst.de> Reviewed-by: Bart Van Assche <bart.vanass...@sandisk.com> Reviewed-by: Javier González <jav...@cnexlabs.com> --- drivers/nvme/host/lightnvm.c | 12 +--- 1 file changed, 1 insertion(+), 11 deletions(-) diff --git a/drivers/nvme/host/lightnvm.c b/drivers/nvme/host/lightnvm.c index f5df78ed1e10..f3885b5e56bd 100644 --- a/drivers/nvme/host/lightnvm.c +++ b/drivers/nvme/host/lightnvm.c @@ -571,13 +571,6 @@ static struct nvm_dev_ops nvme_nvm_dev_ops = { .max_phys_sect = 64, }; -static void nvme_nvm_end_user_vio(struct request *rq, int error) -{ - struct completion *waiting = rq->end_io_data; - - complete(waiting); -} - static int nvme_nvm_submit_user_cmd(struct request_queue *q, struct nvme_ns *ns, struct nvme_nvm_command *vcmd, @@ -608,7 +601,6 @@ static int nvme_nvm_submit_user_cmd(struct request_queue *q, rq->timeout = timeout ? timeout : ADMIN_TIMEOUT; rq->cmd_flags &= ~REQ_FAILFAST_DRIVER; - rq->end_io_data = if (ppa_buf && ppa_len) { ppa_list = dma_pool_alloc(dev->dma_pool, GFP_KERNEL, _dma); @@ -662,9 +654,7 @@ static int nvme_nvm_submit_user_cmd(struct request_queue *q, } submit: - blk_execute_rq_nowait(q, NULL, rq, 0, nvme_nvm_end_user_vio); - - wait_for_completion_io(); + blk_execute_rq(q, NULL, rq, 0); if (nvme_req(rq)->flags & NVME_REQ_CANCELLED) ret = -EINTR; -- 2.11.0 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 07/13] block_dev: propagate bio_iov_iter_get_pages error in __blkdev_direct_IO
Once we move the block layer to its own status code we'll still want to propagate the bio_iov_iter_get_pages, so restructure __blkdev_direct_IO to take ret into account when returning the errno. Signed-off-by: Christoph Hellwig <h...@lst.de> --- fs/block_dev.c | 7 --- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/fs/block_dev.c b/fs/block_dev.c index 51959936..c1dc393ad6b9 100644 --- a/fs/block_dev.c +++ b/fs/block_dev.c @@ -334,7 +334,7 @@ __blkdev_direct_IO(struct kiocb *iocb, struct iov_iter *iter, int nr_pages) bool is_read = (iov_iter_rw(iter) == READ), is_sync; loff_t pos = iocb->ki_pos; blk_qc_t qc = BLK_QC_T_NONE; - int ret; + int ret = 0; if ((pos | iov_iter_alignment(iter)) & (bdev_logical_block_size(bdev) - 1)) @@ -363,7 +363,7 @@ __blkdev_direct_IO(struct kiocb *iocb, struct iov_iter *iter, int nr_pages) ret = bio_iov_iter_get_pages(bio, iter); if (unlikely(ret)) { - bio->bi_error = ret; + bio->bi_error = -EIO; bio_endio(bio); break; } @@ -412,7 +412,8 @@ __blkdev_direct_IO(struct kiocb *iocb, struct iov_iter *iter, int nr_pages) } __set_current_state(TASK_RUNNING); - ret = dio->bio.bi_error; + if (!ret) + ret = dio->bio.bi_error; if (likely(!ret)) ret = dio->size; -- 2.11.0 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 09/13] dm: don't return errnos from ->map
Instead use the special DM_MAPIO_KILL return value to return -EIO just like we do for the request based path. Note that dm-log-writes returned -ENOMEM in a few places, which now becomes -EIO instead. No consumer treats -ENOMEM special so this shouldn't be an issue (and it should use a mempool to start with to make guaranteed progress). Signed-off-by: Christoph Hellwig <h...@lst.de> --- drivers/md/dm-crypt.c | 4 ++-- drivers/md/dm-flakey.c| 4 ++-- drivers/md/dm-integrity.c | 12 ++-- drivers/md/dm-log-writes.c| 4 ++-- drivers/md/dm-mpath.c | 13 ++--- drivers/md/dm-raid1.c | 6 +++--- drivers/md/dm-snap.c | 8 drivers/md/dm-target.c| 2 +- drivers/md/dm-verity-target.c | 6 +++--- drivers/md/dm-zero.c | 4 ++-- drivers/md/dm.c | 16 +++- 11 files changed, 46 insertions(+), 33 deletions(-) diff --git a/drivers/md/dm-crypt.c b/drivers/md/dm-crypt.c index ebf9e72d479b..f4b51809db21 100644 --- a/drivers/md/dm-crypt.c +++ b/drivers/md/dm-crypt.c @@ -2795,10 +2795,10 @@ static int crypt_map(struct dm_target *ti, struct bio *bio) * and is aligned to this size as defined in IO hints. */ if (unlikely((bio->bi_iter.bi_sector & ((cc->sector_size >> SECTOR_SHIFT) - 1)) != 0)) - return -EIO; + return DM_MAPIO_KILL; if (unlikely(bio->bi_iter.bi_size & (cc->sector_size - 1))) - return -EIO; + return DM_MAPIO_KILL; io = dm_per_bio_data(bio, cc->per_bio_data_size); crypt_io_init(io, cc, bio, dm_target_offset(ti, bio->bi_iter.bi_sector)); diff --git a/drivers/md/dm-flakey.c b/drivers/md/dm-flakey.c index 13305a182611..e8f093b323ce 100644 --- a/drivers/md/dm-flakey.c +++ b/drivers/md/dm-flakey.c @@ -321,7 +321,7 @@ static int flakey_map(struct dm_target *ti, struct bio *bio) if (bio_data_dir(bio) == READ) { if (!fc->corrupt_bio_byte && !test_bit(DROP_WRITES, >flags) && !test_bit(ERROR_WRITES, >flags)) - return -EIO; + return DM_MAPIO_KILL; goto map_bio; } @@ -349,7 +349,7 @@ static int flakey_map(struct dm_target *ti, struct bio *bio) /* * By default, error all I/O. */ - return -EIO; + return DM_MAPIO_KILL; } map_bio: diff --git a/drivers/md/dm-integrity.c b/drivers/md/dm-integrity.c index c7f7c8d76576..ee78fb471229 100644 --- a/drivers/md/dm-integrity.c +++ b/drivers/md/dm-integrity.c @@ -1352,13 +1352,13 @@ static int dm_integrity_map(struct dm_target *ti, struct bio *bio) DMERR("Too big sector number: 0x%llx + 0x%x > 0x%llx", (unsigned long long)dio->range.logical_sector, bio_sectors(bio), (unsigned long long)ic->provided_data_sectors); - return -EIO; + return DM_MAPIO_KILL; } if (unlikely((dio->range.logical_sector | bio_sectors(bio)) & (unsigned)(ic->sectors_per_block - 1))) { DMERR("Bio not aligned on %u sectors: 0x%llx, 0x%x", ic->sectors_per_block, (unsigned long long)dio->range.logical_sector, bio_sectors(bio)); - return -EIO; + return DM_MAPIO_KILL; } if (ic->sectors_per_block > 1) { @@ -1368,7 +1368,7 @@ static int dm_integrity_map(struct dm_target *ti, struct bio *bio) if (unlikely((bv.bv_offset | bv.bv_len) & ((ic->sectors_per_block << SECTOR_SHIFT) - 1))) { DMERR("Bio vector (%u,%u) is not aligned on %u-sector boundary", bv.bv_offset, bv.bv_len, ic->sectors_per_block); - return -EIO; + return DM_MAPIO_KILL; } } } @@ -1383,18 +1383,18 @@ static int dm_integrity_map(struct dm_target *ti, struct bio *bio) wanted_tag_size *= ic->tag_size; if (unlikely(wanted_tag_size != bip->bip_iter.bi_size)) { DMERR("Invalid integrity data size %u, expected %u", bip->bip_iter.bi_size, wanted_tag_size); - return -EIO; + return DM_MAPIO_KILL; } } } else { if (unlikely(bip != NULL)) { DMERR("Unexpected integrity data when using internal hash"); - return -EIO; + return DM_M
[PATCH 05/13] fs: remove the unused error argument to dio_end_io()
Signed-off-by: Christoph Hellwig <h...@lst.de> Reviewed-by: Bart Van Assche <bart.vanass...@sandisk.com> --- fs/btrfs/inode.c | 6 +++--- fs/direct-io.c | 3 +-- include/linux/fs.h | 2 +- 3 files changed, 5 insertions(+), 6 deletions(-) diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index 17cbe9306faf..758b2666885e 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -8244,7 +8244,7 @@ static void btrfs_endio_direct_read(struct bio *bio) kfree(dip); dio_bio->bi_error = bio->bi_error; - dio_end_io(dio_bio, bio->bi_error); + dio_end_io(dio_bio); if (io_bio->end_io) io_bio->end_io(io_bio, err); @@ -8304,7 +8304,7 @@ static void btrfs_endio_direct_write(struct bio *bio) kfree(dip); dio_bio->bi_error = bio->bi_error; - dio_end_io(dio_bio, bio->bi_error); + dio_end_io(dio_bio); bio_put(bio); } @@ -8673,7 +8673,7 @@ static void btrfs_submit_direct(struct bio *dio_bio, struct inode *inode, * Releases and cleans up our dio_bio, no need to bio_put() * nor bio_endio()/bio_io_error() against dio_bio. */ - dio_end_io(dio_bio, ret); + dio_end_io(dio_bio); } if (io_bio) bio_put(io_bio); diff --git a/fs/direct-io.c b/fs/direct-io.c index a04ebea77de8..04247a6c3f73 100644 --- a/fs/direct-io.c +++ b/fs/direct-io.c @@ -348,13 +348,12 @@ static void dio_bio_end_io(struct bio *bio) /** * dio_end_io - handle the end io action for the given bio * @bio: The direct io bio thats being completed - * @error: Error if there was one * * This is meant to be called by any filesystem that uses their own dio_submit_t * so that the DIO specific endio actions are dealt with after the filesystem * has done it's completion work. */ -void dio_end_io(struct bio *bio, int error) +void dio_end_io(struct bio *bio) { struct dio *dio = bio->bi_private; diff --git a/include/linux/fs.h b/include/linux/fs.h index 803e5a9b2654..4388ab58843d 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -2843,7 +2843,7 @@ enum { DIO_SKIP_DIO_COUNT = 0x08, }; -void dio_end_io(struct bio *bio, int error); +void dio_end_io(struct bio *bio); ssize_t __blockdev_direct_IO(struct kiocb *iocb, struct inode *inode, struct block_device *bdev, struct iov_iter *iter, -- 2.11.0 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 02/13] scsi/osd: don't save block errors into req_results
We will only have sense data if the command executed and got a SCSI result, so this is pointless. Signed-off-by: Christoph Hellwig <h...@lst.de> Reviewed-by: Martin K. Petersen <martin.peter...@oracle.com> --- drivers/scsi/osd/osd_initiator.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/scsi/osd/osd_initiator.c b/drivers/scsi/osd/osd_initiator.c index 8a1b94816419..14785177ce7b 100644 --- a/drivers/scsi/osd/osd_initiator.c +++ b/drivers/scsi/osd/osd_initiator.c @@ -477,7 +477,7 @@ static void _set_error_resid(struct osd_request *or, struct request *req, int error) { or->async_error = error; - or->req_errors = scsi_req(req)->result ? : error; + or->req_errors = scsi_req(req)->result; or->sense_len = scsi_req(req)->sense_len; if (or->sense_len) memcpy(or->sense, scsi_req(req)->sense, or->sense_len); -- 2.11.0 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 04/13] dm: fix REQ_RAHEAD handling
A few (but not all) dm targets use a special EWOULDBLOCK error code for failing REQ_RAHEAD requests that fail due to a lack of available resources. But no one else knows about this magic code, and lower level drivers also don't generate it when failing read-ahead requests for similar reasons. So remove this special casing and ignore all additional error handling for REQ_RAHEAD - if this was a real underlying error we'd get a normal read once the real read comes in. Signed-off-by: Christoph Hellwig <h...@lst.de> Reviewed-by: Bart Van Assche <bart.vanass...@sandisk.com> --- drivers/md/dm-raid1.c | 4 ++-- drivers/md/dm-stripe.c | 2 +- 2 files changed, 3 insertions(+), 3 deletions(-) diff --git a/drivers/md/dm-raid1.c b/drivers/md/dm-raid1.c index a95cbb80fb34..5e30b08b91d9 100644 --- a/drivers/md/dm-raid1.c +++ b/drivers/md/dm-raid1.c @@ -1214,7 +1214,7 @@ static int mirror_map(struct dm_target *ti, struct bio *bio) */ if (!r || (r == -EWOULDBLOCK)) { if (bio->bi_opf & REQ_RAHEAD) - return -EWOULDBLOCK; + return -EIO; queue_bio(ms, bio, rw); return DM_MAPIO_SUBMITTED; @@ -1258,7 +1258,7 @@ static int mirror_end_io(struct dm_target *ti, struct bio *bio, int error) if (error == -EOPNOTSUPP) return error; - if ((error == -EWOULDBLOCK) && (bio->bi_opf & REQ_RAHEAD)) + if (bio->bi_opf & REQ_RAHEAD) return error; if (unlikely(error)) { diff --git a/drivers/md/dm-stripe.c b/drivers/md/dm-stripe.c index 75152482f3ad..780e95889a7c 100644 --- a/drivers/md/dm-stripe.c +++ b/drivers/md/dm-stripe.c @@ -384,7 +384,7 @@ static int stripe_end_io(struct dm_target *ti, struct bio *bio, int error) if (!error) return 0; /* I/O complete */ - if ((error == -EWOULDBLOCK) && (bio->bi_opf & REQ_RAHEAD)) + if (bio->bi_opf & REQ_RAHEAD) return error; if (error == -EOPNOTSUPP) -- 2.11.0 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 03/13] gfs2: remove the unused sd_log_error field
Signed-off-by: Christoph Hellwig <h...@lst.de> Reviewed-by: Bart Van Assche <bart.vanass...@sandisk.com> --- fs/gfs2/incore.h | 1 - fs/gfs2/lops.c | 4 +--- 2 files changed, 1 insertion(+), 4 deletions(-) diff --git a/fs/gfs2/incore.h b/fs/gfs2/incore.h index b7cf65d13561..aa3d44527fa2 100644 --- a/fs/gfs2/incore.h +++ b/fs/gfs2/incore.h @@ -815,7 +815,6 @@ struct gfs2_sbd { atomic_t sd_log_in_flight; struct bio *sd_log_bio; wait_queue_head_t sd_log_flush_wait; - int sd_log_error; atomic_t sd_reserving_log; wait_queue_head_t sd_reserving_log_wait; diff --git a/fs/gfs2/lops.c b/fs/gfs2/lops.c index b1f9144b42c7..13ebf15a4db0 100644 --- a/fs/gfs2/lops.c +++ b/fs/gfs2/lops.c @@ -209,10 +209,8 @@ static void gfs2_end_log_write(struct bio *bio) struct page *page; int i; - if (bio->bi_error) { - sdp->sd_log_error = bio->bi_error; + if (bio->bi_error) fs_err(sdp, "Error %d writing to log\n", bio->bi_error); - } bio_for_each_segment_all(bvec, bio, i) { page = bvec->bv_page; -- 2.11.0 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html