Re: [PATCH 07/10] dax: export functions for use with btrfs

2018-12-05 Thread Christoph Hellwig
If you want to export these at all they have to be EXPORT_SYMBOL_GPL.

But I'd really like to avoid seeing another duplicate DAX I/O path.
Please try to adopt the existing iomap-based infrastructure for your
needs first.


Re: [RFC PATCH] btrfs: drop file privileges in btrfs_clone_files

2018-11-27 Thread Christoph Hellwig
On Wed, Nov 28, 2018 at 09:44:59AM +0200, Nikolay Borisov wrote:
> 
> 
> On 28.11.18 г. 5:07 ч., Lu Fengqi wrote:
> > The generic/513 tell that cloning into a file did not strip security
> > privileges (suid, capabilities) like a regular write would.
> > 
> > Signed-off-by: Lu Fengqi 
> > ---
> > The xfs and ocfs2 call generic_remap_file_range_prep to drop file
> > privileges, I'm not sure whether btrfs should do the same thing.
> 
> Why do you think btrfs shouldn't do the same thing. Looking at
> remap_file_range_prep it seems that btrfs is missing a ton of checks
> that are useful i.e immutable files/aligned offsets etc.

Any chance we could move btrfs over to use remap_file_range_prep so that
all file systems share the exact same checks?


Re: [PATCH V10 08/19] btrfs: move bio_pages_all() to btrfs

2018-11-19 Thread Christoph Hellwig
On Mon, Nov 19, 2018 at 04:19:24PM +0800, Ming Lei wrote:
> On Fri, Nov 16, 2018 at 02:38:45PM +0100, Christoph Hellwig wrote:
> > On Thu, Nov 15, 2018 at 04:52:55PM +0800, Ming Lei wrote:
> > > BTRFS is the only user of this helper, so move this helper into
> > > BTRFS, and implement it via bio_for_each_segment_all(), since
> > > bio->bi_vcnt may not equal to number of pages after multipage bvec
> > > is enabled.
> > 
> > btrfs only uses the value to check if it is larger than 1.  No amount
> > of multipage bio merging should ever make bi_vcnt go from 0 to 1 or
> > vice versa.
> 
> Could you explain a bit why?
> 
> Suppose 2 physically continuous pages are added to this bio, .bi_vcnt
> can be 1 in case of multi-page bvec, but it is 2 in case of single-page
> bvec.

True, I did think of 0 vs 1.

The magic here in btrfs still doesn't make much sense.  The comments
down in btrfs_check_repairable talk about sectors, so it might be a
leftover from the blocksize == PAGE_SIZE days (assuming those patches
actually got merged).  I'd like to have the btrfs folks chime in,
but in the end we should probably check if the bio was larger than
a single sector here.


Re: [PATCH 04/25] vfs: strengthen checking of file range inputs to generic_remap_checks

2018-10-11 Thread Christoph Hellwig
On Wed, Oct 10, 2018 at 09:12:46PM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong 
> 
> File range remapping, if allowed to run past the destination file's EOF,
> is an optimization on a regular file write.  Regular file writes that
> extend the file length are subject to various constraints which are not
> checked by range cloning.
> 
> This is a correctness problem because we're never allowed to touch
> ranges that the page cache can't support (s_maxbytes); we're not
> supposed to deal with large offsets (MAX_NON_LFS) if O_LARGEFILE isn't
> set; and we must obey resource limits (RLIMIT_FSIZE).
> 
> Therefore, add these checks to the new generic_remap_checks function so
> that we curtail unexpected behavior.
> 
> Signed-off-by: Darrick J. Wong 
> Reviewed-by: Amir Goldstein 

Looks good,

Reviewed-by: Christoph Hellwig 


Re: [PATCH 03/25] vfs: check file ranges before cloning files

2018-10-11 Thread Christoph Hellwig
> -EXPORT_SYMBOL(vfs_clone_file_prep_inodes);
> +EXPORT_SYMBOL(vfs_clone_file_prep);

Btw, why isn't this EXPORT_SYMBOL_GPL?  It is rather Linux internal
code, including some that I wrote which you lifted into the core
in "vfs: refactor clone/dedupe_file_range common functions".


Re: [PATCH 02/25] vfs: vfs_clone_file_prep_inodes should return EINVAL for a clone from beyond EOF

2018-10-11 Thread Christoph Hellwig
On Wed, Oct 10, 2018 at 09:12:30PM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong 
> 
> vfs_clone_file_prep_inodes cannot return 0 if it is asked to remap from
> a zero byte file because that's what btrfs does.
> 
> Signed-off-by: Darrick J. Wong 
> ---

Maybe it would be a good time to switch btrfs to use
vfs_clone_file_prep_inodes so that we don't have any discrepancies?

Otherwise looks good:

Reviewed-by: Christoph Hellwig 


Re: [PATCH 01/25] xfs: add a per-xfs trace_printk macro

2018-10-11 Thread Christoph Hellwig
On Wed, Oct 10, 2018 at 09:12:23PM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong 
> 
> Add a "xfs_tprintk" macro so that developers can use trace_printk to
> print out arbitrary debugging information with the XFS device name
> attached to the trace output.

I can't say I'm a fan of this.  trace_printk is a debugging aid,
and opencoding the file system name really isn't much of a burden.


Re: exporting uuid_t from uapi/linux/uuid.h

2018-08-22 Thread Christoph Hellwig
On Tue, Aug 21, 2018 at 10:25:50PM +0300, Nikolay Borisov wrote:
> Hello,
> 
> I'm doing some refactoring on btrfs and would like to convert some of
> the "naked" definition of UUID in on-disk object to uuid_t. Since those
> structs are defined in include/uapi/linux/btrfs_tree.h I naturally would
> need to expose the kernel uuid_t to the uapi header via
> include/uapi/linux/uuid.h. The uuid was originally unexported in
> 60927bc31436 ("uuid: remove uuid_be defintions from the uapi header")
>  but at that time it was called uuid_be. The thing is that btrfs-progs
> for example uses both btrfs_tree.h uapi header as well as libuuid. THe
> latter also defines uuid_t. Both the kernel one and libuuid are binary
> compatible.
> 
> So what's the correct way (if any) to expose a kernel type that might
> have also been defined by a userspace library?

Please just use an raw array of __u8 for your APIs.


Re: [PATCH 01/10] mm: pagecache add lock

2018-05-23 Thread Christoph Hellwig
On Sun, May 20, 2018 at 06:45:24PM -0400, Kent Overstreet wrote:
> > 
> > Honestly I think this probably should be in the core.  But IFF we move
> > it to the core the existing users of per-fs locks need to be moved
> > over first.  E.g. XFS as the very first one, and at least ext4 and f2fs
> > that copied the approach, and probably more if you audit deep enough.
> 
> I'm not going to go and redo locking in XFS and ext4 as a prerequisite to
> merging bcachefs. Sorry, but that's a bit crazy.

It isn't crazy at all.  In general we expect people to do their fair
share of core work to get their pet feature in.  How much is acceptable
is a difficult question and not black and white.

But if you want to grow a critical core structure you better take a stab
at converting existing users.  Without that the tradeoff of growing
that core structure is simply not given.

Or to put it in words for this exact feature:  unless your new field
is also used by mainstream file systems it is not going to be added.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Btrfs: implement unlocked buffered write

2018-05-23 Thread Christoph Hellwig
On Tue, May 22, 2018 at 02:31:36PM -0400, Chris Mason wrote:
> > And what protects two writes from interleaving their results now?
> 
> page locks...ish, we at least won't have results interleaved in a single
> page.  For btrfs it'll actually be multiple pages since we try to do more
> than one at a time.

I think you are going to break just about every assumption people
have that any single write is going to be atomic vs another write.

E.g. this comes from the posix read definition reference by the
write definition:

"I/O is intended to be atomic to ordinary files and pipes and FIFOs.
Atomic means that all the bytes from a single operation that started out
together end up together, without interleaving from other I/O
operations. It is a known attribute of terminals that this is not
honored, and terminals are explicitly (and implicitly permanently)
excepted, making the behavior unspecified. The behavior for other device
types is also left unspecified, but the wording is intended to imply
that future standards might choose to specify atomicity (or not).
"

Without taking the inode lock (or some sort of range lock) you can
easily interleave data from two write requests.

> But we're not avoiding the inode lock completely, we're just dropping it for
> the expensive parts of writing to the file.  A quick guess about what the
> expensive parts are:

The way I read the patch it basically 'avoids' the inode lock for almost
the whole write call, just minus some setup.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Btrfs: implement unlocked buffered write

2018-05-22 Thread Christoph Hellwig
On Wed, May 16, 2018 at 11:52:37AM +0800, robbieko wrote:
> From: Robbie Ko 
> 
> This idea is from direct io. By this patch, we can make the buffered
> write parallel, and improve the performance and latency. But because we
> can not update isize without i_mutex, the unlocked buffered write just
> can be done in front of the EOF.
> 
> We needn't worry about the race between buffered write and truncate,
> because the truncate need wait until all the buffered write end.
> 
> And we also needn't worry about the race between dio write and punch hole,
> because we have extent lock to protect our operation.
> 
> I ran fio to test the performance of this feature.
> 
> == Hardware ==
> CPU: Intel® Xeon® D-1531
> SSD: Intel S3710 200G
> Volume : RAID 5 , SSD * 6
> 
> == config file ==
> [global]
> group_reporting
> time_based
> thread=1
> norandommap
> ioengine=libaio
> bs=4k
> iodepth=32
> size=16G
> runtime=180
> numjobs=8
> rw=randwrite
> 
> [file1]
> filename=/mnt/btrfs/nocow/testfile
> 
> == result (iops) ==
> lock = 68470
> unlocked = 94242
> 
> == result (clat) ==
> lock
>  lat (usec): min=184, max=1209.9K, avg=3738.35, stdev=20869.49
>  clat percentiles (usec):
>   |  1.00th=[  322],  5.00th=[  330], 10.00th=[  334], 20.00th=[  346],
>   | 30.00th=[  370], 40.00th=[  386], 50.00th=[  406], 60.00th=[  446],
>   | 70.00th=[  516], 80.00th=[  612], 90.00th=[  828], 95.00th=[10432],
>   | 99.00th=[84480], 99.50th=[117248], 99.90th=[226304], 99.95th=[333824],
>   | 99.99th=[692224]
> 
> unlocked
>  lat (usec): min=10, max=218208, avg=2691.44, stdev=5380.82
>  clat percentiles (usec):
>   |  1.00th=[  302],  5.00th=[  390], 10.00th=[  442], 20.00th=[  502],
>   | 30.00th=[  548], 40.00th=[  596], 50.00th=[  652], 60.00th=[  724],
>   | 70.00th=[  916], 80.00th=[ 5024], 90.00th=[ 5664], 95.00th=[10048],
>   | 99.00th=[29568], 99.50th=[39168], 99.90th=[54016], 99.95th=[59648],
>   | 99.99th=[78336]
> 
> Signed-off-by: Robbie Ko 
> ---
>  fs/btrfs/file.c | 17 +
>  1 file changed, 17 insertions(+)
> 
> diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
> index 41ab907..8eac540 100644
> --- a/fs/btrfs/file.c
> +++ b/fs/btrfs/file.c
> @@ -1600,6 +1600,7 @@ static noinline ssize_t __btrfs_buffered_write(struct 
> file *file,
>   int ret = 0;
>   bool only_release_metadata = false;
>   bool force_page_uptodate = false;
> + bool relock = false;
>  
>   nrptrs = min(DIV_ROUND_UP(iov_iter_count(i), PAGE_SIZE),
>   PAGE_SIZE / (sizeof(struct page *)));
> @@ -1609,6 +1610,18 @@ static noinline ssize_t __btrfs_buffered_write(struct 
> file *file,
>   if (!pages)
>   return -ENOMEM;
>  
> + inode_dio_begin(inode);
> +
> + /*
> +  * If the write is beyond the EOF, we need update
> +  * the isize, but it is protected by i_mutex. So we can
> +  * not unlock the i_mutex at this case.
> +  */
> + if (pos + iov_iter_count(i) <= i_size_read(inode)) {
> + inode_unlock(inode);

And what protects two writes from interleaving their results now?
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 01/12] block: convert bounce, q->bio_split to bioset_init()/mempool_init()

2018-05-22 Thread Christoph Hellwig
Looks good,

Reviewed-by: Christoph Hellwig <h...@lst.de>
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 08/12] target: convert to bioset_init()/mempool_init()

2018-05-22 Thread Christoph Hellwig
Looks good,

Reviewed-by: Christoph Hellwig <h...@lst.de>
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 09/12] fs: convert block_dev.c to bioset_init()

2018-05-22 Thread Christoph Hellwig
Looks good,

Reviewed-by: Christoph Hellwig <h...@lst.de>
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 11/12] xfs: convert to bioset_init()/mempool_init()

2018-05-22 Thread Christoph Hellwig
This creates a minor conflict with my iomap series, but that should
be easy to merge if needed.

Otherwise looks good:

Reviewed-by: Christoph Hellwig <h...@lst.de>
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 12/12] block: Drop bioset_create()

2018-05-22 Thread Christoph Hellwig
Looks good,

Reviewed-by: Christoph Hellwig <h...@lst.de>
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH v2 00/12] get rid of GFP_ZONE_TABLE/BAD

2018-05-22 Thread Christoph Hellwig
This seems to be missing patch 1 and generally be in somewhat odd format.
Can you try to resend it with git-send-email and against current Linus'
tree?

Also I'd suggest you do cleanups like adding and using __GFP_ZONE_MASK
at the beginning of the series before doing any real changes.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH v2 02/12] arch/x86/kernel/amd_gart_64: update usage of address zone modifiers

2018-05-22 Thread Christoph Hellwig
This code doesn't exist in current mainline.  What kernel version
is your patch against?

On Mon, May 21, 2018 at 11:20:23PM +0800, Huaisheng Ye wrote:
> From: Huaisheng Ye 
> 
> Use __GFP_ZONE_MASK to replace (__GFP_DMA | __GFP_HIGHMEM | __GFP_DMA32).
> 
> ___GFP_DMA, ___GFP_HIGHMEM and ___GFP_DMA32 have been deleted from GFP
> bitmasks, the bottom three bits of GFP mask is reserved for storing
> encoded zone number.
> __GFP_DMA, __GFP_HIGHMEM and __GFP_DMA32 should not be operated by OR.

If they have already been deleted the identifier should not exist
anymore, so either your patch has issues, or at least the description.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 00/13] convert block layer to bioset_init()/mempool_init()

2018-05-22 Thread Christoph Hellwig
On Mon, May 21, 2018 at 07:38:55PM -0400, Kent Overstreet wrote:
> On Mon, May 21, 2018 at 02:24:32PM -0400, Mike Snitzer wrote:
> > Every single data structure change in this series should be reviewed for
> > unforeseen alignment consequences.  Jens seemed to say that is
> > worthwhile.  Not sure if he'll do it or we divide it up.  If we divide
> > it up a temp topic branch should be published for others to inspect.
> > 
> > Could be alignment hasn't been a historic concern for a bunch of the
> > data structures changed in this series.. if so then all we can do is fix
> > up any obvious potential for false sharing.
> 
> Honestly, I almost never worry about alignment... the very few times I do 
> care,
> I use __cacheline_aligned_in_smp.

And that generally is the right stratgey.  If Mike really doesn't want
a change we can just open code the kmalloc for the bio set there, the
important point is that we should not keep the old API around for no
good reason.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH v2 05/12] include/linux/dma-mapping: update usage of address zone modifiers

2018-05-21 Thread Christoph Hellwig
On Mon, May 21, 2018 at 11:20:26PM +0800, Huaisheng Ye wrote:
> From: Huaisheng Ye 
> 
> Use __GFP_ZONE_MASK to replace (__GFP_DMA | __GFP_HIGHMEM | __GFP_DMA32).
> 
> ___GFP_DMA, ___GFP_HIGHMEM and ___GFP_DMA32 have been deleted from GFP
> bitmasks, the bottom three bits of GFP mask is reserved for storing
> encoded zone number.
> __GFP_DMA, __GFP_HIGHMEM and __GFP_DMA32 should not be operated with
>each others by OR.

You have to include me for the whole series, otherwise I have absolutely
no way to properly review your patch.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 08/10] bcache: move closures to lib/

2018-05-18 Thread Christoph Hellwig
On Fri, May 18, 2018 at 03:49:13AM -0400, Kent Overstreet wrote:
> Prep work for bcachefs - being a fork of bcache it also uses closures

Hell no.  This code needs to go away and not actually be promoted to
lib/.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 10/10] Dynamic fault injection

2018-05-18 Thread Christoph Hellwig
On Fri, May 18, 2018 at 03:49:18AM -0400, Kent Overstreet wrote:
> Signed-off-by: Kent Overstreet 

Completely lacks any explanation or argument why it would be useful.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 05/10] don't use spin_lock_irqsave() unnecessarily

2018-05-18 Thread Christoph Hellwig
On Fri, May 18, 2018 at 03:49:08AM -0400, Kent Overstreet wrote:
> Signed-off-by: Kent Overstreet 

Looks generally fine.  A little changelog with an explanation of
how we obviously never could get here with irqs disabled would
be nice, though.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 06/10] Generic radix trees

2018-05-18 Thread Christoph Hellwig
Completely lacks any explanation, including why this should be
in lib/.  Also should come in the same series with actual users
of the infrastructure.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 02/10] mm: export find_get_pages()

2018-05-18 Thread Christoph Hellwig
On Fri, May 18, 2018 at 03:49:02AM -0400, Kent Overstreet wrote:
> Signed-off-by: Kent Overstreet 
> ---
>  mm/filemap.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/mm/filemap.c b/mm/filemap.c
> index 31dd888785..78b99019bf 100644
> --- a/mm/filemap.c
> +++ b/mm/filemap.c
> @@ -1845,6 +1845,7 @@ unsigned find_get_pages_range(struct address_space 
> *mapping, pgoff_t *start,
>  
>   return ret;
>  }
> +EXPORT_SYMBOL(find_get_pages_range);

EXPORT_SYMBOL_GPL and only together with the actual submission of
a user of the interface.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 01/10] mm: pagecache add lock

2018-05-18 Thread Christoph Hellwig
On Fri, May 18, 2018 at 06:13:06AM -0700, Matthew Wilcox wrote:
> > Historically, the only problematic case has been direct IO, and people
> > have been willing to say "well, if you mix buffered and direct IO you
> > get what you deserve", and that's probably not unreasonable. But now we
> > have fallocate insert range and collapse range, and those are broken in
> > ways I frankly don't want to think about if they can't ensure consistency
> > with the page cache.
> 
> ext4 manages collapse-vs-pagefault with the ext4-specific i_mmap_sem.
> You may get pushback on the grounds that this ought to be a
> filesystem-specific lock rather than one embedded in the generic inode.

Honestly I think this probably should be in the core.  But IFF we move
it to the core the existing users of per-fs locks need to be moved
over first.  E.g. XFS as the very first one, and at least ext4 and f2fs
that copied the approach, and probably more if you audit deep enough.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 14/14] mm: turn on vm_fault_t type checking

2018-05-16 Thread Christoph Hellwig
On Wed, May 16, 2018 at 08:08:29AM -0700, Darrick J. Wong wrote:
> Uh, we're changing function signatures /and/ redefinining vm_fault_t?
> All in the same 90K patch?
> 
> I /was/ expecting a series of "convert X and all callers/users"
> patches followed by a trivial one to switch the definition, not a giant
> pile of change.  FWIW I don't mind so much if you make a patch
> containing a change for some super-common primitive and a hojillion
> little diff hunks tree-wide, but only one logical change at a time for a
> big patch, please...
> 
> I quite prefer seeing the whole series from start to finish all packaged
> up in one series, but wow this was overwhelming. :/

Another vote to split the change of the typedef, ok I get the message..
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: vm_fault_t conversion, for real

2018-05-16 Thread Christoph Hellwig
On Wed, May 16, 2018 at 06:22:56AM -0700, Matthew Wilcox wrote:
> Perhaps you should try being less of an arsehole if you don't want to
> get yelled at?  I don't mind when you're an arsehole towards me, but I
> do mind when you're an arsehole towards newcomers.  How are we supposed
> to attract and retain new maintainers when you're so rude?

*plonk*  The only one I'm seeing being extremely rude here is you.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 14/14] mm: turn on vm_fault_t type checking

2018-05-16 Thread Christoph Hellwig
On Wed, May 16, 2018 at 04:28:13AM -0700, Matthew Wilcox wrote:
> On Wed, May 16, 2018 at 07:43:48AM +0200, Christoph Hellwig wrote:
> > Switch vm_fault_t to point to an unsigned int with __bіtwise annotations.
> > This both catches any old ->fault or ->page_mkwrite instance with plain
> > compiler type checking, as well as finding more intricate problems with
> > sparse.
> 
> Come on, Christoph; you know better than this.  This patch is completely
> unreviewable.  Split it into one patch per maintainer tree, and in any
> event, the patch to convert vm_fault_t to an unsigned int should be
> separated from all the trivial conversions.

The whole point is that tiny split patches for mechnical translations
are totally pointless.  Switching the typedef might be worth splitting
if people really insist.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: vm_fault_t conversion, for real

2018-05-16 Thread Christoph Hellwig
On Wed, May 16, 2018 at 04:23:47AM -0700, Matthew Wilcox wrote:
> On Wed, May 16, 2018 at 07:43:34AM +0200, Christoph Hellwig wrote:
> > this series tries to actually turn vm_fault_t into a type that can be
> > typechecked and checks the fallout instead of sprinkling random
> > annotations without context.
> 
> Yes, why should we have small tasks that newcomers can do when the mighty
> Christoph Hellwig can swoop in and take over from them?  Seriously,
> can't your talents find a better use than this?

I've spent less time on this than trying to argue to you and Souptick
that these changes are only to get ignored and yelled at as an
"asshole maintainer".  So yes, I could have done more productive things
if you hadn't forced this escalation.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 10/14] vgem: separate errno from VM_FAULT_* values

2018-05-16 Thread Christoph Hellwig
On Wed, May 16, 2018 at 11:53:03AM +0200, Daniel Vetter wrote:
> Reviewed-by: Daniel Vetter 
> 
> Want me to merge this through drm-misc or plan to pick it up yourself?

For now I just want a honest discussion if people really actually
want the vm_fault_t change with the whole picture in place.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


vm_fault_t conversion, for real

2018-05-15 Thread Christoph Hellwig
Hi all,

this series tries to actually turn vm_fault_t into a type that can be
typechecked and checks the fallout instead of sprinkling random
annotations without context.

The first one fixes a real bug in orangefs, the second and third fix
mismatched existing vm_fault_t annotations on the same function, the
fourth removes an unused export that was in the chain.  The remainder
until the last one do some not quite trivial conversions, and the last
one does the trivial mass annotation and flips vm_fault_t to a __bitwise
unsigned int - the unsigned means we also get plain compiler type
checking for the new ->fault signature even without sparse.

This has survived an x86 allyesconfig build, and got a SUCCESS from the
buildbot that I don't really trust - I'm pretty sure there are bits
and pieces hiding in other architectures that it hasn't caught up to.

The sparse annotations are manuall verified for the core MM code and
a few other interesting bits (e.g. DAX and the x86 fault code)

The series is against linux-next as of 2018/05/15 to make sure any
annotations in subsystem trees are picked up.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 02/14] fs: make the filemap_page_mkwrite prototype consistent

2018-05-15 Thread Christoph Hellwig
!CONFIG_MMU version didn't agree with the rest of the kernel..

Signed-off-by: Christoph Hellwig <h...@lst.de>
---
 mm/filemap.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/filemap.c b/mm/filemap.c
index 52517f28e6f4..cf21ced98eff 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -2748,7 +2748,7 @@ int generic_file_readonly_mmap(struct file *file, struct 
vm_area_struct *vma)
return generic_file_mmap(file, vma);
 }
 #else
-int filemap_page_mkwrite(struct vm_fault *vmf)
+vm_fault_t filemap_page_mkwrite(struct vm_fault *vmf)
 {
return -ENOSYS;
 }
-- 
2.17.0

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 01/14] orangefs: don't return errno values from ->fault

2018-05-15 Thread Christoph Hellwig
Signed-off-by: Christoph Hellwig <h...@lst.de>
---
 fs/orangefs/file.c | 10 --
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/fs/orangefs/file.c b/fs/orangefs/file.c
index 26358efbf794..b4a25cd4f3fa 100644
--- a/fs/orangefs/file.c
+++ b/fs/orangefs/file.c
@@ -528,18 +528,16 @@ static long orangefs_ioctl(struct file *file, unsigned 
int cmd, unsigned long ar
return ret;
 }
 
-static int orangefs_fault(struct vm_fault *vmf)
+static vm_fault_t orangefs_fault(struct vm_fault *vmf)
 {
struct file *file = vmf->vma->vm_file;
int rc;
-   rc = orangefs_inode_getattr(file->f_mapping->host, 0, 1,
-   STATX_SIZE);
-   if (rc == -ESTALE)
-   rc = -EIO;
+
+   rc = orangefs_inode_getattr(file->f_mapping->host, 0, 1, STATX_SIZE);
if (rc) {
gossip_err("%s: orangefs_inode_getattr failed, "
"rc:%d:.\n", __func__, rc);
-   return rc;
+   return VM_FAULT_SIGBUS;
}
return filemap_fault(vmf);
 }
-- 
2.17.0

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 03/14] dax: make the dax_iomap_fault prototype consistent

2018-05-15 Thread Christoph Hellwig
Signed-off-by: Christoph Hellwig <h...@lst.de>
---
 include/linux/dax.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/linux/dax.h b/include/linux/dax.h
index dc65ece825ee..a292bccdc274 100644
--- a/include/linux/dax.h
+++ b/include/linux/dax.h
@@ -183,7 +183,7 @@ void dax_flush(struct dax_device *dax_dev, void *addr, 
size_t size);
 
 ssize_t dax_iomap_rw(struct kiocb *iocb, struct iov_iter *iter,
const struct iomap_ops *ops);
-int dax_iomap_fault(struct vm_fault *vmf, enum page_entry_size pe_size,
+vm_fault_t dax_iomap_fault(struct vm_fault *vmf, enum page_entry_size pe_size,
pfn_t *pfnp, int *errp, const struct iomap_ops *ops);
 vm_fault_t dax_finish_sync_fault(struct vm_fault *vmf,
enum page_entry_size pe_size, pfn_t pfn);
-- 
2.17.0

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 04/14] mm: remove the unused device_private_entry_fault export

2018-05-15 Thread Christoph Hellwig
Signed-off-by: Christoph Hellwig <h...@lst.de>
---
 kernel/memremap.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/kernel/memremap.c b/kernel/memremap.c
index db4e1a373e5f..59ee3b604b39 100644
--- a/kernel/memremap.c
+++ b/kernel/memremap.c
@@ -65,7 +65,6 @@ int device_private_entry_fault(struct vm_area_struct *vma,
 */
return page->pgmap->page_fault(vma, addr, page, flags, pmdp);
 }
-EXPORT_SYMBOL(device_private_entry_fault);
 #endif /* CONFIG_DEVICE_PRIVATE */
 
 static void pgmap_radix_release(struct resource *res, unsigned long end_pgoff)
-- 
2.17.0

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 05/14] ceph: untangle ceph_filemap_fault

2018-05-15 Thread Christoph Hellwig
Streamline the code to have a somewhat natural flow, and separate the
errno values from the VM_FAULT_* values.

Signed-off-by: Christoph Hellwig <h...@lst.de>
---
 fs/ceph/addr.c | 100 +
 1 file changed, 51 insertions(+), 49 deletions(-)

diff --git a/fs/ceph/addr.c b/fs/ceph/addr.c
index 5f7ad3d0df2e..6e80894ca073 100644
--- a/fs/ceph/addr.c
+++ b/fs/ceph/addr.c
@@ -1428,15 +1428,18 @@ static void ceph_restore_sigs(sigset_t *oldset)
 /*
  * vm ops
  */
-static int ceph_filemap_fault(struct vm_fault *vmf)
+static vm_fault_t ceph_filemap_fault(struct vm_fault *vmf)
 {
struct vm_area_struct *vma = vmf->vma;
struct inode *inode = file_inode(vma->vm_file);
+   struct address_space *mapping = inode->i_mapping;
struct ceph_inode_info *ci = ceph_inode(inode);
struct ceph_file_info *fi = vma->vm_file->private_data;
-   struct page *pinned_page = NULL;
+   struct page *pinned_page = NULL, *page;
loff_t off = vmf->pgoff << PAGE_SHIFT;
-   int want, got, ret;
+   int want, got, err = 0;
+   vm_fault_t ret = 0;
+   bool did_fault = false;
sigset_t oldset;
 
ceph_block_sigs();
@@ -1449,9 +1452,9 @@ static int ceph_filemap_fault(struct vm_fault *vmf)
want = CEPH_CAP_FILE_CACHE;
 
got = 0;
-   ret = ceph_get_caps(ci, CEPH_CAP_FILE_RD, want, -1, , _page);
-   if (ret < 0)
-   goto out_restore;
+   err = ceph_get_caps(ci, CEPH_CAP_FILE_RD, want, -1, , _page);
+   if (err < 0)
+   goto out_errno;
 
dout("filemap_fault %p %llu~%zd got cap refs on %s\n",
 inode, off, (size_t)PAGE_SIZE, ceph_cap_string(got));
@@ -1462,8 +1465,8 @@ static int ceph_filemap_fault(struct vm_fault *vmf)
ceph_add_rw_context(fi, _ctx);
ret = filemap_fault(vmf);
ceph_del_rw_context(fi, _ctx);
-   } else
-   ret = -EAGAIN;
+   did_fault = true;
+   }
 
dout("filemap_fault %p %llu~%zd dropping cap refs on %s ret %d\n",
 inode, off, (size_t)PAGE_SIZE, ceph_cap_string(got), ret);
@@ -1471,57 +1474,55 @@ static int ceph_filemap_fault(struct vm_fault *vmf)
put_page(pinned_page);
ceph_put_cap_refs(ci, got);
 
-   if (ret != -EAGAIN)
+   if (did_fault)
goto out_restore;
 
/* read inline data */
if (off >= PAGE_SIZE) {
/* does not support inline data > PAGE_SIZE */
ret = VM_FAULT_SIGBUS;
+   goto out_restore;
+   }
+
+   page = find_or_create_page(mapping, 0,
+   mapping_gfp_constraint(mapping, ~__GFP_FS));
+   if (!page) {
+   ret = VM_FAULT_OOM;
+   goto out_inline;
+   }
+
+   err = __ceph_do_getattr(inode, page, CEPH_STAT_CAP_INLINE_DATA, true);
+   if (err < 0 || off >= i_size_read(inode)) {
+   unlock_page(page);
+   put_page(page);
+   if (err < 0)
+   goto out_errno;
+   ret = VM_FAULT_SIGBUS;
} else {
-   int ret1;
-   struct address_space *mapping = inode->i_mapping;
-   struct page *page = find_or_create_page(mapping, 0,
-   mapping_gfp_constraint(mapping,
-   ~__GFP_FS));
-   if (!page) {
-   ret = VM_FAULT_OOM;
-   goto out_inline;
-   }
-   ret1 = __ceph_do_getattr(inode, page,
-CEPH_STAT_CAP_INLINE_DATA, true);
-   if (ret1 < 0 || off >= i_size_read(inode)) {
-   unlock_page(page);
-   put_page(page);
-   if (ret1 < 0)
-   ret = ret1;
-   else
-   ret = VM_FAULT_SIGBUS;
-   goto out_inline;
-   }
-   if (ret1 < PAGE_SIZE)
-   zero_user_segment(page, ret1, PAGE_SIZE);
+   if (err < PAGE_SIZE)
+   zero_user_segment(page, err, PAGE_SIZE);
else
flush_dcache_page(page);
SetPageUptodate(page);
vmf->page = page;
ret = VM_FAULT_MAJOR | VM_FAULT_LOCKED;
-out_inline:
-   dout("filemap_fault %p %llu~%zd read inline data ret %d\n",
-inode, off, (size_t)PAGE_SIZE, ret);
}
+
+out_inline:
+   dout("filemap_fault %p %llu~%zd read inline data ret %d\n",
+inode, off, (size_t)PAGE_SIZE, ret);
 out_restore:
ceph_restore_sigs();
-   if (ret < 0)
-   ret =

[PATCH 06/14] btrfs: separate errno from VM_FAULT_* values

2018-05-15 Thread Christoph Hellwig
Signed-off-by: Christoph Hellwig <h...@lst.de>
---
 fs/btrfs/ctree.h |  2 +-
 fs/btrfs/inode.c | 19 ++-
 2 files changed, 11 insertions(+), 10 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 1485cd130e2b..02a0de73c1d1 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -3203,7 +3203,7 @@ int btrfs_merge_bio_hook(struct page *page, unsigned long 
offset,
 size_t size, struct bio *bio,
 unsigned long bio_flags);
 void btrfs_set_range_writeback(void *private_data, u64 start, u64 end);
-int btrfs_page_mkwrite(struct vm_fault *vmf);
+vm_fault_t btrfs_page_mkwrite(struct vm_fault *vmf);
 int btrfs_readpage(struct file *file, struct page *page);
 void btrfs_evict_inode(struct inode *inode);
 int btrfs_write_inode(struct inode *inode, struct writeback_control *wbc);
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index ec9db248c499..f4f03f0f4556 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -8824,7 +8824,7 @@ static void btrfs_invalidatepage(struct page *page, 
unsigned int offset,
  * beyond EOF, then the page is guaranteed safe against truncation until we
  * unlock the page.
  */
-int btrfs_page_mkwrite(struct vm_fault *vmf)
+vm_fault_t btrfs_page_mkwrite(struct vm_fault *vmf)
 {
struct page *page = vmf->page;
struct inode *inode = file_inode(vmf->vma->vm_file);
@@ -8836,7 +8836,8 @@ int btrfs_page_mkwrite(struct vm_fault *vmf)
char *kaddr;
unsigned long zero_start;
loff_t size;
-   int ret;
+   vm_fault_t ret;
+   int err;
int reserved = 0;
u64 reserved_space;
u64 page_start;
@@ -8858,14 +8859,14 @@ int btrfs_page_mkwrite(struct vm_fault *vmf)
 * end up waiting indefinitely to get a lock on the page currently
 * being processed by btrfs_page_mkwrite() function.
 */
-   ret = btrfs_delalloc_reserve_space(inode, _reserved, page_start,
+   err = btrfs_delalloc_reserve_space(inode, _reserved, page_start,
   reserved_space);
-   if (!ret) {
-   ret = file_update_time(vmf->vma->vm_file);
+   if (!err) {
+   err = file_update_time(vmf->vma->vm_file);
reserved = 1;
}
-   if (ret) {
-   if (ret == -ENOMEM)
+   if (err) {
+   if (err == -ENOMEM)
ret = VM_FAULT_OOM;
else /* -ENOSPC, -EIO, etc */
ret = VM_FAULT_SIGBUS;
@@ -8927,9 +8928,9 @@ int btrfs_page_mkwrite(struct vm_fault *vmf)
  EXTENT_DO_ACCOUNTING | EXTENT_DEFRAG,
  0, 0, _state);
 
-   ret = btrfs_set_extent_delalloc(inode, page_start, end, 0,
+   err = btrfs_set_extent_delalloc(inode, page_start, end, 0,
_state, 0);
-   if (ret) {
+   if (err) {
unlock_extent_cached(io_tree, page_start, page_end,
 _state);
ret = VM_FAULT_SIGBUS;
-- 
2.17.0

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 07/14] ext4: separate errno from VM_FAULT_* values

2018-05-15 Thread Christoph Hellwig
Signed-off-by: Christoph Hellwig <h...@lst.de>
---
 fs/ext4/ext4.h  |  4 ++--
 fs/ext4/inode.c | 30 +++---
 2 files changed, 17 insertions(+), 17 deletions(-)

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index fa52b7dd4542..48592d0edf3e 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -2463,8 +2463,8 @@ extern int ext4_writepage_trans_blocks(struct inode *);
 extern int ext4_chunk_trans_blocks(struct inode *, int nrblocks);
 extern int ext4_zero_partial_blocks(handle_t *handle, struct inode *inode,
 loff_t lstart, loff_t lend);
-extern int ext4_page_mkwrite(struct vm_fault *vmf);
-extern int ext4_filemap_fault(struct vm_fault *vmf);
+extern vm_fault_t ext4_page_mkwrite(struct vm_fault *vmf);
+extern vm_fault_t ext4_filemap_fault(struct vm_fault *vmf);
 extern qsize_t *ext4_get_reserved_space(struct inode *inode);
 extern int ext4_get_projid(struct inode *inode, kprojid_t *projid);
 extern void ext4_da_update_reserve_space(struct inode *inode,
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 95bc48f5c88b..fe49045a2832 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -6106,27 +6106,27 @@ static int ext4_bh_unmapped(handle_t *handle, struct 
buffer_head *bh)
return !buffer_mapped(bh);
 }
 
-int ext4_page_mkwrite(struct vm_fault *vmf)
+vm_fault_t ext4_page_mkwrite(struct vm_fault *vmf)
 {
struct vm_area_struct *vma = vmf->vma;
struct page *page = vmf->page;
loff_t size;
unsigned long len;
-   int ret;
+   vm_fault_t ret;
struct file *file = vma->vm_file;
struct inode *inode = file_inode(file);
struct address_space *mapping = inode->i_mapping;
handle_t *handle;
get_block_t *get_block;
-   int retries = 0;
+   int retries = 0, err;
 
sb_start_pagefault(inode->i_sb);
file_update_time(vma->vm_file);
 
down_read(_I(inode)->i_mmap_sem);
 
-   ret = ext4_convert_inline_data(inode);
-   if (ret)
+   err = ext4_convert_inline_data(inode);
+   if (err)
goto out_ret;
 
/* Delalloc case is easy... */
@@ -6134,9 +6134,9 @@ int ext4_page_mkwrite(struct vm_fault *vmf)
!ext4_should_journal_data(inode) &&
!ext4_nonda_switch(inode->i_sb)) {
do {
-   ret = block_page_mkwrite(vma, vmf,
+   err = block_page_mkwrite(vma, vmf,
   ext4_da_get_block_prep);
-   } while (ret == -ENOSPC &&
+   } while (err == -ENOSPC &&
   ext4_should_retry_alloc(inode->i_sb, ));
goto out_ret;
}
@@ -6181,8 +6181,8 @@ int ext4_page_mkwrite(struct vm_fault *vmf)
ret = VM_FAULT_SIGBUS;
goto out;
}
-   ret = block_page_mkwrite(vma, vmf, get_block);
-   if (!ret && ext4_should_journal_data(inode)) {
+   err = block_page_mkwrite(vma, vmf, get_block);
+   if (!err && ext4_should_journal_data(inode)) {
if (ext4_walk_page_buffers(handle, page_buffers(page), 0,
  PAGE_SIZE, NULL, do_journal_get_write_access)) {
unlock_page(page);
@@ -6193,24 +6193,24 @@ int ext4_page_mkwrite(struct vm_fault *vmf)
ext4_set_inode_state(inode, EXT4_STATE_JDATA);
}
ext4_journal_stop(handle);
-   if (ret == -ENOSPC && ext4_should_retry_alloc(inode->i_sb, ))
+   if (err == -ENOSPC && ext4_should_retry_alloc(inode->i_sb, ))
goto retry_alloc;
 out_ret:
-   ret = block_page_mkwrite_return(ret);
+   ret = block_page_mkwrite_return(err);
 out:
up_read(_I(inode)->i_mmap_sem);
sb_end_pagefault(inode->i_sb);
return ret;
 }
 
-int ext4_filemap_fault(struct vm_fault *vmf)
+vm_fault_t ext4_filemap_fault(struct vm_fault *vmf)
 {
struct inode *inode = file_inode(vmf->vma->vm_file);
-   int err;
+   vm_fault_t ret;
 
down_read(_I(inode)->i_mmap_sem);
-   err = filemap_fault(vmf);
+   ret = filemap_fault(vmf);
up_read(_I(inode)->i_mmap_sem);
 
-   return err;
+   return ret;
 }
-- 
2.17.0

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 10/14] vgem: separate errno from VM_FAULT_* values

2018-05-15 Thread Christoph Hellwig
And streamline the code in vgem_fault with early returns so that it is
a little bit more readable.

Signed-off-by: Christoph Hellwig <h...@lst.de>
---
 drivers/gpu/drm/vgem/vgem_drv.c | 51 +++--
 1 file changed, 23 insertions(+), 28 deletions(-)

diff --git a/drivers/gpu/drm/vgem/vgem_drv.c b/drivers/gpu/drm/vgem/vgem_drv.c
index 2524ff116f00..a261e0aab83a 100644
--- a/drivers/gpu/drm/vgem/vgem_drv.c
+++ b/drivers/gpu/drm/vgem/vgem_drv.c
@@ -61,12 +61,13 @@ static void vgem_gem_free_object(struct drm_gem_object *obj)
kfree(vgem_obj);
 }
 
-static int vgem_gem_fault(struct vm_fault *vmf)
+static vm_fault_t vgem_gem_fault(struct vm_fault *vmf)
 {
struct vm_area_struct *vma = vmf->vma;
struct drm_vgem_gem_object *obj = vma->vm_private_data;
/* We don't use vmf->pgoff since that has the fake offset */
unsigned long vaddr = vmf->address;
+   struct page *page;
int ret;
loff_t num_pages;
pgoff_t page_offset;
@@ -85,35 +86,29 @@ static int vgem_gem_fault(struct vm_fault *vmf)
ret = 0;
}
mutex_unlock(>pages_lock);
-   if (ret) {
-   struct page *page;
-
-   page = shmem_read_mapping_page(
-   file_inode(obj->base.filp)->i_mapping,
-   page_offset);
-   if (!IS_ERR(page)) {
-   vmf->page = page;
-   ret = 0;
-   } else switch (PTR_ERR(page)) {
-   case -ENOSPC:
-   case -ENOMEM:
-   ret = VM_FAULT_OOM;
-   break;
-   case -EBUSY:
-   ret = VM_FAULT_RETRY;
-   break;
-   case -EFAULT:
-   case -EINVAL:
-   ret = VM_FAULT_SIGBUS;
-   break;
-   default:
-   WARN_ON(PTR_ERR(page));
-   ret = VM_FAULT_SIGBUS;
-   break;
-   }
+   if (!ret)
+   return 0;
+
+   page = shmem_read_mapping_page(file_inode(obj->base.filp)->i_mapping,
+   page_offset);
+   if (!IS_ERR(page)) {
+   vmf->page = page;
+   return 0;
+   }
 
+   switch (PTR_ERR(page)) {
+   case -ENOSPC:
+   case -ENOMEM:
+   return VM_FAULT_OOM;
+   case -EBUSY:
+   return VM_FAULT_RETRY;
+   case -EFAULT:
+   case -EINVAL:
+   return VM_FAULT_SIGBUS;
+   default:
+   WARN_ON(PTR_ERR(page));
+   return VM_FAULT_SIGBUS;
}
-   return ret;
 }
 
 static const struct vm_operations_struct vgem_gem_vm_ops = {
-- 
2.17.0

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 09/14] ubifs: separate errno from VM_FAULT_* values

2018-05-15 Thread Christoph Hellwig
Signed-off-by: Christoph Hellwig <h...@lst.de>
---
 fs/ubifs/file.c | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/fs/ubifs/file.c b/fs/ubifs/file.c
index 1acb2ff505e6..7c1a2e1c3de5 100644
--- a/fs/ubifs/file.c
+++ b/fs/ubifs/file.c
@@ -1513,7 +1513,7 @@ static int ubifs_releasepage(struct page *page, gfp_t 
unused_gfp_flags)
  * mmap()d file has taken write protection fault and is being made writable.
  * UBIFS must ensure page is budgeted for.
  */
-static int ubifs_vm_page_mkwrite(struct vm_fault *vmf)
+static vm_fault_t ubifs_vm_page_mkwrite(struct vm_fault *vmf)
 {
struct page *page = vmf->page;
struct inode *inode = file_inode(vmf->vma->vm_file);
@@ -1521,6 +1521,7 @@ static int ubifs_vm_page_mkwrite(struct vm_fault *vmf)
struct timespec now = current_time(inode);
struct ubifs_budget_req req = { .new_page = 1 };
int err, update_time;
+   vm_fault_t ret = 0;
 
dbg_gen("ino %lu, pg %lu, i_size %lld", inode->i_ino, page->index,
i_size_read(inode));
@@ -1601,8 +1602,8 @@ static int ubifs_vm_page_mkwrite(struct vm_fault *vmf)
unlock_page(page);
ubifs_release_budget(c, );
if (err)
-   err = VM_FAULT_SIGBUS;
-   return err;
+   ret = VM_FAULT_SIGBUS;
+   return ret;
 }
 
 static const struct vm_operations_struct ubifs_file_vm_ops = {
-- 
2.17.0

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 11/14] ttm: separate errno from VM_FAULT_* values

2018-05-15 Thread Christoph Hellwig
Signed-off-by: Christoph Hellwig <h...@lst.de>
---
 drivers/gpu/drm/ttm/ttm_bo_vm.c | 42 +
 1 file changed, 22 insertions(+), 20 deletions(-)

diff --git a/drivers/gpu/drm/ttm/ttm_bo_vm.c b/drivers/gpu/drm/ttm/ttm_bo_vm.c
index 8eba95b3c737..255e7801f62c 100644
--- a/drivers/gpu/drm/ttm/ttm_bo_vm.c
+++ b/drivers/gpu/drm/ttm/ttm_bo_vm.c
@@ -43,10 +43,11 @@
 
 #define TTM_BO_VM_NUM_PREFAULT 16
 
-static int ttm_bo_vm_fault_idle(struct ttm_buffer_object *bo,
+static vm_fault_t ttm_bo_vm_fault_idle(struct ttm_buffer_object *bo,
struct vm_fault *vmf)
 {
-   int ret = 0;
+   vm_fault_t ret = 0;
+   int err = 0;
 
if (likely(!bo->moving))
goto out_unlock;
@@ -77,8 +78,8 @@ static int ttm_bo_vm_fault_idle(struct ttm_buffer_object *bo,
/*
 * Ordinary wait.
 */
-   ret = dma_fence_wait(bo->moving, true);
-   if (unlikely(ret != 0)) {
+   err = dma_fence_wait(bo->moving, true);
+   if (unlikely(err != 0)) {
ret = (ret != -ERESTARTSYS) ? VM_FAULT_SIGBUS :
VM_FAULT_NOPAGE;
goto out_unlock;
@@ -104,7 +105,7 @@ static unsigned long ttm_bo_io_mem_pfn(struct 
ttm_buffer_object *bo,
+ page_offset;
 }
 
-static int ttm_bo_vm_fault(struct vm_fault *vmf)
+static vm_fault_t ttm_bo_vm_fault(struct vm_fault *vmf)
 {
struct vm_area_struct *vma = vmf->vma;
struct ttm_buffer_object *bo = (struct ttm_buffer_object *)
@@ -115,7 +116,8 @@ static int ttm_bo_vm_fault(struct vm_fault *vmf)
unsigned long pfn;
struct ttm_tt *ttm = NULL;
struct page *page;
-   int ret;
+   vm_fault_t ret;
+   int err;
int i;
unsigned long address = vmf->address;
struct ttm_mem_type_manager *man =
@@ -128,9 +130,9 @@ static int ttm_bo_vm_fault(struct vm_fault *vmf)
 * for reserve, and if it fails, retry the fault after waiting
 * for the buffer to become unreserved.
 */
-   ret = ttm_bo_reserve(bo, true, true, NULL);
-   if (unlikely(ret != 0)) {
-   if (ret != -EBUSY)
+   err = ttm_bo_reserve(bo, true, true, NULL);
+   if (unlikely(err != 0)) {
+   if (err != -EBUSY)
return VM_FAULT_NOPAGE;
 
if (vmf->flags & FAULT_FLAG_ALLOW_RETRY) {
@@ -162,8 +164,8 @@ static int ttm_bo_vm_fault(struct vm_fault *vmf)
}
 
if (bdev->driver->fault_reserve_notify) {
-   ret = bdev->driver->fault_reserve_notify(bo);
-   switch (ret) {
+   err = bdev->driver->fault_reserve_notify(bo);
+   switch (err) {
case 0:
break;
case -EBUSY:
@@ -191,13 +193,13 @@ static int ttm_bo_vm_fault(struct vm_fault *vmf)
goto out_unlock;
}
 
-   ret = ttm_mem_io_lock(man, true);
-   if (unlikely(ret != 0)) {
+   err = ttm_mem_io_lock(man, true);
+   if (unlikely(err != 0)) {
ret = VM_FAULT_NOPAGE;
goto out_unlock;
}
-   ret = ttm_mem_io_reserve_vm(bo);
-   if (unlikely(ret != 0)) {
+   err = ttm_mem_io_reserve_vm(bo);
+   if (unlikely(err != 0)) {
ret = VM_FAULT_SIGBUS;
goto out_io_unlock;
}
@@ -265,21 +267,21 @@ static int ttm_bo_vm_fault(struct vm_fault *vmf)
}
 
if (vma->vm_flags & VM_MIXEDMAP)
-   ret = vm_insert_mixed(, address,
+   err = vm_insert_mixed(, address,
__pfn_to_pfn_t(pfn, PFN_DEV));
else
-   ret = vm_insert_pfn(, address, pfn);
+   err = vm_insert_pfn(, address, pfn);
 
/*
 * Somebody beat us to this PTE or prefaulting to
 * an already populated PTE, or prefaulting error.
 */
 
-   if (unlikely((ret == -EBUSY) || (ret != 0 && i > 0)))
+   if (unlikely((err == -EBUSY) || (err != 0 && i > 0)))
break;
-   else if (unlikely(ret != 0)) {
+   else if (unlikely(err != 0)) {
ret =
-   (ret == -ENOMEM) ? VM_FAULT_OOM : VM_FAULT_SIGBUS;
+   (err == -ENOMEM) ? VM_FAULT_OOM : VM_FAULT_SIGBUS;
goto out_io_unlock;
}
 
-- 
2.17.0

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 12/14] lustre: separate errno from VM_FAULT_* values

2018-05-15 Thread Christoph Hellwig
Signed-off-by: Christoph Hellwig <h...@lst.de>
---
 .../staging/lustre/lustre/llite/llite_mmap.c  | 37 +++
 .../lustre/lustre/llite/vvp_internal.h|  2 +-
 2 files changed, 14 insertions(+), 25 deletions(-)

diff --git a/drivers/staging/lustre/lustre/llite/llite_mmap.c 
b/drivers/staging/lustre/lustre/llite/llite_mmap.c
index 214b07554e62..061d98871959 100644
--- a/drivers/staging/lustre/lustre/llite/llite_mmap.c
+++ b/drivers/staging/lustre/lustre/llite/llite_mmap.c
@@ -231,23 +231,18 @@ static int ll_page_mkwrite0(struct vm_area_struct *vma, 
struct page *vmpage,
return result;
 }
 
-static inline int to_fault_error(int result)
+static inline vm_fault_t to_fault_error(int result)
 {
switch (result) {
case 0:
-   result = VM_FAULT_LOCKED;
-   break;
+   return VM_FAULT_LOCKED;
case -EFAULT:
-   result = VM_FAULT_NOPAGE;
-   break;
+   return VM_FAULT_NOPAGE;
case -ENOMEM:
-   result = VM_FAULT_OOM;
-   break;
+   return VM_FAULT_OOM;
default:
-   result = VM_FAULT_SIGBUS;
-   break;
+   return VM_FAULT_SIGBUS;
}
-   return result;
 }
 
 /**
@@ -261,7 +256,7 @@ static inline int to_fault_error(int result)
  * \retval VM_FAULT_ERROR on general error
  * \retval NOPAGE_OOM not have memory for allocate new page
  */
-static int ll_fault0(struct vm_area_struct *vma, struct vm_fault *vmf)
+static vm_fault_t ll_fault0(struct vm_area_struct *vma, struct vm_fault *vmf)
 {
struct lu_env  *env;
struct cl_io*io;
@@ -269,7 +264,7 @@ static int ll_fault0(struct vm_area_struct *vma, struct 
vm_fault *vmf)
struct page  *vmpage;
unsigned long   ra_flags;
int   result = 0;
-   int   fault_ret = 0;
+   vm_fault_tfault_ret = 0;
u16 refcheck;
 
env = cl_env_get();
@@ -323,7 +318,7 @@ static int ll_fault0(struct vm_area_struct *vma, struct 
vm_fault *vmf)
return fault_ret;
 }
 
-static int ll_fault(struct vm_fault *vmf)
+static vm_fault_t ll_fault(struct vm_fault *vmf)
 {
int count = 0;
bool printed = false;
@@ -364,7 +359,7 @@ static int ll_fault(struct vm_fault *vmf)
return result;
 }
 
-static int ll_page_mkwrite(struct vm_fault *vmf)
+static vm_fault_t ll_page_mkwrite(struct vm_fault *vmf)
 {
struct vm_area_struct *vma = vmf->vma;
int count = 0;
@@ -390,22 +385,16 @@ static int ll_page_mkwrite(struct vm_fault *vmf)
switch (result) {
case 0:
LASSERT(PageLocked(vmf->page));
-   result = VM_FAULT_LOCKED;
-   break;
+   return VM_FAULT_LOCKED;
case -ENODATA:
case -EAGAIN:
case -EFAULT:
-   result = VM_FAULT_NOPAGE;
-   break;
+   return VM_FAULT_NOPAGE;
case -ENOMEM:
-   result = VM_FAULT_OOM;
-   break;
+   return VM_FAULT_OOM;
default:
-   result = VM_FAULT_SIGBUS;
-   break;
+   return VM_FAULT_SIGBUS;
}
-
-   return result;
 }
 
 /**
diff --git a/drivers/staging/lustre/lustre/llite/vvp_internal.h 
b/drivers/staging/lustre/lustre/llite/vvp_internal.h
index 7d3abb43584a..c194966a3d82 100644
--- a/drivers/staging/lustre/lustre/llite/vvp_internal.h
+++ b/drivers/staging/lustre/lustre/llite/vvp_internal.h
@@ -83,7 +83,7 @@ struct vvp_io {
/**
 * fault API used bitflags for return code.
 */
-   unsigned intft_flags;
+   vm_fault_tft_flags;
/**
 * check that flags are from filemap_fault
 */
-- 
2.17.0

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 13/14] mm: move arch specific VM_FAULT_* flags to mm.h

2018-05-15 Thread Christoph Hellwig
Various architectures define their own internal flags.  Not sure a public
header like mm.h is a good place, but keeping them inside the arch code
with possible conflicts also seems like a bad idea.  Maybe we just need
to stop overloading the value instead.

Signed-off-by: Christoph Hellwig <h...@lst.de>
---
 arch/arm/mm/fault.c   | 3 ---
 arch/arm64/mm/fault.c | 3 ---
 arch/s390/mm/fault.c  | 6 --
 arch/unicore32/mm/fault.c | 3 ---
 include/linux/mm.h| 7 +++
 5 files changed, 7 insertions(+), 15 deletions(-)

diff --git a/arch/arm/mm/fault.c b/arch/arm/mm/fault.c
index 32034543f49c..b696eabccf60 100644
--- a/arch/arm/mm/fault.c
+++ b/arch/arm/mm/fault.c
@@ -201,9 +201,6 @@ void do_bad_area(unsigned long addr, unsigned int fsr, 
struct pt_regs *regs)
 }
 
 #ifdef CONFIG_MMU
-#define VM_FAULT_BADMAP0x01
-#define VM_FAULT_BADACCESS 0x02
-
 /*
  * Check that the permissions on the VMA allow for the fault which occurred.
  * If we encountered a write fault, we must have write permission, otherwise
diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
index 91c53a7d2575..3d0b1f8eacce 100644
--- a/arch/arm64/mm/fault.c
+++ b/arch/arm64/mm/fault.c
@@ -318,9 +318,6 @@ static void do_bad_area(unsigned long addr, unsigned int 
esr, struct pt_regs *re
}
 }
 
-#define VM_FAULT_BADMAP0x01
-#define VM_FAULT_BADACCESS 0x02
-
 static int __do_page_fault(struct mm_struct *mm, unsigned long addr,
   unsigned int mm_flags, unsigned long vm_flags,
   struct task_struct *tsk)
diff --git a/arch/s390/mm/fault.c b/arch/s390/mm/fault.c
index e074480d3598..48c781ae25d0 100644
--- a/arch/s390/mm/fault.c
+++ b/arch/s390/mm/fault.c
@@ -44,12 +44,6 @@
 #define __SUBCODE_MASK 0x0600
 #define __PF_RES_FIELD 0x8000ULL
 
-#define VM_FAULT_BADCONTEXT0x01
-#define VM_FAULT_BADMAP0x02
-#define VM_FAULT_BADACCESS 0x04
-#define VM_FAULT_SIGNAL0x08
-#define VM_FAULT_PFAULT0x10
-
 enum fault_type {
KERNEL_FAULT,
USER_FAULT,
diff --git a/arch/unicore32/mm/fault.c b/arch/unicore32/mm/fault.c
index 381473412937..6c3c1a82925f 100644
--- a/arch/unicore32/mm/fault.c
+++ b/arch/unicore32/mm/fault.c
@@ -148,9 +148,6 @@ void do_bad_area(unsigned long addr, unsigned int fsr, 
struct pt_regs *regs)
__do_kernel_fault(mm, addr, fsr, regs);
 }
 
-#define VM_FAULT_BADMAP0x01
-#define VM_FAULT_BADACCESS 0x02
-
 /*
  * Check that the permissions on the VMA allow for the fault which occurred.
  * If we encountered a write fault, we must have write permission, otherwise
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 338b8a1afb02..64d09e3afc24 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1250,6 +1250,13 @@ static inline void clear_page_pfmemalloc(struct page 
*page)
 * and needs fsync() to complete (for
 * synchronous page faults in DAX) */
 
+/* Only for use in architecture specific page fault handling: */
+#define VM_FAULT_BADMAP0x01
+#define VM_FAULT_BADACCESS 0x02
+#define VM_FAULT_BADCONTEXT0x04
+#define VM_FAULT_SIGNAL0x08
+#define VM_FAULT_PFAULT0x10
+
 #define VM_FAULT_ERROR (VM_FAULT_OOM | VM_FAULT_SIGBUS | VM_FAULT_SIGSEGV | \
 VM_FAULT_HWPOISON | VM_FAULT_HWPOISON_LARGE | \
 VM_FAULT_FALLBACK)
-- 
2.17.0

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 08/14] ocfs2: separate errno from VM_FAULT_* values

2018-05-15 Thread Christoph Hellwig
Signed-off-by: Christoph Hellwig <h...@lst.de>
---
 fs/ocfs2/mmap.c | 36 +++-
 1 file changed, 19 insertions(+), 17 deletions(-)

diff --git a/fs/ocfs2/mmap.c b/fs/ocfs2/mmap.c
index fb9a20e3d608..e75c1fc5333e 100644
--- a/fs/ocfs2/mmap.c
+++ b/fs/ocfs2/mmap.c
@@ -44,11 +44,11 @@
 #include "ocfs2_trace.h"
 
 
-static int ocfs2_fault(struct vm_fault *vmf)
+static vm_fault_t ocfs2_fault(struct vm_fault *vmf)
 {
struct vm_area_struct *vma = vmf->vma;
sigset_t oldset;
-   int ret;
+   vm_fault_t ret;
 
ocfs2_block_signals();
ret = filemap_fault(vmf);
@@ -59,10 +59,10 @@ static int ocfs2_fault(struct vm_fault *vmf)
return ret;
 }
 
-static int __ocfs2_page_mkwrite(struct file *file, struct buffer_head *di_bh,
-   struct page *page)
+static vm_fault_t __ocfs2_page_mkwrite(struct file *file,
+   struct buffer_head *di_bh, struct page *page)
 {
-   int ret = VM_FAULT_NOPAGE;
+   vm_fault_t ret = VM_FAULT_NOPAGE;
struct inode *inode = file_inode(file);
struct address_space *mapping = inode->i_mapping;
loff_t pos = page_offset(page);
@@ -71,6 +71,7 @@ static int __ocfs2_page_mkwrite(struct file *file, struct 
buffer_head *di_bh,
struct page *locked_page = NULL;
void *fsdata;
loff_t size = i_size_read(inode);
+   int err;
 
last_index = (size - 1) >> PAGE_SHIFT;
 
@@ -105,12 +106,12 @@ static int __ocfs2_page_mkwrite(struct file *file, struct 
buffer_head *di_bh,
if (page->index == last_index)
len = ((size - 1) & ~PAGE_MASK) + 1;
 
-   ret = ocfs2_write_begin_nolock(mapping, pos, len, OCFS2_WRITE_MMAP,
+   err = ocfs2_write_begin_nolock(mapping, pos, len, OCFS2_WRITE_MMAP,
   _page, , di_bh, page);
-   if (ret) {
-   if (ret != -ENOSPC)
-   mlog_errno(ret);
-   if (ret == -ENOMEM)
+   if (err) {
+   if (err != -ENOSPC)
+   mlog_errno(err);
+   if (err == -ENOMEM)
ret = VM_FAULT_OOM;
else
ret = VM_FAULT_SIGBUS;
@@ -121,20 +122,21 @@ static int __ocfs2_page_mkwrite(struct file *file, struct 
buffer_head *di_bh,
ret = VM_FAULT_NOPAGE;
goto out;
}
-   ret = ocfs2_write_end_nolock(mapping, pos, len, len, fsdata);
-   BUG_ON(ret != len);
+   err = ocfs2_write_end_nolock(mapping, pos, len, len, fsdata);
+   BUG_ON(err != len);
ret = VM_FAULT_LOCKED;
 out:
return ret;
 }
 
-static int ocfs2_page_mkwrite(struct vm_fault *vmf)
+static vm_fault_t ocfs2_page_mkwrite(struct vm_fault *vmf)
 {
struct page *page = vmf->page;
struct inode *inode = file_inode(vmf->vma->vm_file);
struct buffer_head *di_bh = NULL;
sigset_t oldset;
-   int ret;
+   vm_fault_t ret = 0;
+   int err;
 
sb_start_pagefault(inode->i_sb);
ocfs2_block_signals();
@@ -144,10 +146,10 @@ static int ocfs2_page_mkwrite(struct vm_fault *vmf)
 * node. Taking the data lock will also ensure that we don't
 * attempt page truncation as part of a downconvert.
 */
-   ret = ocfs2_inode_lock(inode, _bh, 1);
-   if (ret < 0) {
+   err = ocfs2_inode_lock(inode, _bh, 1);
+   if (err < 0) {
mlog_errno(ret);
-   if (ret == -ENOMEM)
+   if (err == -ENOMEM)
ret = VM_FAULT_OOM;
else
ret = VM_FAULT_SIGBUS;
-- 
2.17.0

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 14/14] mm: turn on vm_fault_t type checking

2018-05-15 Thread Christoph Hellwig
Switch vm_fault_t to point to an unsigned int with __bіtwise annotations.
This both catches any old ->fault or ->page_mkwrite instance with plain
compiler type checking, as well as finding more intricate problems with
sparse.

Signed-off-by: Christoph Hellwig <h...@lst.de>
---
 arch/alpha/mm/fault.c |  2 +-
 arch/arc/mm/fault.c   |  3 +-
 arch/arm/mm/fault.c   |  5 +-
 arch/arm64/mm/fault.c |  7 +-
 arch/hexagon/mm/vm_fault.c|  2 +-
 arch/ia64/mm/fault.c  |  2 +-
 arch/m68k/mm/fault.c  |  2 +-
 arch/microblaze/mm/fault.c|  2 +-
 arch/mips/mm/fault.c  |  2 +-
 arch/nds32/mm/fault.c |  2 +-
 arch/nios2/mm/fault.c |  2 +-
 arch/openrisc/mm/fault.c  |  2 +-
 arch/parisc/mm/fault.c|  2 +-
 arch/powerpc/include/asm/copro.h  |  2 +-
 arch/powerpc/mm/copro_fault.c |  2 +-
 arch/powerpc/mm/fault.c   | 10 +--
 arch/powerpc/platforms/cell/spufs/fault.c |  2 +-
 arch/riscv/mm/fault.c |  3 +-
 arch/s390/kernel/vdso.c   |  2 +-
 arch/s390/mm/fault.c  |  2 +-
 arch/sh/mm/fault.c|  2 +-
 arch/sparc/mm/fault_32.c  |  4 +-
 arch/sparc/mm/fault_64.c  |  3 +-
 arch/um/kernel/trap.c |  2 +-
 arch/unicore32/mm/fault.c | 10 +--
 arch/x86/entry/vdso/vma.c |  4 +-
 arch/x86/mm/fault.c   | 11 +--
 arch/xtensa/mm/fault.c|  2 +-
 drivers/dax/device.c  | 21 +++---
 drivers/gpu/drm/drm_vm.c  | 10 +--
 drivers/gpu/drm/etnaviv/etnaviv_drv.h |  2 +-
 drivers/gpu/drm/etnaviv/etnaviv_gem.c |  2 +-
 drivers/gpu/drm/exynos/exynos_drm_gem.c   |  2 +-
 drivers/gpu/drm/exynos/exynos_drm_gem.h   |  2 +-
 drivers/gpu/drm/gma500/framebuffer.c  |  6 +-
 drivers/gpu/drm/gma500/gem.c  |  2 +-
 drivers/gpu/drm/gma500/psb_drv.h  |  2 +-
 drivers/gpu/drm/i915/i915_drv.h   |  2 +-
 drivers/gpu/drm/i915/i915_gem.c   | 21 ++
 drivers/gpu/drm/msm/msm_drv.h |  2 +-
 drivers/gpu/drm/msm/msm_gem.c |  2 +-
 drivers/gpu/drm/qxl/qxl_ttm.c |  4 +-
 drivers/gpu/drm/radeon/radeon_ttm.c   |  2 +-
 drivers/gpu/drm/udl/udl_drv.h |  2 +-
 drivers/gpu/drm/udl/udl_gem.c |  2 +-
 drivers/gpu/drm/vc4/vc4_bo.c  |  2 +-
 drivers/gpu/drm/vc4/vc4_drv.h |  2 +-
 drivers/hwtracing/intel_th/msu.c  |  2 +-
 drivers/iommu/amd_iommu_v2.c  |  2 +-
 drivers/iommu/intel-svm.c |  3 +-
 drivers/misc/cxl/fault.c  |  2 +-
 drivers/misc/ocxl/context.c   |  6 +-
 drivers/misc/ocxl/link.c  |  2 +-
 drivers/misc/ocxl/sysfs.c |  2 +-
 drivers/scsi/cxlflash/superpipe.c |  4 +-
 drivers/staging/ncpfs/mmap.c  |  2 +-
 drivers/xen/privcmd.c |  2 +-
 fs/9p/vfs_file.c  |  2 +-
 fs/afs/internal.h |  2 +-
 fs/afs/write.c|  2 +-
 fs/f2fs/file.c| 10 +--
 fs/fuse/file.c|  2 +-
 fs/gfs2/file.c|  2 +-
 fs/iomap.c|  2 +-
 fs/nfs/file.c |  4 +-
 fs/nilfs2/file.c  |  2 +-
 fs/proc/vmcore.c  |  2 +-
 fs/userfaultfd.c  |  4 +-
 fs/xfs/xfs_file.c | 12 ++--
 include/linux/huge_mm.h   | 13 ++--
 include/linux/hugetlb.h   |  2 +-
 include/linux/iomap.h |  4 +-
 include/linux/mm.h| 67 +
 include/linux/mm_types.h  |  5 +-
 include/linux/oom.h   |  2 +-
 include/linux/swapops.h   |  4 +-
 include/linux/userfaultfd_k.h |  5 +-
 ipc/shm.c |  2 +-
 kernel/events/core.c  |  4 +-
 mm/gup.c  |  7 +-
 mm/hmm.c  |  2 +-
 mm/huge_memory.c  | 29 
 mm/hugetlb.c  | 25 +++
 mm/internal.h |  2 +-
 mm/khugepaged.c   |  3 +-
 mm/ksm.c  |  2 +-
 mm/memory.c   | 88 ---
 mm/mmap.c |  4 +-
 mm/shmem.c|  9 +--
 samples/vfio-mdev/mbochs.c|  4 +-
 virt/kvm/kvm_main.c   |  2 

Re: btrfs space used issue

2018-03-05 Thread Christoph Hellwig
On Sat, Mar 03, 2018 at 06:59:26AM +, Duncan wrote:
> Indeed.  Preallocation with COW doesn't make the sense it does on an 
> overwrite-in-place filesystem.

It makes a whole lot of sense, it just is a little harder to implement.

There is no reason not to preallocate specific space, or if you aren't
forced to be fully log structured by the medium, specific blocks to
COW into.  It just isn't quite as trivial as for a rewrite in place
file system to implement.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 4/7] blk-mq: make blk_abort_request() trigger timeout path

2017-12-29 Thread Christoph Hellwig
On Sat, Dec 16, 2017 at 04:07:23AM -0800, Tejun Heo wrote:
> Note that this makes blk_abort_request() asynchronous - it initiates
> abortion but the actual termination will happen after a short while,
> even when the caller owns the request.  AFAICS, SCSI and ATA should be
> fine with that and I think mtip32xx and dasd should be safe but not
> completely sure.  It'd be great if people who know the drivers take a
> look.

For that you'll need to CC linux-ide and linux-scsi, and for the
SAS drivers some of the usual suspects that touch the SAS code.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/7] blk-mq: protect completion path with RCU

2017-12-29 Thread Christoph Hellwig
Why do you need the srcu protection?  The completion path can never
sleep.

If there is a good reason to keep it please add commment, and
make the srcu variant a separate function only used by drivers that
need it instead of adding the conditional.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCHSET v3] blk-mq: reimplement timeout handling

2017-12-29 Thread Christoph Hellwig
This seems to miss the linux-block list once again.  Please include
it in the next resend.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: qemu-kvm VM died during partial raid1 problems of btrfs

2017-09-12 Thread Christoph Hellwig
On Tue, Sep 12, 2017 at 08:43:59PM +0200, Adam Borowski wrote:
> For now, though, I wonder -- should we send fine folks at util-linux a patch
> to make fallocate -d restore mtime, either always or on an option?

Don't do that.  Please just add a new ioctl or fallocate command
that punches a hole if the range is zeroed, similar to what dedup
does.  It can probably even reuse a few helpers.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Btrfs: do not reset bio->bi_ops while writing bio

2017-08-25 Thread Christoph Hellwig
On Thu, Aug 24, 2017 at 11:04:40AM -0700, Liu Bo wrote:
> On Thu, Aug 24, 2017 at 07:10:41AM -0700, Christoph Hellwig wrote:
> > On Wed, Aug 23, 2017 at 12:15:09PM -0600, Liu Bo wrote:
> > > flush_epd_write_bio() sets bio->bi_ops by itself to honor REQ_SYNC,
> > > but it's not needed at all since bio->bi_ops has set up properly in
> > > both __extent_writepage() and write_one_eb(), and in the case of
> > > write_one_eb(), it also sets REQ_META, which we will lose in
> > > flush_epd_write_bio().
> > > 
> > > This remove this unnecessary bio->bi_ops setting.
> > 
> > Nitpick: it's bi_op, not bi_ops.
> 
> Ah thanks, it's bio->bi_opf ;)

Heh.  Of course..
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Btrfs: do not reset bio->bi_ops while writing bio

2017-08-24 Thread Christoph Hellwig
On Wed, Aug 23, 2017 at 12:15:09PM -0600, Liu Bo wrote:
> flush_epd_write_bio() sets bio->bi_ops by itself to honor REQ_SYNC,
> but it's not needed at all since bio->bi_ops has set up properly in
> both __extent_writepage() and write_one_eb(), and in the case of
> write_one_eb(), it also sets REQ_META, which we will lose in
> flush_epd_write_bio().
> 
> This remove this unnecessary bio->bi_ops setting.

Nitpick: it's bi_op, not bi_ops.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/6] raid5: remove a call to get_start_sect

2017-08-24 Thread Christoph Hellwig
On Wed, Aug 23, 2017 at 11:23:38AM -0700, Shaohua Li wrote:
> On Wed, Aug 23, 2017 at 07:10:28PM +0200, Christoph Hellwig wrote:
> > The block layer always remaps partitions before calling into the
> > ->make_request methods of drivers.  Thus the call to get_start_sect in
> > in_chunk_boundary will always return 0 and can be removed.
> > 
> > Signed-off-by: Christoph Hellwig <h...@lst.de>
> > ---
> >  drivers/md/raid5.c | 4 +++-
> >  1 file changed, 3 insertions(+), 1 deletion(-)
> > 
> > diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
> > index 0fc2748aaf95..d687aeb1b538 100644
> > --- a/drivers/md/raid5.c
> > +++ b/drivers/md/raid5.c
> > @@ -5092,10 +5092,12 @@ static int raid5_congested(struct mddev *mddev, int 
> > bits)
> >  static int in_chunk_boundary(struct mddev *mddev, struct bio *bio)
> >  {
> > struct r5conf *conf = mddev->private;
> > -   sector_t sector = bio->bi_iter.bi_sector + get_start_sect(bio->bi_bdev);
> > +   sector_t sector = bio->bi_iter.bi_sector;
> > unsigned int chunk_sectors;
> > unsigned int bio_sectors = bio_sectors(bio);
> >  
> > +   WARN_ON_ONCE(bio->bi_partno);
> > +

Meh, of course bi_partno is only added a few patches later, so this
breaks bisectability.  But given that the patch is already in there's
probably nothing I can do here.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: don't require a struct block_device to submit a bio

2017-08-23 Thread Christoph Hellwig
Patch 6 didn't seem to make it through to the list, so here is a
link to the git tree:

http://git.infradead.org/users/hch/block.git/shortlog/refs/heads/bi_bdev
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 4/6] block: add a __disk_get_part helper

2017-08-23 Thread Christoph Hellwig
This helper allows looking up a partion under RCU protection without
grabbing a reference to it.

Signed-off-by: Christoph Hellwig <h...@lst.de>
---
 block/blk.h   |  2 ++
 block/genhd.c | 26 +-
 2 files changed, 15 insertions(+), 13 deletions(-)

diff --git a/block/blk.h b/block/blk.h
index 3a3d715bd725..fde8b351c166 100644
--- a/block/blk.h
+++ b/block/blk.h
@@ -204,6 +204,8 @@ static inline void elv_deactivate_rq(struct request_queue 
*q, struct request *rq
e->type->ops.sq.elevator_deactivate_req_fn(q, rq);
 }
 
+struct hd_struct *__disk_get_part(struct gendisk *disk, int partno);
+
 #ifdef CONFIG_FAIL_IO_TIMEOUT
 int blk_should_fake_timeout(struct request_queue *);
 ssize_t part_timeout_show(struct device *, struct device_attribute *, char *);
diff --git a/block/genhd.c b/block/genhd.c
index 3380a1e73ea0..713b7d4fe7a1 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -82,6 +82,15 @@ void part_in_flight(struct request_queue *q, struct 
hd_struct *part,
}
 }
 
+struct hd_struct *__disk_get_part(struct gendisk *disk, int partno)
+{
+   struct disk_part_tbl *ptbl = rcu_dereference(disk->part_tbl);
+
+   if (unlikely(partno < 0 || partno >= ptbl->len))
+   return NULL;
+   return rcu_dereference(ptbl->part[partno]);
+}
+
 /**
  * disk_get_part - get partition
  * @disk: disk to look partition from
@@ -98,21 +107,12 @@ void part_in_flight(struct request_queue *q, struct 
hd_struct *part,
  */
 struct hd_struct *disk_get_part(struct gendisk *disk, int partno)
 {
-   struct hd_struct *part = NULL;
-   struct disk_part_tbl *ptbl;
-
-   if (unlikely(partno < 0))
-   return NULL;
+   struct hd_struct *part;
 
rcu_read_lock();
-
-   ptbl = rcu_dereference(disk->part_tbl);
-   if (likely(partno < ptbl->len)) {
-   part = rcu_dereference(ptbl->part[partno]);
-   if (part)
-   get_device(part_to_dev(part));
-   }
-
+   part = __disk_get_part(disk, partno);
+   if (part)
+   get_device(part_to_dev(part));
rcu_read_unlock();
 
return part;
-- 
2.11.0

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 2/6] raid5: remove a call to get_start_sect

2017-08-23 Thread Christoph Hellwig
The block layer always remaps partitions before calling into the
->make_request methods of drivers.  Thus the call to get_start_sect in
in_chunk_boundary will always return 0 and can be removed.

Signed-off-by: Christoph Hellwig <h...@lst.de>
---
 drivers/md/raid5.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 0fc2748aaf95..d687aeb1b538 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -5092,10 +5092,12 @@ static int raid5_congested(struct mddev *mddev, int 
bits)
 static int in_chunk_boundary(struct mddev *mddev, struct bio *bio)
 {
struct r5conf *conf = mddev->private;
-   sector_t sector = bio->bi_iter.bi_sector + get_start_sect(bio->bi_bdev);
+   sector_t sector = bio->bi_iter.bi_sector;
unsigned int chunk_sectors;
unsigned int bio_sectors = bio_sectors(bio);
 
+   WARN_ON_ONCE(bio->bi_partno);
+
chunk_sectors = min(conf->chunk_sectors, conf->prev_chunk_sectors);
return  chunk_sectors >=
((sector & (chunk_sectors - 1)) + bio_sectors);
-- 
2.11.0

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 5/6] block: cache the partition index in struct block_device

2017-08-23 Thread Christoph Hellwig
Signed-off-by: Christoph Hellwig <h...@lst.de>
---
 fs/block_dev.c | 1 +
 include/linux/fs.h | 1 +
 2 files changed, 2 insertions(+)

diff --git a/fs/block_dev.c b/fs/block_dev.c
index 9941dc8342df..d29d1c70f833 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -1451,6 +1451,7 @@ static int __blkdev_get(struct block_device *bdev, 
fmode_t mode, int for_part)
bdev->bd_disk = disk;
bdev->bd_queue = disk->queue;
bdev->bd_contains = bdev;
+   bdev->bd_partno = partno;
 
if (!partno) {
ret = -ENXIO;
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 6e1fd5d21248..706dd3a972d2 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -427,6 +427,7 @@ struct block_device {
 #endif
struct block_device *   bd_contains;
unsignedbd_block_size;
+   u8  bd_partno;
struct hd_struct *  bd_part;
/* number of times partitions within this device have been opened. */
unsignedbd_part_count;
-- 
2.11.0

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


don't require a struct block_device to submit a bio

2017-08-23 Thread Christoph Hellwig
Hi Jens,

this series removes the need to have a struct block_device ready to
submit a bio.   We basically don't use it anywhere in the block stack
anyway - we always go for the gendisk or request_queue.  The only
exception is partition remapping for which we'll now need an additional
partition index.  This helps with cases where we submit I/O from a
character device (nvme or lightnvm passthrough) or a different block
device (upcoming nvme multipath support).
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 1/6] btrfs: index check-integrity state hash by a dev_t

2017-08-23 Thread Christoph Hellwig
We won't have the struct block_device available in the bio soon, so switch
to the numerical dev_t instead of the block_device pointer for looking up
the check-integrity state.

Signed-off-by: Christoph Hellwig <h...@lst.de>
---
 fs/btrfs/check-integrity.c | 31 +--
 1 file changed, 13 insertions(+), 18 deletions(-)

diff --git a/fs/btrfs/check-integrity.c b/fs/btrfs/check-integrity.c
index 11d37c94ce05..9d3854839038 100644
--- a/fs/btrfs/check-integrity.c
+++ b/fs/btrfs/check-integrity.c
@@ -296,8 +296,7 @@ static void btrfsic_dev_state_hashtable_add(
struct btrfsic_dev_state *ds,
struct btrfsic_dev_state_hashtable *h);
 static void btrfsic_dev_state_hashtable_remove(struct btrfsic_dev_state *ds);
-static struct btrfsic_dev_state *btrfsic_dev_state_hashtable_lookup(
-   struct block_device *bdev,
+static struct btrfsic_dev_state *btrfsic_dev_state_hashtable_lookup(dev_t dev,
struct btrfsic_dev_state_hashtable *h);
 static struct btrfsic_stack_frame *btrfsic_stack_frame_alloc(void);
 static void btrfsic_stack_frame_free(struct btrfsic_stack_frame *sf);
@@ -385,8 +384,7 @@ static int btrfsic_process_superblock_dev_mirror(
int superblock_mirror_num,
struct btrfsic_dev_state **selected_dev_state,
struct btrfs_super_block *selected_super);
-static struct btrfsic_dev_state *btrfsic_dev_state_lookup(
-   struct block_device *bdev);
+static struct btrfsic_dev_state *btrfsic_dev_state_lookup(dev_t dev);
 static void btrfsic_cmp_log_and_dev_bytenr(struct btrfsic_state *state,
   u64 bytenr,
   struct btrfsic_dev_state *dev_state,
@@ -626,17 +624,15 @@ static void btrfsic_dev_state_hashtable_remove(struct 
btrfsic_dev_state *ds)
list_del(>collision_resolving_node);
 }
 
-static struct btrfsic_dev_state *btrfsic_dev_state_hashtable_lookup(
-   struct block_device *bdev,
+static struct btrfsic_dev_state *btrfsic_dev_state_hashtable_lookup(dev_t dev,
struct btrfsic_dev_state_hashtable *h)
 {
const unsigned int hashval =
-   (((unsigned int)((uintptr_t)bdev)) &
-(BTRFSIC_DEV2STATE_HASHTABLE_SIZE - 1));
+   dev & (BTRFSIC_DEV2STATE_HASHTABLE_SIZE - 1);
struct btrfsic_dev_state *ds;
 
list_for_each_entry(ds, h->table + hashval, collision_resolving_node) {
-   if (ds->bdev == bdev)
+   if (ds->bdev->bd_dev == dev)
return ds;
}
 
@@ -668,7 +664,7 @@ static int btrfsic_process_superblock(struct btrfsic_state 
*state,
if (!device->bdev || !device->name)
continue;
 
-   dev_state = btrfsic_dev_state_lookup(device->bdev);
+   dev_state = btrfsic_dev_state_lookup(device->bdev->bd_dev);
BUG_ON(NULL == dev_state);
for (i = 0; i < BTRFS_SUPER_MIRROR_MAX; i++) {
ret = btrfsic_process_superblock_dev_mirror(
@@ -1556,7 +1552,7 @@ static int btrfsic_map_block(struct btrfsic_state *state, 
u64 bytenr, u32 len,
}
 
device = multi->stripes[0].dev;
-   block_ctx_out->dev = btrfsic_dev_state_lookup(device->bdev);
+   block_ctx_out->dev = btrfsic_dev_state_lookup(device->bdev->bd_dev);
block_ctx_out->dev_bytenr = multi->stripes[0].physical;
block_ctx_out->start = bytenr;
block_ctx_out->len = len;
@@ -2654,7 +2650,7 @@ static struct btrfsic_block *btrfsic_block_lookup_or_add(
pr_info("btrfsic: error, kmalloc failed!\n");
return NULL;
}
-   dev_state = btrfsic_dev_state_lookup(block_ctx->dev->bdev);
+   dev_state = 
btrfsic_dev_state_lookup(block_ctx->dev->bdev->bd_dev);
if (NULL == dev_state) {
pr_info("btrfsic: error, lookup dev_state failed!\n");
btrfsic_block_free(block);
@@ -2734,10 +2730,9 @@ static void btrfsic_cmp_log_and_dev_bytenr(struct 
btrfsic_state *state,
}
 }
 
-static struct btrfsic_dev_state *btrfsic_dev_state_lookup(
-   struct block_device *bdev)
+static struct btrfsic_dev_state *btrfsic_dev_state_lookup(dev_t dev)
 {
-   return btrfsic_dev_state_hashtable_lookup(bdev,
+   return btrfsic_dev_state_hashtable_lookup(dev,
  _dev_state_hashtable);
 }
 
@@ -2751,7 +2746,7 @@ int btrfsic_submit_bh(int op, int op_flags, struct 
buffer_head *bh)
mutex_lock(_mutex);
/* since btrfsic_submit_bh() might also be called before
 * btrfsic_mount(), this might return NULL */
-   dev_state = btrfsic_dev_state_lookup(bh->b_bdev);
+   dev_stat

[PATCH 3/6] block: reject attempts to allocate more than DISK_MAX_PARTS partitions

2017-08-23 Thread Christoph Hellwig
Signed-off-by: Christoph Hellwig <h...@lst.de>
---
 block/genhd.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/block/genhd.c b/block/genhd.c
index 2367087cdb7c..3380a1e73ea0 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -1357,6 +1357,13 @@ struct gendisk *alloc_disk_node(int minors, int node_id)
struct gendisk *disk;
struct disk_part_tbl *ptbl;
 
+   if (minors > DISK_MAX_PARTS) {
+   printk(KERN_ERR
+   "block: can't allocated more than %d partitions\n",
+   DISK_MAX_PARTS);
+   minors = DISK_MAX_PARTS;
+   }
+
disk = kzalloc_node(sizeof(struct gendisk), GFP_KERNEL, node_id);
if (disk) {
if (!init_part_stats(>part0)) {
-- 
2.11.0

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: RedHat 7.4 Release Notes: "Btrfs has been deprecated" - wut?

2017-08-12 Thread Christoph Hellwig
On Sat, Aug 12, 2017 at 02:10:18AM +0200, Christoph Anton Mitterer wrote:
> Qu Wenruo wrote:
> >Although Btrfs can disable data CoW, nodatacow also disables data 
> >checksum, which is another main feature for btrfs.
> 
> Then decoupling of the two should probably decoupled and support for
> notdatacow+checksumming be implemented?!

And how are you going to write your data and checksum atomically when
doing in-place updates?
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3 11/49] btrfs: avoid access to .bi_vcnt directly

2017-08-10 Thread Christoph Hellwig
> +static unsigned int get_bio_pages(struct bio *bio)
> +{
> + unsigned i;
> + struct bio_vec *bv;
> +
> + bio_for_each_segment_all(bv, bio, i)
> + ;
> +
> + return i;
> +}

s/get_bio_pages/bio_nr_pages/ ?

Also this seems like a useful helper for bio.h
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] btrfs: Remove extra parentheses from condition in copy_items()

2017-07-27 Thread Christoph Hellwig
> - if ((i == (nr - 1)))
> + if (i == (nr - 1))

Please remove the other set of redundant parentheses as well if you
change that line.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [GIT PULL] Btrfs for 4.13

2017-07-05 Thread Christoph Hellwig
On Wed, Jul 05, 2017 at 04:45:49PM -0700, Linus Torvalds wrote:
> Ouch, yeah, that was annoying. And yes, I ended up with several
> whitespace differences but other than that it looks the same. Oh well.
> 
> Jens, Christoph, let's not do that stupid thing with status-vs-error
> ever again, ok?

It should be done now :)

Back to serious - this work had to be done as we could not communicate
errors properly in the block layer with people inventing their own
codes all the time.  That being said, in retrospective I should have
tried a gradual approach that first defines the new code to the old
errors and left the fields in place.  I opted for the hard break to
make sure we can catch matching conversions by the compiler as very
few people run sparse for the __bitwise annotations, and some of
the drivers create so many sparse warnings that they would be lost
anyway.

But I've learned and will try to make something like this gradual
if it comes up, and hope we won't have to do something like it again
at all.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [proposal] making filesystem tools more machine friendly

2017-07-05 Thread Christoph Hellwig
Hi Jan,

I really like the idea out json output for fs/block tools.

nvme-cli has been doing it for a while, and that has been very
useful.  It comes with it's own little handrolled json output
helpers:

https://github.com/linux-nvme/nvme-cli/blob/master/json.c

so it doesn't introduce any new dependencies.  If you want to parse
input as well a library might be a better approach, though.

I really don't like the shared object evms model mention in the thread,
as that creates hard ELF dependencies.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v8 17/18] xfs: minimal conversion to errseq_t writeback error reporting

2017-06-30 Thread Christoph Hellwig
On Fri, Jun 30, 2017 at 12:45:54PM -0400, Jeff Layton wrote:
> Should I aim to do that with an individual patch for each fs, or is it
> better to do a swath of them all at once in a single patch here?

I'd be perfectly happy with one big patch for all the trivial
conversions.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v8 03/18] fs: check for writeback errors after syncing out buffers in generic_file_fsync

2017-06-29 Thread Christoph Hellwig
On Thu, Jun 29, 2017 at 09:19:39AM -0400, jlay...@kernel.org wrote:
> From: Jeff Layton 
> 
> ext2 currently does a test+clear of the AS_EIO flag, which is
> is problematic for some coming changes.
> 
> What we really need to do instead is call filemap_check_errors
> in __generic_file_fsync after syncing out the buffers. That
> will be sufficient for this case, and help other callers detect
> these errors properly as well.
> 
> With that, we don't need to twiddle it in ext2.

Seems like much of this code is getting replaced later in the
series..
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v8 14/18] block: convert to errseq_t based writeback error tracking

2017-06-29 Thread Christoph Hellwig
Looks good,

Reviewed-by: Christoph Hellwig <h...@lst.de>
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v8 18/18] btrfs: minimal conversion to errseq_t writeback error reporting on fsync

2017-06-29 Thread Christoph Hellwig
On Thu, Jun 29, 2017 at 09:19:54AM -0400, jlay...@kernel.org wrote:
> From: Jeff Layton 
> 
> Just check and advance the errseq_t in the file before returning.
> Internal callers of filemap_* functions are left as-is.
> 
> Signed-off-by: Jeff Layton 
> ---
>  fs/btrfs/file.c | 7 +--
>  1 file changed, 5 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
> index da1096eb1a40..1f57e1a523d9 100644
> --- a/fs/btrfs/file.c
> +++ b/fs/btrfs/file.c
> @@ -2011,7 +2011,7 @@ int btrfs_sync_file(struct file *file, loff_t start, 
> loff_t end, int datasync)
>   struct btrfs_root *root = BTRFS_I(inode)->root;
>   struct btrfs_trans_handle *trans;
>   struct btrfs_log_ctx ctx;
> - int ret = 0;
> + int ret = 0, err;
>   bool full_sync = 0;
>   u64 len;
>  
> @@ -2030,7 +2030,7 @@ int btrfs_sync_file(struct file *file, loff_t start, 
> loff_t end, int datasync)
>*/
>   ret = start_ordered_ops(inode, start, end);
>   if (ret)
> - return ret;
> + goto out;
>  
>   inode_lock(inode);
>   atomic_inc(>log_batch);
> @@ -2227,6 +2227,9 @@ int btrfs_sync_file(struct file *file, loff_t start, 
> loff_t end, int datasync)
>   ret = btrfs_end_transaction(trans);
>   }
>  out:
> + err = file_check_and_advance_wb_err(file);
> + if (!ret)
> + ret = err;
>   return ret > 0 ? -EIO : ret;

This means that we'll lose the exact error returned from
start_ordered_ops.  Beyond that I can't really provide good feedback
as the btrfs fsync code looks so much different from all the other
fs fsync code..
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v8 17/18] xfs: minimal conversion to errseq_t writeback error reporting

2017-06-29 Thread Christoph Hellwig
Nice and simple, this looks great!

Reviewed-by: Christoph Hellwig <h...@lst.de>
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v8 16/18] ext4: use errseq_t based error handling for reporting data writeback errors

2017-06-29 Thread Christoph Hellwig
> - if (unlikely(ext4_forced_shutdown(EXT4_SB(inode->i_sb
> - return -EIO;
> + if (unlikely(ext4_forced_shutdown(EXT4_SB(inode->i_sb {
> + ret = -EIO;
> + goto out;
> + }

This just seems to add a call to trace_ext4_sync_file_exit for this
case, which seems unrelated to the patch.

>   if (ret)
> - return ret;
> + goto out;
> +

Same here.

>   /*
>* data=writeback,ordered:
>*  The caller's filemap_fdatawrite()/wait will sync the data.
> @@ -152,7 +155,7 @@ int ext4_sync_file(struct file *file, loff_t start, 
> loff_t end, int datasync)
>   needs_barrier = true;
>   ret = jbd2_complete_transaction(journal, commit_tid);
>   if (needs_barrier) {
> - issue_flush:
> +issue_flush:
>   err = blkdev_issue_flush(inode->i_sb->s_bdev, GFP_KERNEL, NULL);

And while I much prefer your new label placement it also doesn't
seem to belong into this patch.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v7 16/22] block: convert to errseq_t based writeback error tracking

2017-06-27 Thread Christoph Hellwig
On Mon, Jun 26, 2017 at 10:34:18AM -0400, Jeff Layton wrote:
> The bigger question is -- what about more complex filesystems like
> ext4?  There are a couple of cases where we can return -EIO or -EROFS on
> fsync before filemap_write_and_wait_range is ever called. Like this one
> for instance:
> 
> if (unlikely(ext4_forced_shutdown(EXT4_SB(inode->i_sb
> return -EIO;
> 
> ...and the EXT4_MF_FS_ABORTED case.
> 
> Are those conditions ever recoverable, such that a later fsync could
> succeed? IOW, could I do a remount or something such that the existing
> fds are left open and become usable again? 

This looks copied from the xfs forced shutdown code, and in that
case it's final and permanent - you'll need an unmount to
clear it.

> If so, then we really ought to advance the errseq_t in the file when we
> catch those cases as well. If we have to do that, then it probably makes
> sense to leave the ext4 patch as-is.

I think it can switch to the new file helper.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v7 16/22] block: convert to errseq_t based writeback error tracking

2017-06-24 Thread Christoph Hellwig
On Tue, Jun 20, 2017 at 01:44:44PM -0400, Jeff Layton wrote:
> In order to query for errors with errseq_t, you need a previously-
> sampled point from which to check. When you call
> filemap_write_and_wait_range though you don't have a struct file and so
> no previously-sampled value.

So can we simply introduce variants of them that take a struct file?
That would be:

 a) less churn
 b) less code
 c) less chance to get data integrity wrong
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v7 16/22] block: convert to errseq_t based writeback error tracking

2017-06-20 Thread Christoph Hellwig
>   error = filemap_write_and_wait_range(filp->f_mapping, start, end);
>   if (error)
> - return error;
> + goto out;
>  
>   /*
>* There is no need to serialise calls to blkdev_issue_flush with
> @@ -640,6 +640,10 @@ int blkdev_fsync(struct file *filp, loff_t start, loff_t 
> end, int datasync)
>   if (error == -EOPNOTSUPP)
>   error = 0;
>  
> +out:
> + wberr = filemap_report_wb_err(filp);
> + if (!error)
> + error = wberr;

Just curious: what's the reason filemap_write_and_wait_range couldn't
query for the error using filemap_report_wb_err internally?
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v7 11/22] fs: new infrastructure for writeback error handling and reporting

2017-06-20 Thread Christoph Hellwig
> @@ -393,6 +394,7 @@ struct address_space {
>   gfp_t   gfp_mask;   /* implicit gfp mask for 
> allocations */
>   struct list_headprivate_list;   /* ditto */
>   void*private_data;  /* ditto */
> + errseq_twb_err;
>  } __attribute__((aligned(sizeof(long;
>   /*
>* On most architectures that alignment is already the case; but
> @@ -847,6 +849,7 @@ struct file {
>* Must not be taken from IRQ context.
>*/
>   spinlock_t  f_lock;
> + errseq_tf_wb_err;
>   atomic_long_t   f_count;
>   unsigned intf_flags;
>   fmode_t f_mode;

Did you check the sizes of the structure before and after?
These places don't look like holes in the packing, but there probably
are some available.

> +static inline int filemap_check_wb_err(struct address_space *mapping, 
> errseq_t since)

Overly long line here (the patch has a few more)

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v7 01/22] fs: remove call_fsync helper function

2017-06-20 Thread Christoph Hellwig
On Fri, Jun 16, 2017 at 03:34:06PM -0400, Jeff Layton wrote:
> Requested-by: Christoph Hellwig <h...@infradead.org>
> Signed-off-by: Jeff Layton <jlay...@redhat.com>

Looks good,

Reviewed-by: Christoph Hellwig <h...@lst.de>
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] btrfs: use new block error code

2017-06-20 Thread Christoph Hellwig
On Mon, Jun 19, 2017 at 01:55:37PM +0300, Dan Carpenter wrote:
> This function is supposed to return blk_status_t error codes now but
> there was a stray -ENOMEM left behind.
> 
> Fixes: 4e4cbee93d56 ("block: switch bios to blk_status_t")
> Signed-off-by: Dan Carpenter <dan.carpen...@oracle.com>

Looks fine,

Acked-by: Christoph Hellwig <h...@lst.de>
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v6 12/20] fs: add a new fstype flag to indicate how writeback errors are tracked

2017-06-15 Thread Christoph Hellwig
On Thu, Jun 15, 2017 at 06:42:12AM -0400, Jeff Layton wrote:
> Correct.
> 
> But if there is a data writeback error, should we report an error on all
> open fds at that time (like we will for fsync)?

We should in theory, but I don't see how to properly do it.  In addition
sync_file_range just can't be used for data integrity to start with, so
I don't think it's worth it.  At some point we should add a proper
fsync_range syscall, though.

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/10 v11] No wait AIO

2017-06-15 Thread Christoph Hellwig
On Thu, Jun 15, 2017 at 12:11:58PM +0100, Al Viro wrote:
> Which flags are you talking about?  aio ones?  AFAICS, it's the same
> kind of thing as "can we lseek?" or "can we read/pread?", etc.
> What would that field look like?  Note that some of those might depend
> upon the flags passed to open(), so shoving them into file_operations
> might mean splitting file_operations instances...  Details, please.

Yeah, I'll take my suggestion back.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v6 12/20] fs: add a new fstype flag to indicate how writeback errors are tracked

2017-06-15 Thread Christoph Hellwig
On Wed, Jun 14, 2017 at 01:24:43PM -0400, Jeff Layton wrote:
> In this smaller set, it's only really used for DAX.

DAX only is implemented by three filesystems, please just fix them
up in one go.

> sync_file_range: ->fsync isn't called directly there, and I think we
> probably want similar semantics to fsync() for it

sync_file_range is only supposed to sync data, so it should not call
->fsync.

> JBD2: will try to re-set the error after clearing it with
> filemap_fdatawait. That's problematic with the new infrastructure so we
> need some way to avoid it.

JBD2 only has two users, please fix them up in one go.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v6 12/20] fs: add a new fstype flag to indicate how writeback errors are tracked

2017-06-14 Thread Christoph Hellwig
On Tue, Jun 13, 2017 at 06:24:32AM -0400, Jeff Layton wrote:
> That's definitely what I want for the endgame here. My plan was to add
> this flag for now, and then eventually reverse it (or drop it) once all
> or most filesystems are converted.
> 
> We can do it that way from the get-go if you like. It'll mean tossing in
>  a patch add this flag to all filesystems that have an fsync operation
> and that use the pagecache, and then gradually remove it from them as we
> convert them.
> 
> Which method do you prefer?

Please do it from the get-go.  Or in fact figure out if we can get
away without it entirely.  Moving the error reporting into ->fsync
should help greatly with that, so what's missing after that?
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] btrfs: use ino_t for inode in tracer

2017-06-13 Thread Christoph Hellwig
I thought btrfs supports 64-bit inodes, in which case this will
truncate the inode number on 32-bit architectures.

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/10 v11] No wait AIO

2017-06-13 Thread Christoph Hellwig
On Mon, Jun 12, 2017 at 05:38:13PM -0500, Goldwyn Rodrigues wrote:
> We had FS_NOWAIT in filesystem type flags (in v3), but retracted it
> later in v4.

A per-fs flag is wrong as file_operation may have different
capabilities.

> I will work on adding FMODE_AIO_NOWAIT in the meantime.

If Al prefers that let's go with it for now.  One other thing I'd
love is a supported flags field in struct file_operations that we
can compare the flags against.  Al, does that sound good as well?

> 
> Thanks,
> 
> -- 
> Goldwyn
---end quoted text---
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v6 14/20] dax: set errors in mapping when writeback fails

2017-06-12 Thread Christoph Hellwig
On Mon, Jun 12, 2017 at 08:23:10AM -0400, Jeff Layton wrote:
> For now, only do this when the FS_WB_ERRSEQ flag is set. The
> AS_EIO/AS_ENOSPC flags are not currently cleared in the older code when
> writeback initiation fails, only when we discover an error after waiting
> on writeback to complete, so we only want to do this with errseq_t based
> error handling to prevent seeing duplicate errors on fsync.

Please make sure this doens't stay conditional by the end of the series.
We only have three file systems using dax, and a series should be able
to make them agree on a single interface.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v6 12/20] fs: add a new fstype flag to indicate how writeback errors are tracked

2017-06-12 Thread Christoph Hellwig
On Mon, Jun 12, 2017 at 08:23:06AM -0400, Jeff Layton wrote:
> Add a new FS_WB_ERRSEQ flag to the fstype. Later patches will set and
> key off of that to decide what behavior should be used.

Please invert this so that only file systems that keep the old semantics
need a flag.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v6 15/20] fs: have call_fsync call filemap_report_wb_err if FS_WB_ERRSEQ is set

2017-06-12 Thread Christoph Hellwig
On Mon, Jun 12, 2017 at 08:23:11AM -0400, Jeff Layton wrote:
> Allow filesystems to opt-in to a final check of wb_err if FS_WB_ERRSEQ
> is set. Technically, we could just plumb these calls into all of the
> fsync operations, but I think this means less code, changes and churn.

Please add it to every fs, that is a consistent with how we handle
everything else related to writeback.

Oh, and please kill this idiotic call_fsync helper while you're at it.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/10 v11] No wait AIO

2017-06-08 Thread Christoph Hellwig
As already indicated this whole series looks fine to me.

Al: are you going to pick this up?  Or Andrew?

On Tue, Jun 06, 2017 at 06:19:29AM -0500, Goldwyn Rodrigues wrote:
> This series adds nonblocking feature to asynchronous I/O writes.
> io_submit() can be delayed because of a number of reason:
>  - Block allocation for files
>  - Data writebacks for direct I/O
>  - Sleeping because of waiting to acquire i_rwsem
>  - Congested block device
> 
> The goal of the patch series is to return -EAGAIN/-EWOULDBLOCK if
> any of these conditions are met. This way userspace can push most
> of the write()s to the kernel to the best of its ability to complete
> and if it returns -EAGAIN, can defer it to another thread.
> 
> In order to enable this, IOCB_RW_FLAG_NOWAIT is introduced in
> uapi/linux/aio_abi.h. If set for aio_rw_flags, it translates to
> IOCB_NOWAIT for struct iocb, REQ_NOWAIT for bio.bi_opf and IOMAP_NOWAIT for
> iomap. aio_rw_flags is a new flag replacing aio_reserved1. We could
> not use aio_flags because it is not currently checked for invalidity
> in the kernel.
> 
> This feature is provided for direct I/O of asynchronous I/O only. I have
> tested it against xfs, ext4, and btrfs while I intend to add more filesystems.
> The nowait feature is for request based devices. In the future, I intend to
> add support to stacked devices such as md.
> 
> Applications will have to check supportability
> by sending a async direct write and any other error besides -EAGAIN
> would mean it is not supported.
> 
> First two patches are prep patches into nowait I/O.
> 
> Changes since v1:
>  + changed name from _NONBLOCKING to *_NOWAIT
>  + filemap_range_has_page call moved to closer to (just before) calling 
> filemap_write_and_wait_range().
>  + BIO_NOWAIT limited to get_request()
>  + XFS fixes 
>   - included reflink 
>   - use of xfs_ilock_nowait() instead of a XFS_IOLOCK_NONBLOCKING flag
>   - Translate the flag through IOMAP_NOWAIT (iomap) to check for
> block allocation for the file.
>  + ext4 coding style
> 
> Changes since v2:
>  + Using aio_reserved1 as aio_rw_flags instead of aio_flags
>  + blk-mq support
>  + xfs uptodate with kernel and reflink changes
> 
>  Changes since v3:
>   + Added FS_NOWAIT, which is set if the filesystem supports NOWAIT feature.
>   + Checks in generic_make_request() to make sure BIO_NOWAIT comes in
> for async direct writes only.
>   + Added QUEUE_FLAG_NOWAIT, which is set if the device supports BIO_NOWAIT.
> This is added (rather not set) to block devices such as dm/md currently.
> 
>  Changes since v4:
>   + Ported AIO code to use RWF_* flags. Check for RWF_* flags in
> generic_file_write_iter().
>   + Changed IOCB_RW_FLAGS_NOWAIT to RWF_NOWAIT.
> 
>  Changes since v5:
>   + BIO_NOWAIT to REQ_NOWAIT
>   + Common helper for RWF flags.
> 
>  Changes since v6:
>   + REQ_NOWAIT will be ignored for request based devices since they
> cannot block. So, removed QUEUE_FLAG_NOWAIT since it is not
> required in the current implementation. It will be resurrected
> when we program for stacked devices.
>   + changed kiocb_rw_flags() to kiocb_set_rw_flags() in order to accomodate
> for errors. Moved checks in the function.
> 
>  Changes since v7:
>   + split patches into prep so the main patches are smaller and easier
> to understand
>   + All patches are reviewed or acked!
>  
>  Changes since v8:
>  + Err out AIO reads with -EINVAL flagged as RWF_NOWAIT
> 
>  Changes since v9:
>  + Retract - Err out AIO reads with -EINVAL flagged as RWF_NOWAIT
>  + XFS returns EAGAIN if extent list is not in memory
>  + Man page updates to io_submit with iocb description and nowait features.
> 
>  Changes since v10:
>  + Corrected comment and subject in "return on congested block device"
> 
> -- 
> Goldwyn
> 
> 
---end quoted text---
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/8] btrfs: bioset allocations will never fail, adapt our helpers

2017-06-06 Thread Christoph Hellwig
On Tue, Jun 06, 2017 at 06:21:17PM +0800, Anand Jain wrote:
>
>
> On 06/03/17 00:58, David Sterba wrote:
>> Christoph pointed out that bio allocations backed by a bioset will never
>> fail.
>
> David,
>
>  Looks like this feature comes when  __GFP_DIRECT_RECLAIM is
>  set and we aren't, such as [1]. Any idea why? Looks like I am
>  missing something ?
>
>
> [1]
> -
> static int submit_extent_page(int op, int op_flags, struct extent_io_tree 
> *tree,
> ::
>
>  bio = btrfs_bio_alloc(bdev, sector, BIO_MAX_PAGES,
> GFP_NOFS | __GFP_HIGH);

#define __GFP_RECLAIM ((__force 
gfp_t)(___GFP_DIRECT_RECLAIM|___GFP_KSWAPD_RECLAIM))


#define GFP_NOFS(__GFP_RECLAIM | __GFP_IO)

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 06/13] fs: simplify dio_bio_complete

2017-06-03 Thread Christoph Hellwig
Only read bio->bi_error once in the common path.

Signed-off-by: Christoph Hellwig <h...@lst.de>
Reviewed-by: Bart Van Assche <bart.vanass...@sandisk.com>
---
 fs/direct-io.c | 6 ++
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/fs/direct-io.c b/fs/direct-io.c
index 04247a6c3f73..bb711e4b86c2 100644
--- a/fs/direct-io.c
+++ b/fs/direct-io.c
@@ -477,13 +477,12 @@ static int dio_bio_complete(struct dio *dio, struct bio 
*bio)
 {
struct bio_vec *bvec;
unsigned i;
-   int err;
+   int err = bio->bi_error;
 
-   if (bio->bi_error)
+   if (err)
dio->io_error = -EIO;
 
if (dio->is_async && dio->op == REQ_OP_READ && dio->should_dirty) {
-   err = bio->bi_error;
bio_check_pages_dirty(bio); /* transfers ownership */
} else {
bio_for_each_segment_all(bvec, bio, i) {
@@ -494,7 +493,6 @@ static int dio_bio_complete(struct dio *dio, struct bio 
*bio)
set_page_dirty_lock(page);
put_page(page);
}
-   err = bio->bi_error;
bio_put(bio);
}
return err;
-- 
2.11.0

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 10/13] dm: change ->end_io calling convention

2017-06-03 Thread Christoph Hellwig
Turn the error paramter into a pointer so that target drivers can change
the value, and make sure only DM_ENDIO_* values are returned from the
methods.

Signed-off-by: Christoph Hellwig <h...@lst.de>
---
 drivers/md/dm-cache-target.c  |  4 ++--
 drivers/md/dm-flakey.c|  8 
 drivers/md/dm-log-writes.c|  4 ++--
 drivers/md/dm-mpath.c | 11 ++-
 drivers/md/dm-raid1.c | 14 +++---
 drivers/md/dm-snap.c  |  4 ++--
 drivers/md/dm-stripe.c| 14 +++---
 drivers/md/dm-thin.c  |  4 ++--
 drivers/md/dm.c   | 36 ++--
 include/linux/device-mapper.h |  2 +-
 10 files changed, 51 insertions(+), 50 deletions(-)

diff --git a/drivers/md/dm-cache-target.c b/drivers/md/dm-cache-target.c
index d682a0511381..c48612e6d525 100644
--- a/drivers/md/dm-cache-target.c
+++ b/drivers/md/dm-cache-target.c
@@ -2820,7 +2820,7 @@ static int cache_map(struct dm_target *ti, struct bio 
*bio)
return r;
 }
 
-static int cache_end_io(struct dm_target *ti, struct bio *bio, int error)
+static int cache_end_io(struct dm_target *ti, struct bio *bio, int *error)
 {
struct cache *cache = ti->private;
unsigned long flags;
@@ -2838,7 +2838,7 @@ static int cache_end_io(struct dm_target *ti, struct bio 
*bio, int error)
bio_drop_shared_lock(cache, bio);
accounted_complete(cache, bio);
 
-   return 0;
+   return DM_ENDIO_DONE;
 }
 
 static int write_dirty_bitset(struct cache *cache)
diff --git a/drivers/md/dm-flakey.c b/drivers/md/dm-flakey.c
index e8f093b323ce..c9539917a59b 100644
--- a/drivers/md/dm-flakey.c
+++ b/drivers/md/dm-flakey.c
@@ -358,12 +358,12 @@ static int flakey_map(struct dm_target *ti, struct bio 
*bio)
return DM_MAPIO_REMAPPED;
 }
 
-static int flakey_end_io(struct dm_target *ti, struct bio *bio, int error)
+static int flakey_end_io(struct dm_target *ti, struct bio *bio, int *error)
 {
struct flakey_c *fc = ti->private;
struct per_bio_data *pb = dm_per_bio_data(bio, sizeof(struct 
per_bio_data));
 
-   if (!error && pb->bio_submitted && (bio_data_dir(bio) == READ)) {
+   if (!*error && pb->bio_submitted && (bio_data_dir(bio) == READ)) {
if (fc->corrupt_bio_byte && (fc->corrupt_bio_rw == READ) &&
all_corrupt_bio_flags_match(bio, fc)) {
/*
@@ -377,11 +377,11 @@ static int flakey_end_io(struct dm_target *ti, struct bio 
*bio, int error)
 * Error read during the down_interval if drop_writes
 * and error_writes were not configured.
 */
-   return -EIO;
+   *error = -EIO;
}
}
 
-   return error;
+   return DM_ENDIO_DONE;
 }
 
 static void flakey_status(struct dm_target *ti, status_type_t type,
diff --git a/drivers/md/dm-log-writes.c b/drivers/md/dm-log-writes.c
index e42264706c59..cc57c7fa1268 100644
--- a/drivers/md/dm-log-writes.c
+++ b/drivers/md/dm-log-writes.c
@@ -664,7 +664,7 @@ static int log_writes_map(struct dm_target *ti, struct bio 
*bio)
return DM_MAPIO_REMAPPED;
 }
 
-static int normal_end_io(struct dm_target *ti, struct bio *bio, int error)
+static int normal_end_io(struct dm_target *ti, struct bio *bio, int *error)
 {
struct log_writes_c *lc = ti->private;
struct per_bio_data *pb = dm_per_bio_data(bio, sizeof(struct 
per_bio_data));
@@ -686,7 +686,7 @@ static int normal_end_io(struct dm_target *ti, struct bio 
*bio, int error)
spin_unlock_irqrestore(>blocks_lock, flags);
}
 
-   return error;
+   return DM_ENDIO_DONE;
 }
 
 /*
diff --git a/drivers/md/dm-mpath.c b/drivers/md/dm-mpath.c
index bf6e49c780d5..ceeeb495d01c 100644
--- a/drivers/md/dm-mpath.c
+++ b/drivers/md/dm-mpath.c
@@ -1517,14 +1517,15 @@ static int multipath_end_io(struct dm_target *ti, 
struct request *clone,
return r;
 }
 
-static int multipath_end_io_bio(struct dm_target *ti, struct bio *clone, int 
error)
+static int multipath_end_io_bio(struct dm_target *ti, struct bio *clone, int 
*error)
 {
struct multipath *m = ti->private;
struct dm_mpath_io *mpio = get_mpio_from_bio(clone);
struct pgpath *pgpath = mpio->pgpath;
unsigned long flags;
+   int r = DM_ENDIO_DONE;
 
-   if (!error || noretry_error(error))
+   if (!*error || noretry_error(*error))
goto done;
 
if (pgpath)
@@ -1533,7 +1534,7 @@ static int multipath_end_io_bio(struct dm_target *ti, 
struct bio *clone, int err
if (atomic_read(>nr_valid_paths) == 0 &&
!test_bit(MPATHF_QUEUE_IF_NO_PATH, >flags)) {
dm_report_EIO(m);
-   error = -EIO;
+   *error = -EIO;
goto done;
}
 
@@ -1

[PATCH 12/13] blk-mq: switch ->queue_rq return value to blk_status_t

2017-06-03 Thread Christoph Hellwig
Use the same values for use for request completion errors as the return
value from ->queue_rq.  BLK_STS_RESOURCE is special cased to cause
a requeue, and all the others are completed as-is.

Signed-off-by: Christoph Hellwig <h...@lst.de>
---
 block/blk-mq.c| 37 --
 drivers/block/loop.c  |  6 +++---
 drivers/block/mtip32xx/mtip32xx.c | 17 
 drivers/block/nbd.c   | 12 ---
 drivers/block/null_blk.c  |  4 ++--
 drivers/block/rbd.c   |  4 ++--
 drivers/block/virtio_blk.c| 10 +-
 drivers/block/xen-blkfront.c  |  8 
 drivers/md/dm-rq.c|  8 
 drivers/mtd/ubi/block.c   |  6 +++---
 drivers/nvme/host/core.c  | 14 ++---
 drivers/nvme/host/fc.c| 23 +++--
 drivers/nvme/host/nvme.h  |  2 +-
 drivers/nvme/host/pci.c   | 42 +++
 drivers/nvme/host/rdma.c  | 26 +---
 drivers/nvme/target/loop.c| 17 
 drivers/scsi/scsi_lib.c   | 30 ++--
 include/linux/blk-mq.h|  7 ++-
 18 files changed, 131 insertions(+), 142 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index adcc1c0dce6e..7af78b1e9db9 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -924,7 +924,7 @@ bool blk_mq_dispatch_rq_list(struct request_queue *q, 
struct list_head *list)
 {
struct blk_mq_hw_ctx *hctx;
struct request *rq;
-   int errors, queued, ret = BLK_MQ_RQ_QUEUE_OK;
+   int errors, queued;
 
if (list_empty(list))
return false;
@@ -935,6 +935,7 @@ bool blk_mq_dispatch_rq_list(struct request_queue *q, 
struct list_head *list)
errors = queued = 0;
do {
struct blk_mq_queue_data bd;
+   blk_status_t ret;
 
rq = list_first_entry(list, struct request, queuelist);
if (!blk_mq_get_driver_tag(rq, , false)) {
@@ -975,25 +976,20 @@ bool blk_mq_dispatch_rq_list(struct request_queue *q, 
struct list_head *list)
}
 
ret = q->mq_ops->queue_rq(hctx, );
-   switch (ret) {
-   case BLK_MQ_RQ_QUEUE_OK:
-   queued++;
-   break;
-   case BLK_MQ_RQ_QUEUE_BUSY:
+   if (ret == BLK_STS_RESOURCE) {
blk_mq_put_driver_tag_hctx(hctx, rq);
list_add(>queuelist, list);
__blk_mq_requeue_request(rq);
break;
-   default:
-   pr_err("blk-mq: bad return on queue: %d\n", ret);
-   case BLK_MQ_RQ_QUEUE_ERROR:
+   }
+
+   if (unlikely(ret != BLK_STS_OK)) {
errors++;
blk_mq_end_request(rq, BLK_STS_IOERR);
-   break;
+   continue;
}
 
-   if (ret == BLK_MQ_RQ_QUEUE_BUSY)
-   break;
+   queued++;
} while (!list_empty(list));
 
hctx->dispatched[queued_to_index(queued)]++;
@@ -1031,7 +1027,7 @@ bool blk_mq_dispatch_rq_list(struct request_queue *q, 
struct list_head *list)
 * - blk_mq_run_hw_queue() checks whether or not a queue has
 *   been stopped before rerunning a queue.
 * - Some but not all block drivers stop a queue before
-*   returning BLK_MQ_RQ_QUEUE_BUSY. Two exceptions are scsi-mq
+*   returning BLK_STS_RESOURCE. Two exceptions are scsi-mq
 *   and dm-rq.
 */
if (!blk_mq_sched_needs_restart(hctx) &&
@@ -1410,7 +1406,7 @@ static void __blk_mq_try_issue_directly(struct request 
*rq, blk_qc_t *cookie,
};
struct blk_mq_hw_ctx *hctx;
blk_qc_t new_cookie;
-   int ret;
+   blk_status_t ret;
 
if (q->elevator)
goto insert;
@@ -1426,18 +1422,19 @@ static void __blk_mq_try_issue_directly(struct request 
*rq, blk_qc_t *cookie,
 * would have done
 */
ret = q->mq_ops->queue_rq(hctx, );
-   if (ret == BLK_MQ_RQ_QUEUE_OK) {
+   switch (ret) {
+   case BLK_STS_OK:
*cookie = new_cookie;
return;
-   }
-
-   if (ret == BLK_MQ_RQ_QUEUE_ERROR) {
+   case BLK_STS_RESOURCE:
+   __blk_mq_requeue_request(rq);
+   goto insert;
+   default:
*cookie = BLK_QC_T_NONE;
-   blk_mq_end_request(rq, BLK_STS_IOERR);
+   blk_mq_end_request(rq, ret);
return;
}
 
-   __blk_mq_requeue_request(rq);
 insert:
blk_mq_sched_insert_request(rq, false, true, false, may_sleep);
 }
diff --g

[PATCH 08/13] dm mpath: merge do_end_io_bio into multipath_end_io_bio

2017-06-03 Thread Christoph Hellwig
This simplifies the code and especially the error passing a bit and
will help with the next patch.

Signed-off-by: Christoph Hellwig <h...@lst.de>
---
 drivers/md/dm-mpath.c | 42 +++---
 1 file changed, 15 insertions(+), 27 deletions(-)

diff --git a/drivers/md/dm-mpath.c b/drivers/md/dm-mpath.c
index 3df056b73b66..6d5ebb76149d 100644
--- a/drivers/md/dm-mpath.c
+++ b/drivers/md/dm-mpath.c
@@ -1510,24 +1510,24 @@ static int multipath_end_io(struct dm_target *ti, 
struct request *clone,
return r;
 }
 
-static int do_end_io_bio(struct multipath *m, struct bio *clone,
-int error, struct dm_mpath_io *mpio)
+static int multipath_end_io_bio(struct dm_target *ti, struct bio *clone, int 
error)
 {
+   struct multipath *m = ti->private;
+   struct dm_mpath_io *mpio = get_mpio_from_bio(clone);
+   struct pgpath *pgpath = mpio->pgpath;
unsigned long flags;
 
-   if (!error)
-   return 0;   /* I/O complete */
-
-   if (noretry_error(error))
-   return error;
+   if (!error || noretry_error(error))
+   goto done;
 
-   if (mpio->pgpath)
-   fail_path(mpio->pgpath);
+   if (pgpath)
+   fail_path(pgpath);
 
if (atomic_read(>nr_valid_paths) == 0 &&
!test_bit(MPATHF_QUEUE_IF_NO_PATH, >flags)) {
dm_report_EIO(m);
-   return -EIO;
+   error = -EIO;
+   goto done;
}
 
/* Queue for the daemon to resubmit */
@@ -1539,28 +1539,16 @@ static int do_end_io_bio(struct multipath *m, struct 
bio *clone,
if (!test_bit(MPATHF_QUEUE_IO, >flags))
queue_work(kmultipathd, >process_queued_bios);
 
-   return DM_ENDIO_INCOMPLETE;
-}
-
-static int multipath_end_io_bio(struct dm_target *ti, struct bio *clone, int 
error)
-{
-   struct multipath *m = ti->private;
-   struct dm_mpath_io *mpio = get_mpio_from_bio(clone);
-   struct pgpath *pgpath;
-   struct path_selector *ps;
-   int r;
-
-   BUG_ON(!mpio);
-
-   r = do_end_io_bio(m, clone, error, mpio);
-   pgpath = mpio->pgpath;
+   error = DM_ENDIO_INCOMPLETE;
+done:
if (pgpath) {
-   ps = >pg->ps;
+   struct path_selector *ps = >pg->ps;
+
if (ps->type->end_io)
ps->type->end_io(ps, >path, mpio->nr_bytes);
}
 
-   return r;
+   return error;
 }
 
 /*
-- 
2.11.0

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 01/13] nvme-lightnvm: use blk_execute_rq in nvme_nvm_submit_user_cmd

2017-06-03 Thread Christoph Hellwig
Instead of reinventing it poorly.

Signed-off-by: Christoph Hellwig <h...@lst.de>
Reviewed-by: Bart Van Assche <bart.vanass...@sandisk.com>
Reviewed-by: Javier González <jav...@cnexlabs.com>
---
 drivers/nvme/host/lightnvm.c | 12 +---
 1 file changed, 1 insertion(+), 11 deletions(-)

diff --git a/drivers/nvme/host/lightnvm.c b/drivers/nvme/host/lightnvm.c
index f5df78ed1e10..f3885b5e56bd 100644
--- a/drivers/nvme/host/lightnvm.c
+++ b/drivers/nvme/host/lightnvm.c
@@ -571,13 +571,6 @@ static struct nvm_dev_ops nvme_nvm_dev_ops = {
.max_phys_sect  = 64,
 };
 
-static void nvme_nvm_end_user_vio(struct request *rq, int error)
-{
-   struct completion *waiting = rq->end_io_data;
-
-   complete(waiting);
-}
-
 static int nvme_nvm_submit_user_cmd(struct request_queue *q,
struct nvme_ns *ns,
struct nvme_nvm_command *vcmd,
@@ -608,7 +601,6 @@ static int nvme_nvm_submit_user_cmd(struct request_queue *q,
rq->timeout = timeout ? timeout : ADMIN_TIMEOUT;
 
rq->cmd_flags &= ~REQ_FAILFAST_DRIVER;
-   rq->end_io_data = 
 
if (ppa_buf && ppa_len) {
ppa_list = dma_pool_alloc(dev->dma_pool, GFP_KERNEL, _dma);
@@ -662,9 +654,7 @@ static int nvme_nvm_submit_user_cmd(struct request_queue *q,
}
 
 submit:
-   blk_execute_rq_nowait(q, NULL, rq, 0, nvme_nvm_end_user_vio);
-
-   wait_for_completion_io();
+   blk_execute_rq(q, NULL, rq, 0);
 
if (nvme_req(rq)->flags & NVME_REQ_CANCELLED)
ret = -EINTR;
-- 
2.11.0

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 07/13] block_dev: propagate bio_iov_iter_get_pages error in __blkdev_direct_IO

2017-06-03 Thread Christoph Hellwig
Once we move the block layer to its own status code we'll still want to
propagate the bio_iov_iter_get_pages, so restructure __blkdev_direct_IO
to take ret into account when returning the errno.

Signed-off-by: Christoph Hellwig <h...@lst.de>
---
 fs/block_dev.c | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/fs/block_dev.c b/fs/block_dev.c
index 51959936..c1dc393ad6b9 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -334,7 +334,7 @@ __blkdev_direct_IO(struct kiocb *iocb, struct iov_iter 
*iter, int nr_pages)
bool is_read = (iov_iter_rw(iter) == READ), is_sync;
loff_t pos = iocb->ki_pos;
blk_qc_t qc = BLK_QC_T_NONE;
-   int ret;
+   int ret = 0;
 
if ((pos | iov_iter_alignment(iter)) &
(bdev_logical_block_size(bdev) - 1))
@@ -363,7 +363,7 @@ __blkdev_direct_IO(struct kiocb *iocb, struct iov_iter 
*iter, int nr_pages)
 
ret = bio_iov_iter_get_pages(bio, iter);
if (unlikely(ret)) {
-   bio->bi_error = ret;
+   bio->bi_error = -EIO;
bio_endio(bio);
break;
}
@@ -412,7 +412,8 @@ __blkdev_direct_IO(struct kiocb *iocb, struct iov_iter 
*iter, int nr_pages)
}
__set_current_state(TASK_RUNNING);
 
-   ret = dio->bio.bi_error;
+   if (!ret)
+   ret = dio->bio.bi_error;
if (likely(!ret))
ret = dio->size;
 
-- 
2.11.0

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 09/13] dm: don't return errnos from ->map

2017-06-03 Thread Christoph Hellwig
Instead use the special DM_MAPIO_KILL return value to return -EIO just
like we do for the request based path.  Note that dm-log-writes returned
-ENOMEM in a few places, which now becomes -EIO instead.  No consumer
treats -ENOMEM special so this shouldn't be an issue (and it should
use a mempool to start with to make guaranteed progress).

Signed-off-by: Christoph Hellwig <h...@lst.de>
---
 drivers/md/dm-crypt.c |  4 ++--
 drivers/md/dm-flakey.c|  4 ++--
 drivers/md/dm-integrity.c | 12 ++--
 drivers/md/dm-log-writes.c|  4 ++--
 drivers/md/dm-mpath.c | 13 ++---
 drivers/md/dm-raid1.c |  6 +++---
 drivers/md/dm-snap.c  |  8 
 drivers/md/dm-target.c|  2 +-
 drivers/md/dm-verity-target.c |  6 +++---
 drivers/md/dm-zero.c  |  4 ++--
 drivers/md/dm.c   | 16 +++-
 11 files changed, 46 insertions(+), 33 deletions(-)

diff --git a/drivers/md/dm-crypt.c b/drivers/md/dm-crypt.c
index ebf9e72d479b..f4b51809db21 100644
--- a/drivers/md/dm-crypt.c
+++ b/drivers/md/dm-crypt.c
@@ -2795,10 +2795,10 @@ static int crypt_map(struct dm_target *ti, struct bio 
*bio)
 * and is aligned to this size as defined in IO hints.
 */
if (unlikely((bio->bi_iter.bi_sector & ((cc->sector_size >> 
SECTOR_SHIFT) - 1)) != 0))
-   return -EIO;
+   return DM_MAPIO_KILL;
 
if (unlikely(bio->bi_iter.bi_size & (cc->sector_size - 1)))
-   return -EIO;
+   return DM_MAPIO_KILL;
 
io = dm_per_bio_data(bio, cc->per_bio_data_size);
crypt_io_init(io, cc, bio, dm_target_offset(ti, 
bio->bi_iter.bi_sector));
diff --git a/drivers/md/dm-flakey.c b/drivers/md/dm-flakey.c
index 13305a182611..e8f093b323ce 100644
--- a/drivers/md/dm-flakey.c
+++ b/drivers/md/dm-flakey.c
@@ -321,7 +321,7 @@ static int flakey_map(struct dm_target *ti, struct bio *bio)
if (bio_data_dir(bio) == READ) {
if (!fc->corrupt_bio_byte && !test_bit(DROP_WRITES, 
>flags) &&
!test_bit(ERROR_WRITES, >flags))
-   return -EIO;
+   return DM_MAPIO_KILL;
goto map_bio;
}
 
@@ -349,7 +349,7 @@ static int flakey_map(struct dm_target *ti, struct bio *bio)
/*
 * By default, error all I/O.
 */
-   return -EIO;
+   return DM_MAPIO_KILL;
}
 
 map_bio:
diff --git a/drivers/md/dm-integrity.c b/drivers/md/dm-integrity.c
index c7f7c8d76576..ee78fb471229 100644
--- a/drivers/md/dm-integrity.c
+++ b/drivers/md/dm-integrity.c
@@ -1352,13 +1352,13 @@ static int dm_integrity_map(struct dm_target *ti, 
struct bio *bio)
DMERR("Too big sector number: 0x%llx + 0x%x > 0x%llx",
  (unsigned long long)dio->range.logical_sector, 
bio_sectors(bio),
  (unsigned long long)ic->provided_data_sectors);
-   return -EIO;
+   return DM_MAPIO_KILL;
}
if (unlikely((dio->range.logical_sector | bio_sectors(bio)) & 
(unsigned)(ic->sectors_per_block - 1))) {
DMERR("Bio not aligned on %u sectors: 0x%llx, 0x%x",
  ic->sectors_per_block,
  (unsigned long long)dio->range.logical_sector, 
bio_sectors(bio));
-   return -EIO;
+   return DM_MAPIO_KILL;
}
 
if (ic->sectors_per_block > 1) {
@@ -1368,7 +1368,7 @@ static int dm_integrity_map(struct dm_target *ti, struct 
bio *bio)
if (unlikely((bv.bv_offset | bv.bv_len) & 
((ic->sectors_per_block << SECTOR_SHIFT) - 1))) {
DMERR("Bio vector (%u,%u) is not aligned on 
%u-sector boundary",
bv.bv_offset, bv.bv_len, 
ic->sectors_per_block);
-   return -EIO;
+   return DM_MAPIO_KILL;
}
}
}
@@ -1383,18 +1383,18 @@ static int dm_integrity_map(struct dm_target *ti, 
struct bio *bio)
wanted_tag_size *= ic->tag_size;
if (unlikely(wanted_tag_size != bip->bip_iter.bi_size)) 
{
DMERR("Invalid integrity data size %u, expected 
%u", bip->bip_iter.bi_size, wanted_tag_size);
-   return -EIO;
+   return DM_MAPIO_KILL;
}
}
} else {
if (unlikely(bip != NULL)) {
DMERR("Unexpected integrity data when using internal 
hash");
-   return -EIO;
+   return DM_M

[PATCH 05/13] fs: remove the unused error argument to dio_end_io()

2017-06-03 Thread Christoph Hellwig
Signed-off-by: Christoph Hellwig <h...@lst.de>
Reviewed-by: Bart Van Assche <bart.vanass...@sandisk.com>
---
 fs/btrfs/inode.c   | 6 +++---
 fs/direct-io.c | 3 +--
 include/linux/fs.h | 2 +-
 3 files changed, 5 insertions(+), 6 deletions(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 17cbe9306faf..758b2666885e 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -8244,7 +8244,7 @@ static void btrfs_endio_direct_read(struct bio *bio)
kfree(dip);
 
dio_bio->bi_error = bio->bi_error;
-   dio_end_io(dio_bio, bio->bi_error);
+   dio_end_io(dio_bio);
 
if (io_bio->end_io)
io_bio->end_io(io_bio, err);
@@ -8304,7 +8304,7 @@ static void btrfs_endio_direct_write(struct bio *bio)
kfree(dip);
 
dio_bio->bi_error = bio->bi_error;
-   dio_end_io(dio_bio, bio->bi_error);
+   dio_end_io(dio_bio);
bio_put(bio);
 }
 
@@ -8673,7 +8673,7 @@ static void btrfs_submit_direct(struct bio *dio_bio, 
struct inode *inode,
 * Releases and cleans up our dio_bio, no need to bio_put()
 * nor bio_endio()/bio_io_error() against dio_bio.
 */
-   dio_end_io(dio_bio, ret);
+   dio_end_io(dio_bio);
}
if (io_bio)
bio_put(io_bio);
diff --git a/fs/direct-io.c b/fs/direct-io.c
index a04ebea77de8..04247a6c3f73 100644
--- a/fs/direct-io.c
+++ b/fs/direct-io.c
@@ -348,13 +348,12 @@ static void dio_bio_end_io(struct bio *bio)
 /**
  * dio_end_io - handle the end io action for the given bio
  * @bio: The direct io bio thats being completed
- * @error: Error if there was one
  *
  * This is meant to be called by any filesystem that uses their own 
dio_submit_t
  * so that the DIO specific endio actions are dealt with after the filesystem
  * has done it's completion work.
  */
-void dio_end_io(struct bio *bio, int error)
+void dio_end_io(struct bio *bio)
 {
struct dio *dio = bio->bi_private;
 
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 803e5a9b2654..4388ab58843d 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2843,7 +2843,7 @@ enum {
DIO_SKIP_DIO_COUNT = 0x08,
 };
 
-void dio_end_io(struct bio *bio, int error);
+void dio_end_io(struct bio *bio);
 
 ssize_t __blockdev_direct_IO(struct kiocb *iocb, struct inode *inode,
 struct block_device *bdev, struct iov_iter *iter,
-- 
2.11.0

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 02/13] scsi/osd: don't save block errors into req_results

2017-06-03 Thread Christoph Hellwig
We will only have sense data if the command executed and got a SCSI
result, so this is pointless.

Signed-off-by: Christoph Hellwig <h...@lst.de>
Reviewed-by: Martin K. Petersen <martin.peter...@oracle.com>
---
 drivers/scsi/osd/osd_initiator.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/scsi/osd/osd_initiator.c b/drivers/scsi/osd/osd_initiator.c
index 8a1b94816419..14785177ce7b 100644
--- a/drivers/scsi/osd/osd_initiator.c
+++ b/drivers/scsi/osd/osd_initiator.c
@@ -477,7 +477,7 @@ static void _set_error_resid(struct osd_request *or, struct 
request *req,
 int error)
 {
or->async_error = error;
-   or->req_errors = scsi_req(req)->result ? : error;
+   or->req_errors = scsi_req(req)->result;
or->sense_len = scsi_req(req)->sense_len;
if (or->sense_len)
memcpy(or->sense, scsi_req(req)->sense, or->sense_len);
-- 
2.11.0

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 04/13] dm: fix REQ_RAHEAD handling

2017-06-03 Thread Christoph Hellwig
A few (but not all) dm targets use a special EWOULDBLOCK error code for
failing REQ_RAHEAD requests that fail due to a lack of available resources.
But no one else knows about this magic code, and lower level drivers also
don't generate it when failing read-ahead requests for similar reasons.

So remove this special casing and ignore all additional error handling for
REQ_RAHEAD - if this was a real underlying error we'd get a normal read
once the real read comes in.

Signed-off-by: Christoph Hellwig <h...@lst.de>
Reviewed-by: Bart Van Assche <bart.vanass...@sandisk.com>
---
 drivers/md/dm-raid1.c  | 4 ++--
 drivers/md/dm-stripe.c | 2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/md/dm-raid1.c b/drivers/md/dm-raid1.c
index a95cbb80fb34..5e30b08b91d9 100644
--- a/drivers/md/dm-raid1.c
+++ b/drivers/md/dm-raid1.c
@@ -1214,7 +1214,7 @@ static int mirror_map(struct dm_target *ti, struct bio 
*bio)
 */
if (!r || (r == -EWOULDBLOCK)) {
if (bio->bi_opf & REQ_RAHEAD)
-   return -EWOULDBLOCK;
+   return -EIO;
 
queue_bio(ms, bio, rw);
return DM_MAPIO_SUBMITTED;
@@ -1258,7 +1258,7 @@ static int mirror_end_io(struct dm_target *ti, struct bio 
*bio, int error)
if (error == -EOPNOTSUPP)
return error;
 
-   if ((error == -EWOULDBLOCK) && (bio->bi_opf & REQ_RAHEAD))
+   if (bio->bi_opf & REQ_RAHEAD)
return error;
 
if (unlikely(error)) {
diff --git a/drivers/md/dm-stripe.c b/drivers/md/dm-stripe.c
index 75152482f3ad..780e95889a7c 100644
--- a/drivers/md/dm-stripe.c
+++ b/drivers/md/dm-stripe.c
@@ -384,7 +384,7 @@ static int stripe_end_io(struct dm_target *ti, struct bio 
*bio, int error)
if (!error)
return 0; /* I/O complete */
 
-   if ((error == -EWOULDBLOCK) && (bio->bi_opf & REQ_RAHEAD))
+   if (bio->bi_opf & REQ_RAHEAD)
return error;
 
if (error == -EOPNOTSUPP)
-- 
2.11.0

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 03/13] gfs2: remove the unused sd_log_error field

2017-06-03 Thread Christoph Hellwig
Signed-off-by: Christoph Hellwig <h...@lst.de>
Reviewed-by: Bart Van Assche <bart.vanass...@sandisk.com>
---
 fs/gfs2/incore.h | 1 -
 fs/gfs2/lops.c   | 4 +---
 2 files changed, 1 insertion(+), 4 deletions(-)

diff --git a/fs/gfs2/incore.h b/fs/gfs2/incore.h
index b7cf65d13561..aa3d44527fa2 100644
--- a/fs/gfs2/incore.h
+++ b/fs/gfs2/incore.h
@@ -815,7 +815,6 @@ struct gfs2_sbd {
atomic_t sd_log_in_flight;
struct bio *sd_log_bio;
wait_queue_head_t sd_log_flush_wait;
-   int sd_log_error;
 
atomic_t sd_reserving_log;
wait_queue_head_t sd_reserving_log_wait;
diff --git a/fs/gfs2/lops.c b/fs/gfs2/lops.c
index b1f9144b42c7..13ebf15a4db0 100644
--- a/fs/gfs2/lops.c
+++ b/fs/gfs2/lops.c
@@ -209,10 +209,8 @@ static void gfs2_end_log_write(struct bio *bio)
struct page *page;
int i;
 
-   if (bio->bi_error) {
-   sdp->sd_log_error = bio->bi_error;
+   if (bio->bi_error)
fs_err(sdp, "Error %d writing to log\n", bio->bi_error);
-   }
 
bio_for_each_segment_all(bvec, bio, i) {
page = bvec->bv_page;
-- 
2.11.0

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


  1   2   3   4   5   6   7   >