Re: [PATCH] SUNRPC: Add a check for gss_release_msg

2021-04-20 Thread J. Bruce Fields
On Tue, Apr 20, 2021 at 09:15:23AM +0200, Greg KH wrote:
> If you look at the code, this is impossible to have happen.
> 
> Please stop submitting known-invalid patches.  Your professor is playing
> around with the review process in order to achieve a paper in some
> strange and bizarre way.
> 
> This is not ok, it is wasting our time, and we will have to report this,
> AGAIN, to your university...

What's the story here?

--b.


Re: [PATCH] SUNRPC: Add a check for gss_release_msg

2021-04-07 Thread J. Bruce Fields
On Tue, Apr 06, 2021 at 07:16:56PM -0500, Aditya Pakki wrote:
> In gss_pipe_destroy_msg(), in case of error in msg, gss_release_msg
> deletes gss_msg. The patch adds a check to avoid a potential double
> free.

We're already dereferenced msg.  Nothing has set gss_msg to NULL.  It's
the gss_msg->count reference count that's supposed to prevent double
frees.

Did you see an actual bug or warning from some tool, and if so, could
you share the details?

--b.

> 
> Signed-off-by: Aditya Pakki 
> ---
>  net/sunrpc/auth_gss/auth_gss.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/net/sunrpc/auth_gss/auth_gss.c b/net/sunrpc/auth_gss/auth_gss.c
> index 5f42aa5fc612..eb52eebb3923 100644
> --- a/net/sunrpc/auth_gss/auth_gss.c
> +++ b/net/sunrpc/auth_gss/auth_gss.c
> @@ -848,7 +848,8 @@ gss_pipe_destroy_msg(struct rpc_pipe_msg *msg)
>   warn_gssd();
>   gss_release_msg(gss_msg);
>   }
> - gss_release_msg(gss_msg);
> + if (gss_msg)
> + gss_release_msg(gss_msg);
>  }
>  
>  static void gss_pipe_dentry_destroy(struct dentry *dir,
> -- 
> 2.25.1


Re: Linux 5.10.25

2021-03-22 Thread J. Bruce Fields
On Sun, Mar 21, 2021 at 11:07:08PM +, Jamie Heilman wrote:
> Jamie Heilman wrote:
> > Greg Kroah-Hartman wrote:
> > > On Sat, Mar 20, 2021 at 09:31:55PM +, Jamie Heilman wrote:
> > > > [ugh, resent with the lkml headers unbroken, sorry about the dupe]
> > > > 
> > > > Greg Kroah-Hartman wrote:
> > > > > J. Bruce Fields (2):
> > > > >   Revert "nfsd4: remove check_conflicting_opens warning"
> > > > >   Revert "nfsd4: a client's own opens needn't prevent delegations"
> > > > 
> > > > Hrm, just got this when I udpated my nfs server (32bit Via EPIA system)
> > > > from 5.10.20 to 5.10.25:
> > > > 
> > > > [   49.225914] NFSD: Using UMH upcall client tracking operations.
> > > > [   49.231919] NFSD: starting 90-second grace period (net f036)
> > > > [   50.036973] [ cut here ]
> > > > [   50.041771] WARNING: CPU: 0 PID: 2284 at fs/nfsd/nfs4state.c:4968 
> > > > nfsd4_process_open2+0xf9c/0x1170 [nfsd]
> > > > [   50.051434] Modules linked in: md5 cpufreq_conservative 
> > > > cpufreq_userspace cpufreq_powersave cpufreq_ondemand autofs4 quota_v2 
> > > > quota_tree nfsd auth_rpcgss nfs lockd grace nfs_ssc fscache sunrpc 
> > > > xt_mark cls_fw sch_htb iptable_nat xt_nat nf_nat ipt_REJECT 
> > > > nf_reject_ipv4 xt_tcpudp xt_multiport iptable_mangle xt_state 
> > > > xt_conntrack nf_conntrack nf_defrag_ipv4 nf_log_ipv4 nf_log_common 
> > > > xt_LOG xt_limit iptable_filter ip_tables x_tables nhpoly1305 
> > > > chacha_generic libchacha adiantum libpoly1305 dm_crypt dm_mod 
> > > > snd_hda_codec_via snd_hda_codec_generic snd_hda_intel snd_intel_dspcfg 
> > > > snd_hda_codec snd_hwdep snd_hda_core snd_pcm snd_timer snd via_rhine 
> > > > psmouse soundcore mii via_agp sg via_velocity evdev agpgart
> > > > [   50.113386] CPU: 0 PID: 2284 Comm: nfsd Tainted: GT 
> > > > 5.10.25 #1
> > > > [   50.120669] Hardware name: To Be Filled By O.E.M. To Be Filled By 
> > > > O.E.M./To be filled by O.E.M., BIOS 080014  06/01/2009
> > > > [   50.131652] EIP: nfsd4_process_open2+0xf9c/0x1170 [nfsd]
> > > > [   50.137036] Code: 04 88 45 a4 88 07 8b 45 a0 8d 78 49 8b 45 84 8d 70 
> > > > 01 e9 2b f8 ff ff c7 45 9c 00 00 00 00 31 ff bb 00 00 27 67 e9 04 f6 ff 
> > > > ff <0f> 0b e9 a0 f5 ff ff 0f b6 d4 0f a3 15 94 3f 23 f8 0f 83 b1 fd ff
> > > > [   50.155866] EAX:  EBX: 82da25b0 ECX: 830c0920 EDX: 
> > > > [   50.162187] ESI: 82da25b0 EDI: 82da55a0 EBP: 8310be68 ESP: 8310bddc
> > > > [   50.168507] DS: 007b ES: 007b FS:  GS: 00e0 SS: 0068 EFLAGS: 
> > > > 00010246
> > > > [   50.175338] CR0: 80050033 CR2: 00551e50 CR3: 03222000 CR4: 06b0
> > > > [   50.181654] Call Trace:
> > > > [   50.184165]  ? inode_permission+0x17/0xc0
> > > > [   50.188289]  nfsd4_open+0x429/0x910 [nfsd]
> > > > [   50.192483]  ? nfsd4_encode_operation+0x185/0x1e0 [nfsd]
> > > > [   50.197900]  ? nfsd4_rename+0x1a0/0x1a0 [nfsd]
> > > > [   50.202439]  nfsd4_proc_compound+0x457/0x6c0 [nfsd]
> > > > [   50.207419]  nfsd_dispatch+0xdc/0x1a0 [nfsd]
> > > > [   50.211816]  svc_process_common+0x38a/0x650 [sunrpc]
> > > > [   50.216880]  ? svc_xprt_do_enqueue+0xd7/0xe0 [sunrpc]
> > > > [   50.222017]  ? svc_xprt_received+0x5d/0xf0 [sunrpc]
> > > > [   50.227000]  ? nfsd_svc+0x300/0x300 [nfsd]
> > > > [   50.231190]  svc_process+0xa9/0xf0 [sunrpc]
> > > > [   50.235468]  nfsd+0xcd/0x120 [nfsd]
> > > > [   50.239025]  kthread+0xe1/0x100
> > > > [   50.242259]  ? nfsd_destroy+0x50/0x50 [nfsd]
> > > > [   50.246588]  ? kthread_create_on_node+0x30/0x30
> > > > [   50.251165]  ret_from_fork+0x1c/0x28
> > > > [   50.254789] ---[ end trace 171bde4774bc9795 ]---
> > > > 
> > > > Can't readily reproduce it though, so likely a race condition or
> > > > something that requires more state buildup than I have after a few
> > > > minutes of uptime.  Kernel config at
> > > > http://audible.transient.net/~jamie/k/nfsd.config-5.10.25 in case you
> > > > think this worth more investigation.
> > > 
> > > Do you also have this issue in Linus's tree and the latest 5.11.y
> > > release?
> > 
> > Haven't tried it, but like I said, I couldn't even reproduce it again
> > with 5.10.25, or booting between 5.10.20 to 5.10.25 again ... I'll
> > give the others a shot and see if I can repro it there.
> 
> Yeah, can't repro it with 5.11.8 or 5.12.0-rc3-00281-g1d4345eb51a1
> either, seems to have been a spurious thing.

I've seen at least one report previously, so it's not a regression.

I'm not sure what's going on there.  I think it might actually be
possible if a CLOSE arrives from the client at exactly the wrong moment.
In which case, all that happens here is we decline to give out a
delegation.  Probably the warning should just be removed.

--b.



Re: [PATCH] SUNRPC: Output oversized frag reclen as ASCII if printable

2021-03-19 Thread J. Bruce Fields
On Fri, Mar 19, 2021 at 02:58:14PM +, Chuck Lever III wrote:
> Hi Chris-
> 
> > On Mar 19, 2021, at 10:54 AM, Chris Down  wrote:
> > 
> > The reclen is taken directly from the first four bytes of the message
> > with the highest bit stripped, which makes it ripe for protocol mixups.
> > For example, if someone tries to send a HTTP GET request to us, we'll
> > interpret it as a 1195725856-sized fragment (ie. (u32)'GET '), and print
> > a ratelimited KERN_NOTICE with that number verbatim.
> > 
> > This can be confusing for downstream users, who don't know what messages
> > like "fragment too large: 1195725856" actually mean, or that they
> > indicate some misconfigured infrastructure elsewhere.
> 
> One wonders whether that error message is actually useful at all.
> We could, for example, turn this into a tracepoint, or just get
> rid of it.

Just going on vague memories here, but: I think we've seen both spurious
and real bugs reported based on this.

I'm inclined to go with a dprintk or tracepoint but not removing it
entirely.

--b.

> 
> 
> > To allow users to more easily understand and debug these cases, add the
> > number interpreted as ASCII if all characters are printable:
> > 
> >RPC: fragment too large: 1195725856 (ASCII "GET ")
> > 
> > If demand grows elsewhere, a new printk format that takes a number and
> > outputs it in various formats is also a possible solution. For now, it
> > seems reasonable to put this here since this particular code path is the
> > one that has repeatedly come up in production.
> > 
> > Signed-off-by: Chris Down 
> > Cc: Chuck Lever 
> > Cc: J. Bruce Fields 
> > Cc: Trond Myklebust 
> > Cc: David S. Miller 
> > ---
> > net/sunrpc/svcsock.c | 39 +--
> > 1 file changed, 37 insertions(+), 2 deletions(-)
> > 
> > diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
> > index 2e2f007dfc9f..046b1d104340 100644
> > --- a/net/sunrpc/svcsock.c
> > +++ b/net/sunrpc/svcsock.c
> > @@ -46,6 +46,7 @@
> > #include 
> > #include 
> > #include 
> > +#include 
> > 
> > #include 
> > #include 
> > @@ -863,6 +864,34 @@ static void svc_tcp_clear_pages(struct svc_sock *svsk)
> > svsk->sk_datalen = 0;
> > }
> > 
> > +/* The reclen is taken directly from the first four bytes of the message 
> > with
> > + * the highest bit stripped, which makes it ripe for protocol mixups. For
> > + * example, if someone tries to send a HTTP GET request to us, we'll 
> > interpret
> > + * it as a 1195725856-sized fragment (ie. (u32)'GET '), and print a 
> > ratelimited
> > + * KERN_NOTICE with that number verbatim.
> > + *
> > + * To allow users to more easily understand and debug these cases, this
> > + * function decodes the purported length as ASCII, and returns it if all
> > + * characters were printable. Otherwise, we return NULL.
> > + *
> > + * WARNING: Since we reuse the u32 directly, the return value is not null
> > + * terminated, and must be printed using %.*s with
> > + * sizeof(svc_sock_reclen(svsk)).
> > + */
> > +static char *svc_sock_reclen_ascii(struct svc_sock *svsk)
> > +{
> > +   u32 len_be = cpu_to_be32(svc_sock_reclen(svsk));
> > +   char *len_be_ascii = (char *)_be;
> > +   size_t i;
> > +
> > +   for (i = 0; i < sizeof(len_be); i++) {
> > +   if (!isprint(len_be_ascii[i]))
> > +   return NULL;
> > +   }
> > +
> > +   return len_be_ascii;
> > +}
> > +
> > /*
> >  * Receive fragment record header into sk_marker.
> >  */
> > @@ -870,6 +899,7 @@ static ssize_t svc_tcp_read_marker(struct svc_sock 
> > *svsk,
> >struct svc_rqst *rqstp)
> > {
> > ssize_t want, len;
> > +   char *reclen_ascii;
> > 
> > /* If we haven't gotten the record length yet,
> >  * get the next four bytes.
> > @@ -898,9 +928,14 @@ static ssize_t svc_tcp_read_marker(struct svc_sock 
> > *svsk,
> > return svc_sock_reclen(svsk);
> > 
> > err_too_large:
> > -   net_notice_ratelimited("svc: %s %s RPC fragment too large: %d\n",
> > +   reclen_ascii = svc_sock_reclen_ascii(svsk);
> > +   net_notice_ratelimited("svc: %s %s RPC fragment too large: 
> > %d%s%.*s%s\n",
> >__func__, svsk->sk_xprt.xpt_server->sv_name,
> > -  svc_sock_reclen(svsk));
> > +  svc_sock_reclen(svsk),
> > +  reclen_ascii ? " (ASCII \"" : "",
> > +  (int)sizeof(u32),
> > +  reclen_ascii ?: "",
> > +  reclen_ascii ? "\")" : "");
> > set_bit(XPT_CLOSE, >sk_xprt.xpt_flags);
> > err_short:
> > return -EAGAIN;
> > -- 
> > 2.30.2
> > 
> 
> --
> Chuck Lever
> 
> 
> 



Re: fscache: Redesigning the on-disk cache

2021-03-08 Thread J. Bruce Fields
On Mon, Mar 08, 2021 at 09:13:55AM +, David Howells wrote:
> Amir Goldstein  wrote:
> > With ->fiemap() you can at least make the distinction between a non existing
> > and an UNWRITTEN extent.
> 
> I can't use that for XFS, Ext4 or btrfs, I suspect.  Christoph and Dave's
> assertion is that the cache can't rely on the backing filesystem's metadata
> because these can arbitrarily insert or remove blocks of zeros to bridge or
> split extents.

Could you instead make some sort of explicit contract with the
filesystem?  Maybe you'd flag it at mkfs time and query for it before
allowing a filesystem to be used for fscache.  You don't need every
filesystem to support fscache, right, just one acceptable one?

--b.


Re: [PATCH] Repair misuse of sv_lock in 5.10.16-rt30.

2021-02-26 Thread J. Bruce Fields
Adding Chuck, linux-nfs.

Makes sense to me.--b.

On Fri, Feb 26, 2021 at 09:38:20AM -0500, Joe Korty wrote:
> Repair misuse of sv_lock in 5.10.16-rt30.
> 
> [ This problem is in mainline, but only rt has the chops to be
> able to detect it. ]
> 
> Lockdep reports a circular lock dependency between serv->sv_lock and
> softirq_ctl.lock on system shutdown, when using a kernel built with
> CONFIG_PREEMPT_RT=y, and a nfs mount exists.
> 
> This is due to the definition of spin_lock_bh on rt:
> 
>   local_bh_disable();
>   rt_spin_lock(lock);
> 
> which forces a softirq_ctl.lock -> serv->sv_lock dependency.  This is
> not a problem as long as _every_ lock of serv->sv_lock is a:
> 
>   spin_lock_bh(>sv_lock);
> 
> but there is one of the form:
> 
>   spin_lock(>sv_lock);
> 
> This is what is causing the circular dependency splat.  The spin_lock()
> grabs the lock without first grabbing softirq_ctl.lock via local_bh_disable.
> If later on in the critical region,  someone does a local_bh_disable, we
> get a serv->sv_lock -> softirq_ctrl.lock dependency established.  Deadlock.
> 
> Fix is to make serv->sv_lock be locked with spin_lock_bh everywhere, no
> exceptions.
> 
> Signed-off-by: Joe Korty 
> 
> 
> 
> 
> [  OK  ] Stopped target NFS client services.
>  Stopping Logout off all iSCSI sessions on shutdown...
>  Stopping NFS server and services...
> [  109.442380] 
> [  109.442385] ==
> [  109.442386] WARNING: possible circular locking dependency detected
> [  109.442387] 5.10.16-rt30 #1 Not tainted
> [  109.442389] --
> [  109.442390] nfsd/1032 is trying to acquire lock:
> [  109.442392] 994237617f60 ((softirq_ctrl.lock).lock){+.+.}-{2:2}, at: 
> __local_bh_disable_ip+0xd9/0x270
> [  109.442405] 
> [  109.442405] but task is already holding lock:
> [  109.442406] 994245cb00b0 (>sv_lock){+.+.}-{0:0}, at: 
> svc_close_list+0x1f/0x90
> [  109.442415] 
> [  109.442415] which lock already depends on the new lock.
> [  109.442415] 
> [  109.442416] 
> [  109.442416] the existing dependency chain (in reverse order) is:
> [  109.442417] 
> [  109.442417] -> #1 (>sv_lock){+.+.}-{0:0}:
> [  109.442421]rt_spin_lock+0x2b/0xc0
> [  109.442428]svc_add_new_perm_xprt+0x42/0xa0
> [  109.442430]svc_addsock+0x135/0x220
> [  109.442434]write_ports+0x4b3/0x620
> [  109.442438]nfsctl_transaction_write+0x45/0x80
> [  109.442440]vfs_write+0xff/0x420
> [  109.442444]ksys_write+0x4f/0xc0
> [  109.442446]do_syscall_64+0x33/0x40
> [  109.442450]entry_SYSCALL_64_after_hwframe+0x44/0xa9
> [  109.442454] 
> [  109.442454] -> #0 ((softirq_ctrl.lock).lock){+.+.}-{2:2}:
> [  109.442457]__lock_acquire+0x1264/0x20b0
> [  109.442463]lock_acquire+0xc2/0x400
> [  109.442466]rt_spin_lock+0x2b/0xc0
> [  109.442469]__local_bh_disable_ip+0xd9/0x270
> [  109.442471]svc_xprt_do_enqueue+0xc0/0x4d0
> [  109.442474]svc_close_list+0x60/0x90
> [  109.442476]svc_close_net+0x49/0x1a0
> [  109.442478]svc_shutdown_net+0x12/0x40
> [  109.442480]nfsd_destroy+0xc5/0x180
> [  109.442482]nfsd+0x1bc/0x270
> [  109.442483]kthread+0x194/0x1b0
> [  109.442487]ret_from_fork+0x22/0x30
> [  109.442492] 
> [  109.442492] other info that might help us debug this:
> [  109.442492] 
> [  109.442493]  Possible unsafe locking scenario:
> [  109.442493] 
> [  109.442493]CPU0CPU1
> [  109.442494]
> [  109.442495]   lock(>sv_lock);
> [  109.442496]lock((softirq_ctrl.lock).lock);
> [  109.442498]lock(>sv_lock);
> [  109.442499]   lock((softirq_ctrl.lock).lock);
> [  109.442501] 
> [  109.442501]  *** DEADLOCK ***
> [  109.442501] 
> [  109.442501] 3 locks held by nfsd/1032:
> [  109.442503]  #0: 93b49258 (nfsd_mutex){+.+.}-{3:3}, at: 
> nfsd+0x19a/0x270
> [  109.442508]  #1: 994245cb00b0 (>sv_lock){+.+.}-{0:0}, at: 
> svc_close_list+0x1f/0x90
> [  109.442512]  #2: 93a81b20 (rcu_read_lock){}-{1:2}, at: 
> rt_spin_lock+0x5/0xc0
> [  109.442518] 
> [  109.442518] stack backtrace:
> [  109.442519] CPU: 0 PID: 1032 Comm: nfsd Not tainted 5.10.16-rt30 #1
> [  109.442522] Hardware name: Supermicro X9DRL-3F/iF/X9DRL-3F/iF, BIOS 3.2 
> 09/22/2015
> [  109.442524] Call Trace:
> [  109.442527]  dump_stack+0x77/0x97
> [  109.442533]  check_noncircular+0xdc/0xf0
> [  109.442546]  __lock_acquire+0x1264/0x20b0
> [  109.442553]  lock_acquire+0xc2/0x400
> [  109.442564]  rt_spin_lock+0x2b/0xc0
> [  109.442570]  __local_bh_disable_ip+0xd9/0x270
> [  109.442573]  svc_xprt_do_enqueue+0xc0/0x4d0
> [  109.442577]  svc_close_list+0x60/0x90
> [  109.442581]  svc_close_net+0x49/0x1a0
> [  109.442585]  svc_shutdown_net+0x12/0x40
> [  109.442588]  

Re: [PATCH v2 02/24] fs/locks: print full locks information

2021-02-24 Thread J. Bruce Fields
On Wed, Feb 24, 2021 at 03:35:44AM -0500, Luo Longjun wrote:
> @@ -2912,17 +2922,66 @@ static int locks_show(struct seq_file *f, void *v)
>   struct file_lock *fl, *bfl;
>   struct pid_namespace *proc_pidns = 
> proc_pid_ns(file_inode(f->file)->i_sb);
>  
> + struct list_head root;
> + struct list_head *tail = 
> + struct list_head *pos, *tmp;
> + struct locks_traverse_list *node, *node_child;
> +
> + int ret = 0;
> +
>   fl = hlist_entry(v, struct file_lock, fl_link);
>  
>   if (locks_translate_pid(fl, proc_pidns) == 0)
> - return 0;
> + return ret;
> +
> + INIT_LIST_HEAD();
>  
> - lock_get_status(f, fl, iter->li_pos, "");
> + node = kmalloc(sizeof(struct locks_traverse_list), GFP_KERNEL);

Is it safe to allocate here?  I thought this was under the
blocked_lock_lock spinlock.

And I still don't think you need a stack.  Have you tried the suggestion
in my previous mail?

--b.


Re: [PATCH] fs/locks: print full locks information

2021-02-21 Thread J. Bruce Fields
On Sun, Feb 21, 2021 at 01:43:03PM -0500, Jeff Layton wrote:
> On Sun, 2021-02-21 at 16:52 +, Al Viro wrote:
> > On Sat, Feb 20, 2021 at 01:32:50AM -0500, Luo Longjun wrote:
> > > + list_for_each_entry(bfl, >fl_blocked_requests, fl_blocked_member)
> > > + __locks_show(f, bfl, level + 1);
> > 
> > Er...  What's the maximal depth, again?  Kernel stack is very much finite...
> 
> Ooof, good point. I don't think there is a maximal depth on the tree
> itself. If you do want to do something like this, then you'd need to
> impose a hard limit on the recursion somehow.

I think all you need to do is something like: follow the first entry of
fl_blocked_requests, printing as you go, until you get down to lock with
empty fl_blocked_requests (a leaf of the tree).  When you get to a leaf,
print, then follow fl_blocker back up and look for your parent's next
sibling on its fl_blocked_requests list.  If there are no more siblings,
continue up to your grandparent, etc.

It's the traverse-a-maze-by-always-turning-left algorithm applied to a
tree.  I think we do it elsewhere in the VFS.

You also need an integer that keeps track of your current indent depth.
But you don't need a stack.

?

--b.


Re: [RFC][PATCH 00/25] Network fs helper library & fscache kiocb API

2021-01-22 Thread J. Bruce Fields
On Thu, Jan 21, 2021 at 08:08:24PM +, David Howells wrote:
> J. Bruce Fields  wrote:
> > So, I'm still confused: there must be some case where we know fscache
> > actually works reliably and doesn't corrupt your data, right?
> 
> Using ext2/3, for example.  I don't know under what circumstances xfs, ext4
> and btrfs might insert/remove blocks of zeros, but I'm told it can happen.

Do ext2/3 work well for fscache in other ways?

--b.


Re: [RFC][PATCH 00/25] Network fs helper library & fscache kiocb API

2021-01-21 Thread J. Bruce Fields
On Thu, Jan 21, 2021 at 06:55:13PM +, David Howells wrote:
> J. Bruce Fields  wrote:
> 
> > > Fixing this requires a much bigger overhaul of cachefiles than this 
> > > patchset
> > > performs.
> > 
> > That sounds like "sometimes you may get file corruption and there's
> > nothing you can do about it".  But I know people actually use fscache,
> > so it must be reliable at least for some use cases.
> 
> Yes.  That's true for the upstream code because that uses bmap.

Sorry, when you say "that's true", what part are you referring to?

> I'm switching
> to use SEEK_HOLE/SEEK_DATA to get rid of the bmap usage, but it doesn't change
> the issue.
> 
> > Is it that those "bridging" blocks only show up in certain corner cases
> > that users can arrange to avoid?  Or that it's OK as long as you use
> > certain specific file systems whose behavior goes beyond what's
> > technically required by the bamp or seek interfaces?
> 
> That's a question for the xfs, ext4 and btrfs maintainers, and may vary
> between kernel versions and fsck or filesystem packing utility versions.

So, I'm still confused: there must be some case where we know fscache
actually works reliably and doesn't corrupt your data, right?

--b.


Re: [RFC][PATCH 00/25] Network fs helper library & fscache kiocb API

2021-01-21 Thread J. Bruce Fields
On Thu, Jan 21, 2021 at 05:02:57PM +, David Howells wrote:
> J. Bruce Fields  wrote:
> 
> > On Wed, Jan 20, 2021 at 10:21:24PM +, David Howells wrote:
> > >  Note that this uses SEEK_HOLE/SEEK_DATA to locate the data available
> > >  to be read from the cache.  Whilst this is an improvement from the
> > >  bmap interface, it still has a problem with regard to a modern
> > >  extent-based filesystem inserting or removing bridging blocks of
> > >  zeros.
> > 
> > What are the consequences from the point of view of a user?
> 
> The cache can get both false positive and false negative results on checks for
> the presence of data because an extent-based filesystem can, at will, insert
> or remove blocks of contiguous zeros to make the extents easier to encode
> (ie. bridge them or split them).
> 
> A false-positive means that you get a block of zeros in the middle of your
> file that very probably shouldn't be there (ie. file corruption); a
> false-negative means that we go and reload the missing chunk from the server.
> 
> The problem exists in cachefiles whether we use bmap or we use
> SEEK_HOLE/SEEK_DATA.  The only way round it is to keep track of what data is
> present independently of backing filesystem's metadata.
> 
> To this end, it shouldn't (mis)behave differently than the code already there
> - except that it handles better the case in which the backing filesystem
> blocksize != PAGE_SIZE (which may not be relevant on an extent-based
> filesystem anyway if it packs parts of different files together in a single
> block) because the current implementation only bmaps the first block in a page
> and doesn't probe for the rest.
> 
> Fixing this requires a much bigger overhaul of cachefiles than this patchset
> performs.

That sounds like "sometimes you may get file corruption and there's
nothing you can do about it".  But I know people actually use fscache,
so it must be reliable at least for some use cases.

Is it that those "bridging" blocks only show up in certain corner cases
that users can arrange to avoid?  Or that it's OK as long as you use
certain specific file systems whose behavior goes beyond what's
technically required by the bamp or seek interfaces?

--b.

> 
> Also, it works towards getting rid of this use of bmap, but that's not user
> visible.
> 
> David


Re: [RFC][PATCH 00/25] Network fs helper library & fscache kiocb API

2021-01-21 Thread J. Bruce Fields
On Wed, Jan 20, 2021 at 10:21:24PM +, David Howells wrote:
>  Note that this uses SEEK_HOLE/SEEK_DATA to locate the data available
>  to be read from the cache.  Whilst this is an improvement from the
>  bmap interface, it still has a problem with regard to a modern
>  extent-based filesystem inserting or removing bridging blocks of
>  zeros.

What are the consequences from the point of view of a user?

--b.

> 
> This is a step towards overhauling the fscache API.  The change is opt-in
> on the part of the network filesystem.  A netfs should not try to mix the
> old and the new API because of conflicting ways of handling pages and the
> PG_fscache page flag and because it would be mixing DIO with buffered I/O.
> Further, the helper library can't be used with the old API.
> 
> This does not change any of the fscache cookie handling APIs or the way
> invalidation is done.
> 
> In the near term, I intend to deprecate and remove the old I/O API
> (fscache_allocate_page{,s}(), fscache_read_or_alloc_page{,s}(),
> fscache_write_page() and fscache_uncache_page()) and eventually replace
> most of fscache/cachefiles with something simpler and easier to follow.
> 
> The patchset contains four parts:
> 
>  (1) Some helper patches, including provision of an ITER_XARRAY iov
>  iterator and a function to do readahead expansion.
> 
>  (2) Patches to add the netfs helper library.
> 
>  (3) A patch to add the fscache/cachefiles kiocb API
> 
>  (4) Patches to add support in AFS for this.
> 
> With this, AFS without a cache passes all expected xfstests; with a cache,
> there's an extra failure, but that's also there before these patches.
> Fixing that probably requires a greater overhaul.
> 
> These patches can be found also on:
> 
>   
> https://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git/log/?h=fscache-netfs-lib
> 
> David
> ---
> David Howells (24):
>   iov_iter: Add ITER_XARRAY
>   vm: Add wait/unlock functions for PG_fscache
>   mm: Implement readahead_control pageset expansion
>   vfs: Export rw_verify_area() for use by cachefiles
>   netfs: Make a netfs helper module
>   netfs: Provide readahead and readpage netfs helpers
>   netfs: Add tracepoints
>   netfs: Gather stats
>   netfs: Add write_begin helper
>   netfs: Define an interface to talk to a cache
>   fscache, cachefiles: Add alternate API to use kiocb for read/write to 
> cache
>   afs: Disable use of the fscache I/O routines
>   afs: Pass page into dirty region helpers to provide THP size
>   afs: Print the operation debug_id when logging an unexpected data 
> version
>   afs: Move key to afs_read struct
>   afs: Don't truncate iter during data fetch
>   afs: Log remote unmarshalling errors
>   afs: Set up the iov_iter before calling afs_extract_data()
>   afs: Use ITER_XARRAY for writing
>   afs: Wait on PG_fscache before modifying/releasing a page
>   afs: Extract writeback extension into its own function
>   afs: Prepare for use of THPs
>   afs: Use the fs operation ops to handle FetchData completion
>   afs: Use new fscache read helper API
> 
> Takashi Iwai (1):
>   cachefiles: Drop superfluous readpages aops NULL check
> 
> 
>  fs/Kconfig|1 +
>  fs/Makefile   |1 +
>  fs/afs/Kconfig|1 +
>  fs/afs/dir.c  |  225 ---
>  fs/afs/file.c |  472 --
>  fs/afs/fs_operation.c |4 +-
>  fs/afs/fsclient.c |  108 ++--
>  fs/afs/inode.c|7 +-
>  fs/afs/internal.h |   57 +-
>  fs/afs/rxrpc.c|  150 ++---
>  fs/afs/write.c|  610 ++
>  fs/afs/yfsclient.c|   82 +--
>  fs/cachefiles/Makefile|1 +
>  fs/cachefiles/interface.c |5 +-
>  fs/cachefiles/internal.h  |9 +
>  fs/cachefiles/rdwr.c  |2 -
>  fs/cachefiles/rdwr2.c |  406 
>  fs/fscache/Makefile   |3 +-
>  fs/fscache/internal.h |3 +
>  fs/fscache/page.c |2 +-
>  fs/fscache/page2.c|  116 
>  fs/fscache/stats.c|1 +
>  fs/internal.h |5 -
>  fs/netfs/Kconfig  |   23 +
>  fs/netfs/Makefile |5 +
>  fs/netfs/internal.h   |   97 +++
>  fs/netfs/read_helper.c| 1142 +
>  fs/netfs/stats.c  |   57 ++
>  fs/read_write.c   |1 +
>  include/linux/fs.h|1 +
>  include/linux/fscache-cache.h |4 +
>  include/linux/fscache.h   |   28 +-
>  include/linux/netfs.h |  167 +
>  include/linux/pagemap.h   |   16 +
>  include/net/af_rxrpc.h|2 +-
>  include/trace/events/afs.h|   74 +--
>  include/trace/events/netfs.h  |  201 ++
>  mm/filemap.c  |   18 +
>  

Re: [PATCH] nfsd: Fix error return code in nfsd_file_cache_init()

2020-11-25 Thread J. Bruce Fields
On Wed, Nov 25, 2020 at 03:39:33AM -0500, Huang Guobin wrote:
> Fix to return PTR_ERR() error code from the error handling case instead of
> 0 in function nfsd_file_cache_init(), as done elsewhere in this function.
> 
> Fixes: 65294c1f2c5e7("nfsd: add a new struct file caching facility to nfsd")
> Signed-off-by: Huang Guobin 
> ---
>  fs/nfsd/filecache.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
> index c8b9d2667ee6..a8a5b555f08b 100644
> --- a/fs/nfsd/filecache.c
> +++ b/fs/nfsd/filecache.c
> @@ -686,6 +686,7 @@ nfsd_file_cache_init(void)
>   pr_err("nfsd: unable to create fsnotify group: %ld\n",
>   PTR_ERR(nfsd_file_fsnotify_group));
>   nfsd_file_fsnotify_group = NULL;
> + ret = PTR_ERR(nfsd_file_fsnotify_group);

I think you meant to add that one line earlier.

Otherwise fine, but it looks like an unlikely case so can probably wait
for the merge window.

--b.

>   goto out_notifier;
>   }
>  
> -- 
> 2.22.0


[GIT PULL] nfsd bugfix for 5.10

2020-11-18 Thread J. Bruce Fields
Please pull

  git://linux-nfs.org/~bfields/linux.git tags/nfsd-5.10-2

Just one quick fix for a tracing oops.

--b.


Scott Mayhew (1):
  SUNRPC: Fix oops in the rpc_xdr_buf event class

 include/trace/events/sunrpc.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)


Re: [RFC][PATCH 00/18] crypto: Add generic Kerberos library

2020-11-12 Thread J. Bruce Fields
On Thu, Nov 12, 2020 at 12:57:45PM +, David Howells wrote:
> 
> Hi Herbert, Bruce,
> 
> Here's my first cut at a generic Kerberos crypto library in the kernel so
> that I can share code between rxrpc and sunrpc (and cifs?).
> 
> I derived some of the parts from the sunrpc gss library and added more
> advanced AES and Camellia crypto.  I haven't ported across the DES-based
> crypto yet - I figure that can wait a bit till the interface is sorted.
> 
> Whilst I have put it into a directory under crypto/, I haven't made an
> interface that goes and loads it (analogous to crypto_alloc_skcipher,
> say).  Instead, you call:
> 
> const struct krb5_enctype *crypto_krb5_find_enctype(u32 enctype);
> 
> to go and get a handler table and then use a bunch of accessor functions to
> jump through the hoops.  This is basically the way the sunrpc gsslib does
> things.  It might be worth making it so you do something like:
> 
>   struct crypto_mech *ctx = crypto_mech_alloc("krb5(18)");
> 
> to get enctype 18, but I'm not sure if it's worth the effort.  Also, I'm
> not sure if there are any alternatives to kerberos we will need to support.

We did have code for a non-krb5 mechanism at some point, but it was torn
out.  So I don't think that's a priority.

(Chuck, will RPC-over-SSL need a new non-krb5 mechanism?)

> There are three main interfaces to it:
> 
>  (*) I/O crypto: encrypt, decrypt, get_mic and verify_mic.
> 
>  These all do in-place crypto, using an sglist to define the buffer
>  with the data in it.  Is it necessary to make it able to take separate
>  input and output buffers?

I don't know.  My memory is that the buffer management in the existing
rpcsec_gss code is complex and fragile.  See e.g. the long comment in
gss_krb5_remove_padding.

--b.

>  (*) PRF+ calculation for key derivation.
>  (*) Kc, Ke, Ki derivation.
> 
>  These use krb5_buffer structs to pass objects around.  This is akin to
>  the xdr_netobj, but has a void* instead of a u8* data pointer.
> 
> In terms of rxrpc's rxgk, there's another step in key derivation that isn't
> part of the kerberos standard, but uses the PRF+ function to generate a key
> that is then used to generate Kc, Ke and Ki.  Is it worth putting this into
> the directory or maybe having a callout to insert an intermediate step in
> key derivation?
> 
> Note that, for purposes of illustration, I've included some rxrpc patches
> that use this interface to implement the rxgk Rx security class.  The
> branch also is based on some rxrpc patches that are a prerequisite for
> this, but the crypto patches don't need it.
> 
> ---
> The patches can be found here also:
> 
>   
> http://git.kernel.org/cgit/linux/kernel/git/dhowells/linux-fs.git/log/?h=crypto-krb5
> 
> David
> ---
> David Howells (18):
>   crypto/krb5: Implement Kerberos crypto core
>   crypto/krb5: Add some constants out of sunrpc headers
>   crypto/krb5: Provide infrastructure and key derivation
>   crypto/krb5: Implement the Kerberos5 rfc3961 key derivation
>   crypto/krb5: Implement the Kerberos5 rfc3961 encrypt and decrypt 
> functions
>   crypto/krb5: Implement the Kerberos5 rfc3961 get_mic and verify_mic
>   crypto/krb5: Implement the AES enctypes from rfc3962
>   crypto/krb5: Implement crypto self-testing
>   crypto/krb5: Implement the AES enctypes from rfc8009
>   crypto/krb5: Implement the AES encrypt/decrypt from rfc8009
>   crypto/krb5: Add the AES self-testing data from rfc8009
>   crypto/krb5: Implement the Camellia enctypes from rfc6803
>   rxrpc: Add the security index for yfs-rxgk
>   rxrpc: Add YFS RxGK (GSSAPI) security class
>   rxrpc: rxgk: Provide infrastructure and key derivation
>   rxrpc: rxgk: Implement the yfs-rxgk security class (GSSAPI)
>   rxrpc: rxgk: Implement connection rekeying
>   rxgk: Support OpenAFS's rxgk implementation
> 
> 
>  crypto/krb5/Kconfig  |9 +
>  crypto/krb5/Makefile |   11 +-
>  crypto/krb5/internal.h   |  101 +++
>  crypto/krb5/kdf.c|  223 ++
>  crypto/krb5/main.c   |  190 +
>  crypto/krb5/rfc3961_simplified.c |  732 ++
>  crypto/krb5/rfc3962_aes.c|  140 
>  crypto/krb5/rfc6803_camellia.c   |  249 ++
>  crypto/krb5/rfc8009_aes2.c   |  440 +++
>  crypto/krb5/selftest.c   |  543 +
>  crypto/krb5/selftest_data.c  |  289 +++
>  fs/afs/misc.c|   13 +
>  include/crypto/krb5.h|  100 +++
>  include/keys/rxrpc-type.h|   17 +
>  include/trace/events/rxrpc.h |4 +
>  include/uapi/linux/rxrpc.h   |   17 +
>  net/rxrpc/Kconfig|   10 +
>  net/rxrpc/Makefile   |5 +
>  net/rxrpc/ar-internal.h  |   20 +
>  net/rxrpc/conn_object.c  |2 +
>  net/rxrpc/key.c  |  319 
>  net/rxrpc/rxgk.c | 1232 

[GIT PULL] nfsd 5.10 fixes

2020-11-09 Thread J. Bruce Fields
Please pull nfsd fixes for 5.10 from

  git://linux-nfs.org/~bfields/linux.git tags/nfsd-5.10-1

This is mainly server-to-server copy and fallout from Chuck's 5.10 rpc
refactoring.

--b.

Chuck Lever (3):
  NFSD: NFSv3 PATHCONF Reply is improperly formed
  SUNRPC: Fix general protection fault in trace_rpc_xdr_overflow()
  NFSD: MKNOD should return NFSERR_BADTYPE instead of NFSERR_INVAL

Dai Ngo (2):
  NFSD: Fix use-after-free warning when doing inter-server copy
  NFSD: fix missing refcount in nfsd4_copy by nfsd4_do_async_copy

Dan Carpenter (2):
  net/sunrpc: return 0 on attempt to write to "transports"
  net/sunrpc: fix useless comparison in proc_do_xprt()

 fs/nfsd/nfs3proc.c| 6 +-
 fs/nfsd/nfs3xdr.c | 1 +
 fs/nfsd/nfs4proc.c| 3 ++-
 include/trace/events/sunrpc.h | 8 
 net/sunrpc/sysctl.c   | 9 +
 5 files changed, 13 insertions(+), 14 deletions(-)


Re: linux-next: Signed-off-by missing for commit in the nfsd tree

2020-11-08 Thread J. Bruce Fields
On Mon, Nov 09, 2020 at 08:20:32AM +1100, Stephen Rothwell wrote:
> Hi all,
> 
> Commit
> 
>   bfb5aa1685d5 ("net/sunrpc: fix useless comparison in proc_do_xprt()")
> 
> is missing a Signed-off-by from its author.

I split the original patch in 2 and fixed a bug in this second patch,
but unless I hear otherwise I'll assume Dan's OK with his Signed-off-by
staying on both

--b.


Re: [PATCH] net/sunrpc: Fix return value from proc_do_xprt()

2020-11-07 Thread J. Bruce Fields
On Sat, Nov 07, 2020 at 01:49:40PM +, Alex Dewar wrote:
> On Fri, Nov 06, 2020 at 05:07:21PM -0500, J. Bruce Fields wrote:
> > Whoops, got 3 independent patches for this and overlooked this one.  See
> > https://lore.kernel.org/linux-nfs/20201106205959.gb26...@fieldses.org/T/#t
> > 
> > --b.
> 
> That looks like a cleaner fix. Thanks for looking anyhow and sorry for
> the noise!

Not noise, all these efforts are appreciated.---b.

> 
> > 
> > On Sat, Oct 24, 2020 at 03:52:40PM +0100, Alex Dewar wrote:
> > > Commit c09f56b8f68d ("net/sunrpc: Fix return value for sysctl
> > > sunrpc.transports") attempted to add error checking for the call to
> > > memory_read_from_buffer(), however its return value was assigned to a
> > > size_t variable, so any negative values would be lost in the cast. Fix
> > > this.
> > > 
> > > Addresses-Coverity-ID: 1498033: Control flow issues (NO_EFFECT)
> > > Fixes: c09f56b8f68d ("net/sunrpc: Fix return value for sysctl 
> > > sunrpc.transports")
> > > Signed-off-by: Alex Dewar 
> > > ---
> > >  net/sunrpc/sysctl.c | 7 +--
> > >  1 file changed, 5 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/net/sunrpc/sysctl.c b/net/sunrpc/sysctl.c
> > > index a18b36b5422d..c95a2b84dd95 100644
> > > --- a/net/sunrpc/sysctl.c
> > > +++ b/net/sunrpc/sysctl.c
> > > @@ -62,6 +62,7 @@ rpc_unregister_sysctl(void)
> > >  static int proc_do_xprt(struct ctl_table *table, int write,
> > >   void *buffer, size_t *lenp, loff_t *ppos)
> > >  {
> > > + ssize_t bytes_read;
> > >   char tmpbuf[256];
> > >   size_t len;
> > >  
> > > @@ -70,12 +71,14 @@ static int proc_do_xprt(struct ctl_table *table, int 
> > > write,
> > >   return 0;
> > >   }
> > >   len = svc_print_xprts(tmpbuf, sizeof(tmpbuf));
> > > - *lenp = memory_read_from_buffer(buffer, *lenp, ppos, tmpbuf, len);
> > > + bytes_read = memory_read_from_buffer(buffer, *lenp, ppos, tmpbuf, len);
> > >  
> > > - if (*lenp < 0) {
> > > + if (bytes_read < 0) {
> > >   *lenp = 0;
> > >   return -EINVAL;
> > >   }
> > > +
> > > + *lenp = bytes_read;
> > >   return 0;
> > >  }
> > >  
> > > -- 
> > > 2.29.1


Re: [PATCH] net/sunrpc: Fix return value from proc_do_xprt()

2020-11-06 Thread J. Bruce Fields
Whoops, got 3 independent patches for this and overlooked this one.  See
https://lore.kernel.org/linux-nfs/20201106205959.gb26...@fieldses.org/T/#t

--b.

On Sat, Oct 24, 2020 at 03:52:40PM +0100, Alex Dewar wrote:
> Commit c09f56b8f68d ("net/sunrpc: Fix return value for sysctl
> sunrpc.transports") attempted to add error checking for the call to
> memory_read_from_buffer(), however its return value was assigned to a
> size_t variable, so any negative values would be lost in the cast. Fix
> this.
> 
> Addresses-Coverity-ID: 1498033: Control flow issues (NO_EFFECT)
> Fixes: c09f56b8f68d ("net/sunrpc: Fix return value for sysctl 
> sunrpc.transports")
> Signed-off-by: Alex Dewar 
> ---
>  net/sunrpc/sysctl.c | 7 +--
>  1 file changed, 5 insertions(+), 2 deletions(-)
> 
> diff --git a/net/sunrpc/sysctl.c b/net/sunrpc/sysctl.c
> index a18b36b5422d..c95a2b84dd95 100644
> --- a/net/sunrpc/sysctl.c
> +++ b/net/sunrpc/sysctl.c
> @@ -62,6 +62,7 @@ rpc_unregister_sysctl(void)
>  static int proc_do_xprt(struct ctl_table *table, int write,
>   void *buffer, size_t *lenp, loff_t *ppos)
>  {
> + ssize_t bytes_read;
>   char tmpbuf[256];
>   size_t len;
>  
> @@ -70,12 +71,14 @@ static int proc_do_xprt(struct ctl_table *table, int 
> write,
>   return 0;
>   }
>   len = svc_print_xprts(tmpbuf, sizeof(tmpbuf));
> - *lenp = memory_read_from_buffer(buffer, *lenp, ppos, tmpbuf, len);
> + bytes_read = memory_read_from_buffer(buffer, *lenp, ppos, tmpbuf, len);
>  
> - if (*lenp < 0) {
> + if (bytes_read < 0) {
>   *lenp = 0;
>   return -EINVAL;
>   }
> +
> + *lenp = bytes_read;
>   return 0;
>  }
>  
> -- 
> 2.29.1


Re: [PATCH] sunrpc: raise kernel RPC channel buffer size

2020-10-23 Thread J. Bruce Fields
On Sat, Oct 24, 2020 at 03:29:25AM +0200, Roberto Bergantinos Corpas wrote:
> Good point Geert !
> 
> > How about making it a kvmalloc?
> 
> I can post a new patch using kvmalloc, Bruce looks we can also
> prescind of queue_io_mutex, what do you think ?

And revert da77005f0d64, I think. 

Maybe there's something I'm missing, but I don't think we need all that
complexity.

--b.


Re: [PATCH] sunrpc: raise kernel RPC channel buffer size

2020-10-23 Thread J. Bruce Fields
On Fri, Oct 23, 2020 at 11:44:38AM +0200, Geert Uytterhoeven wrote:
>   Hi Bruce, Roberto,
> 
> On Mon, 19 Oct 2020, J. Bruce Fields wrote:
> >On Mon, Oct 19, 2020 at 11:33:56AM +0200, Roberto Bergantinos Corpas wrote:
> >>Its possible that using AUTH_SYS and mountd manage-gids option a
> >>user may hit the 8k RPC channel buffer limit. This have been observed
> >>on field, causing unanswered RPCs on clients after mountd fails to
> >>write on channel :
> >>
> >>rpc.mountd[11231]: auth_unix_gid: error writing reply
> >>
> >>Userland nfs-utils uses a buffer size of 32k (RPC_CHAN_BUF_SIZE), so
> >>lets match those two.
> >
> >Thanks, applying.
> >
> >That should allow about 4000 group memberships.  If that doesn't do it
> >then maybe it's time to rethink
> >
> >--b.
> >
> >>
> >>Signed-off-by: Roberto Bergantinos Corpas 
> >>---
> >> net/sunrpc/cache.c | 2 +-
> >> 1 file changed, 1 insertion(+), 1 deletion(-)
> >>
> >>diff --git a/net/sunrpc/cache.c b/net/sunrpc/cache.c
> >>index baef5ee43dbb..08df4c599ab3 100644
> >>--- a/net/sunrpc/cache.c
> >>+++ b/net/sunrpc/cache.c
> >>@@ -908,7 +908,7 @@ static ssize_t cache_do_downcall(char *kaddr, const 
> >>char __user *buf,
> >> static ssize_t cache_slow_downcall(const char __user *buf,
> >>   size_t count, struct cache_detail *cd)
> >> {
> >>-   static char write_buf[8192]; /* protected by queue_io_mutex */
> >>+   static char write_buf[32768]; /* protected by queue_io_mutex */
> >>ssize_t ret = -EINVAL;
> >>
> >>if (count >= sizeof(write_buf))
> 
> This is now commit 27a1e8a0f79e643d ("sunrpc: raise kernel RPC channel
> buffer size") upstream, and increases kernel size by 24 KiB, even if
> RPC is not used.
> 
> Can this buffer allocated dynamically instead? This code path seems to
> be a slow path anyway. If it's critical, perhaps this buffer can be
> allocated on first use?

Sure.  Actually it is using an allocated buffer typically, see
cache_downcall().

Looking back at the history  That was added by Trond in 2009
(da77005f0d64 "SUNRPC: Remove the global temporary write buffer in
net/sunrpc/cache.c").  

Before that there's a pre-git change from 2004 which replaced a simple
kmalloc(PAGE_SIZE).

So I guess the point was to be careful about higher-order allocations,
but probably it was overkill.

How about making it a kvmalloc?

--b.


[GIT PULL] nfsd change for 5.10

2020-10-21 Thread J. Bruce Fields
Please pull nfsd changes for 5.10 from:

  git://linux-nfs.org/~bfields/linux.git tags/nfsd-5.10

The one new feature this time, from Anna Schumaker, is READ_PLUS, which
has the same arguments as READ but allows the server to return an array
of data and hole extents.

Otherwise it's a lot of cleanup and bugfixes.

--b.


Alex Dewar (2):
  nfsd: Fix typo in comment
  nfsd: Remove unnecessary assignment in nfs4xdr.c

Anna Schumaker (5):
  SUNRPC/NFSD: Implement xdr_reserve_space_vec()
  NFSD: Add READ_PLUS data support
  NFSD: Add READ_PLUS hole segment encoding
  NFSD: Return both a hole and a data segment
  NFSD: Encode a full READ_PLUS reply

Artur Molchanov (1):
  net/sunrpc: Fix return value for sysctl sunrpc.transports

Chuck Lever (17):
  NFSD: Correct type annotations in user xattr helpers
  NFSD: Correct type annotations in user xattr XDR functions
  NFSD: Correct type annotations in COPY XDR functions
  NFSD: Add missing NFSv2 .pc_func methods
  lockd: Replace PROC() macro with open code
  NFSACL: Replace PROC() macro with open code
  NFSD: Encoder and decoder functions are always present
  NFSD: Clean up switch statement in nfsd_dispatch()
  NFSD: Clean up stale comments in nfsd_dispatch()
  NFSD: Clean up nfsd_dispatch() variables
  NFSD: Refactor nfsd_dispatch() error paths
  NFSD: Remove vestigial typedefs
  NFSD: Fix .pc_release method for NFSv2
  NFSD: Call NFSv2 encoders on error returns
  NFSD: Remove the RETURN_STATUS() macro
  NFSD: Map nfserr_wrongsec outside of nfsd_dispatch
  NFSD: Hoist status code encoding into XDR encoder functions

Dai Ngo (1):
  NFSv4.2: Fix NFS4ERR_STALE error when doing inter server copy

Dan Aloni (1):
  svcrdma: fix bounce buffers for unaligned offsets and multiple pages

Hou Tao (1):
  nfsd: rename delegation related tracepoints to make them less confusing

J. Bruce Fields (8):
  nfsd: remove fault injection code
  nfsd: give up callbacks on revoked delegations
  MAINTAINERS: Note NFS docs under Documentation/
  Documentation: update RPCSEC_GSSv3 RFC link
  sunrpc: simplify do_cache_clean
  nfsd: Cache R, RW, and W opens separately
  nfsd4: remove check_conflicting_opens warning
  nfsd: rq_lease_breaker cleanup

Martijn de Gouw (1):
  SUNRPC: fix copying of multiple pages in gss_read_proxy_verf()

Randy Dunlap (1):
  net: sunrpc: delete repeated words

Rik van Riel (1):
  silence nfscache allocation warnings with kvzalloc

Roberto Bergantinos Corpas (1):
  sunrpc: raise kernel RPC channel buffer size

Tom Rix (1):
  nfsd: remove unneeded break

Xu Wang (1):
  sunrpc: cache : Replace seq_printf with seq_puts

Zheng Bin (1):
  nfsd: fix comparison to bool warning

 Documentation/admin-guide/nfs/fault_injection.rst |  70 ---
 Documentation/admin-guide/nfs/index.rst   |   1 -
 Documentation/filesystems/nfs/rpc-server-gss.rst  |   5 +-
 MAINTAINERS   |   2 +
 fs/lockd/svc4proc.c   | 248 +++--
 fs/lockd/svcproc.c| 250 +++--
 fs/nfs/nfs4file.c |  38 +-
 fs/nfs/nfs4super.c|   5 +
 fs/nfs/super.c|  17 +
 fs/nfs_common/Makefile|   1 +
 fs/nfs_common/nfs_ssc.c   |  94 
 fs/nfsd/Kconfig   |  12 +-
 fs/nfsd/Makefile  |   1 -
 fs/nfsd/export.c  |   2 +-
 fs/nfsd/filecache.c   |   2 +-
 fs/nfsd/nfs2acl.c | 160 --
 fs/nfsd/nfs3acl.c |  88 ++--
 fs/nfsd/nfs3proc.c| 238 +
 fs/nfsd/nfs3xdr.c |  25 +-
 fs/nfsd/nfs4proc.c|  34 +-
 fs/nfsd/nfs4state.c   | 605 +-
 fs/nfsd/nfs4xdr.c | 202 ++--
 fs/nfsd/nfscache.c|  12 +-
 fs/nfsd/nfsctl.c  |   3 -
 fs/nfsd/nfsproc.c | 283 +-
 fs/nfsd/nfssvc.c  | 122 +++--
 fs/nfsd/nfsxdr.c  |  52 +-
 fs/nfsd/state.h   |  27 -
 fs/nfsd/trace.h   |   4 +-
 fs/nfsd/vfs.c |   6 +-
 fs/nfsd/xdr.h |  16 +-
 fs/nfsd/xdr3.h|   1 +
 fs/nfsd/xdr4.h|   1 +
 include/linux/nfs_ssc.h   |  67

Re: [PATCH] SUNRPC: fix copying of multiple pages in gss_read_proxy_verf()

2020-10-19 Thread J. Bruce Fields
On Mon, Oct 19, 2020 at 03:46:39PM +, Martijn de Gouw wrote:
> Hi
> 
> On 19-10-2020 17:23, J. Bruce Fields wrote:
> > On Mon, Oct 19, 2020 at 01:42:27PM +0200, Martijn de Gouw wrote:
> >> When the passed token is longer than 4032 bytes, the remaining part
> >> of the token must be copied from the rqstp->rq_arg.pages. But the
> >> copy must make sure it happens in a consecutive way.
> > 
> > Thanks.  Apologies, but I don't immediately see where the copy is
> > non-consecutive.  What exactly is the bug in the existing code?
> 
> In the first memcpy 'length' bytes are copied from argv->iobase, but 
> since the header is in front, this never fills the whole first page of 
> in_token->pages.
> 
> The memcpy in the loop copies the following bytes, but starts writing at 
> the next page of in_token->pages. This leaves the last bytes of page 0 
> unwritten.
> 
> Next to that, the remaining data is in page 0 of rqstp->rq_arg.pages, 
> not page 1.

Got it, thanks.  Looks like the culprit might be a patch from a year ago
from Chuck, 5866efa8cbfb "SUNRPC: Fix svcauth_gss_proxy_init()"?  At
least, that's the last major patch to touch this code.

--b.

> 
> Regards, Martijn
> 
> > 
> > --b.
> > 
> >>
> >> Signed-off-by: Martijn de Gouw 
> >> ---
> >>   net/sunrpc/auth_gss/svcauth_gss.c | 27 +--
> >>   1 file changed, 17 insertions(+), 10 deletions(-)
> >>
> >> diff --git a/net/sunrpc/auth_gss/svcauth_gss.c 
> >> b/net/sunrpc/auth_gss/svcauth_gss.c
> >> index 258b04372f85..bd4678db9d76 100644
> >> --- a/net/sunrpc/auth_gss/svcauth_gss.c
> >> +++ b/net/sunrpc/auth_gss/svcauth_gss.c
> >> @@ -1147,9 +1147,9 @@ static int gss_read_proxy_verf(struct svc_rqst 
> >> *rqstp,
> >>   struct gssp_in_token *in_token)
> >>   {
> >>struct kvec *argv = >rq_arg.head[0];
> >> -  unsigned int page_base, length;
> >> -  int pages, i, res;
> >> -  size_t inlen;
> >> +  unsigned int length, pgto_offs, pgfrom_offs;
> >> +  int pages, i, res, pgto, pgfrom;
> >> +  size_t inlen, to_offs, from_offs;
> >>   
> >>res = gss_read_common_verf(gc, argv, authp, in_handle);
> >>if (res)
> >> @@ -1177,17 +1177,24 @@ static int gss_read_proxy_verf(struct svc_rqst 
> >> *rqstp,
> >>memcpy(page_address(in_token->pages[0]), argv->iov_base, length);
> >>inlen -= length;
> >>   
> >> -  i = 1;
> >> -  page_base = rqstp->rq_arg.page_base;
> >> +  to_offs = length;
> >> +  from_offs = rqstp->rq_arg.page_base;
> >>while (inlen) {
> >> -  length = min_t(unsigned int, inlen, PAGE_SIZE);
> >> -  memcpy(page_address(in_token->pages[i]),
> >> - page_address(rqstp->rq_arg.pages[i]) + page_base,
> >> +  pgto = to_offs >> PAGE_SHIFT;
> >> +  pgfrom = from_offs >> PAGE_SHIFT;
> >> +  pgto_offs = to_offs & ~PAGE_MASK;
> >> +  pgfrom_offs = from_offs & ~PAGE_MASK;
> >> +
> >> +  length = min_t(unsigned int, inlen,
> >> +   min_t(unsigned int, PAGE_SIZE - pgto_offs,
> >> + PAGE_SIZE - pgfrom_offs));
> >> +  memcpy(page_address(in_token->pages[pgto]) + pgto_offs,
> >> + page_address(rqstp->rq_arg.pages[pgfrom]) + pgfrom_offs,
> >>   length);
> >>   
> >> +  to_offs += length;
> >> +  from_offs += length;
> >>inlen -= length;
> >> -  page_base = 0;
> >> -  i++;
> >>}
> >>return 0;
> >>   }
> >> -- 
> >> 2.20.1
> 
> -- 
> Martijn de Gouw
> Designer
> Prodrive Technologies
> Mobile: +31 63 17 76 161
> Phone:  +31 40 26 76 200


Re: gssapi, crypto and afs/rxrpc

2020-10-19 Thread J. Bruce Fields
On Fri, Oct 16, 2020 at 05:18:26PM +0100, David Howells wrote:
> Hi Herbert, Dave, Trond,
> 
> I've written basic gssapi-derived security support for AF_RXRPC:
> 
>   
> https://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git/log/?h=rxrpc-rxgk
> 
> I've borrowed some bits from net/sunrpc/auth_gss/ but there's a lot in there
> that is quite specific to the sunrpc module that makes it hard to use for
> rxrpc (dprintk, struct xdr_buf).
> 
> Further, I've implemented some more enctypes that aren't supported yet by
> gssapi (AES with sha256/sha384 and Camellia), and that requires some changes
> to the handling as AES with sha384 has a 24-byte checksum size and a 24-byte
> calculated key size for Kc and Ki but a 32-byte Ke.
> 
> Should I pull the core out and try to make it common?  If so, should I move it
> to crypto/ or lib/, or perhaps put it in net/gssapi/?
> 
> There are two components to it:
> 
>  (1) Key derivation steps.
> 
>  My thought is to use xdr_netobj or something similar for to communicate
>  between the steps (though I'd prefer to change .data to be a void* rather
>  than u8*).
> 
>  (2) Encryption/checksumming.
> 
>  My thought is to make this interface use scattergather lists[*] since
>  that's what the crypto encryption API requires (though not the hash API).
> 
> If I do this, should I create a "kerberos" crypto API for the data wrapping
> functions?  I'm not sure that it quite matches the existing APIs because the
> size of the input data will likely not match the size of the output data and
> it's "one shot" as it needs to deal with a checksum.
> 
> Or I can just keep my implementation separate inside net/rxrpc/.

I'd love to share whatever we can, though I'm not really sure what's
involved.  Certainly some careful testing at least.

It's been about 15 years since I really worked on that code.  I remember
struggling with struct xdr_buf.  The client and server support
zero-copy, so requests and replies are represented by a combination of a
couple of linear buffers plus an array of pages.  My memory is that the
(undocumented) meanings of the fields of the xdr_buf were different for
request and replies and for server and client, and getting them right
took some trial and error.

--b.


Re: [PATCH] SUNRPC: fix copying of multiple pages in gss_read_proxy_verf()

2020-10-19 Thread J. Bruce Fields
On Mon, Oct 19, 2020 at 01:42:27PM +0200, Martijn de Gouw wrote:
> When the passed token is longer than 4032 bytes, the remaining part
> of the token must be copied from the rqstp->rq_arg.pages. But the
> copy must make sure it happens in a consecutive way.

Thanks.  Apologies, but I don't immediately see where the copy is
non-consecutive.  What exactly is the bug in the existing code?

--b.

> 
> Signed-off-by: Martijn de Gouw 
> ---
>  net/sunrpc/auth_gss/svcauth_gss.c | 27 +--
>  1 file changed, 17 insertions(+), 10 deletions(-)
> 
> diff --git a/net/sunrpc/auth_gss/svcauth_gss.c 
> b/net/sunrpc/auth_gss/svcauth_gss.c
> index 258b04372f85..bd4678db9d76 100644
> --- a/net/sunrpc/auth_gss/svcauth_gss.c
> +++ b/net/sunrpc/auth_gss/svcauth_gss.c
> @@ -1147,9 +1147,9 @@ static int gss_read_proxy_verf(struct svc_rqst *rqstp,
>  struct gssp_in_token *in_token)
>  {
>   struct kvec *argv = >rq_arg.head[0];
> - unsigned int page_base, length;
> - int pages, i, res;
> - size_t inlen;
> + unsigned int length, pgto_offs, pgfrom_offs;
> + int pages, i, res, pgto, pgfrom;
> + size_t inlen, to_offs, from_offs;
>  
>   res = gss_read_common_verf(gc, argv, authp, in_handle);
>   if (res)
> @@ -1177,17 +1177,24 @@ static int gss_read_proxy_verf(struct svc_rqst *rqstp,
>   memcpy(page_address(in_token->pages[0]), argv->iov_base, length);
>   inlen -= length;
>  
> - i = 1;
> - page_base = rqstp->rq_arg.page_base;
> + to_offs = length;
> + from_offs = rqstp->rq_arg.page_base;
>   while (inlen) {
> - length = min_t(unsigned int, inlen, PAGE_SIZE);
> - memcpy(page_address(in_token->pages[i]),
> -page_address(rqstp->rq_arg.pages[i]) + page_base,
> + pgto = to_offs >> PAGE_SHIFT;
> + pgfrom = from_offs >> PAGE_SHIFT;
> + pgto_offs = to_offs & ~PAGE_MASK;
> + pgfrom_offs = from_offs & ~PAGE_MASK;
> +
> + length = min_t(unsigned int, inlen,
> +  min_t(unsigned int, PAGE_SIZE - pgto_offs,
> +PAGE_SIZE - pgfrom_offs));
> + memcpy(page_address(in_token->pages[pgto]) + pgto_offs,
> +page_address(rqstp->rq_arg.pages[pgfrom]) + pgfrom_offs,
>  length);
>  
> + to_offs += length;
> + from_offs += length;
>   inlen -= length;
> - page_base = 0;
> - i++;
>   }
>   return 0;
>  }
> -- 
> 2.20.1


Re: [PATCH v2 2/4] time: make getboottime64 aware of time namespace

2020-10-14 Thread J. Bruce Fields
On Mon, Oct 12, 2020 at 05:13:08PM -0400, J. Bruce Fields wrote:
> On Fri, Oct 09, 2020 at 10:08:15PM +0200, Thomas Gleixner wrote:
> > In fact the whole thing can be simplified. You can just use time in
> > nanoseconds retrieved via ktime_get_coarse_boottime() which does not
> > read the clocksource and advances once per tick and does not contain a
> > divison and is definitely faster than seconds_since_boot()
> > 
> > The expiry time is initialized via get_expiry() which does:
> > 
> >getboottime64();
> >return rv - boot.tv_sec; 
> > 
> > The expiry value 'rv' which is read out of the buffer is wall time in
> > seconds. So all you need is are function which convert real to boottime
> > and vice versa. That's trivial to implement and again faster than
> > getboottime64(). Something like this:
> > 
> > ktime_t ktime_real_to_boot(ktime_t real)
> > {
> > struct timekeeper *tk = _core.timekeeper;
> > ktime_t mono = ktime_sub(real, tk->offs_real);
> >   
> > return ktime_add(mono, tk->offs_boot);
> > }
> > 
> > so the above becomes:
> > 
> >return ktime_real_to_boot(rv * NSEC_PER_SEC);
> > 
> > which is still faster than a division.
> > 
> > The nanoseconds value after converting back to wall clock will need a
> > division to get seconds since the epoch, but that's not an issue because
> > that backward conversion already has one today.
> > 
> > You'd obviously need to fixup CACHE_NEW_EXPIRY and the other place which
> > add's '1' to the expiry value and some janitoring of function names and
> > variable types, but no real big surgery AFAICT.
> 
> I'll give it a shot.

I took your code above verbatim, but should I really be following the
example of ktime_mono_to_any()?  (With the seqlock, and maybe also the
more general prototype in case someone needs the other conversions.)--b.

commit eae2005cb432
Author: J. Bruce Fields 
Date:   Wed Oct 14 10:19:59 2020 -0400

sunrpc: simplify cache expiry times

We've been rolling our own conversion between wallclock and boot time in
get_expiry() and convert_to_wallclock().  Thomas Gleixner suggests it
would be simpler to use nanoseconds since boot stored in time_t
internally, and create common helpers for the conversion in
kernel/time/timekeeping.c.

Signed-off-by: J. Bruce Fields 

diff --git a/include/linux/sunrpc/cache.h b/include/linux/sunrpc/cache.h
index 10891b70fc7b..10868e74d6e2 100644
--- a/include/linux/sunrpc/cache.h
+++ b/include/linux/sunrpc/cache.h
@@ -45,8 +45,8 @@
  */
 struct cache_head {
struct hlist_node   cache_list;
-   time64_texpiry_time;/* After time time, don't use the data 
*/
-   time64_tlast_refresh;   /* If CACHE_PENDING, this is when 
upcall was
+   ktime_t expiry_time;/* After time time, don't use the data 
*/
+   ktime_t last_refresh;   /* If CACHE_PENDING, this is when 
upcall was
 * sent, else this is when update was
 * received, though it is alway set to
 * be *after* ->flush_time.
@@ -59,7 +59,8 @@ struct cache_head {
 #defineCACHE_PENDING   2   /* An upcall has been sent but no reply 
received yet*/
 #defineCACHE_CLEANED   3   /* Entry has been cleaned from cache */
 
-#defineCACHE_NEW_EXPIRY 120/* keep new things pending confirmation 
for 120 seconds */
+#defineCACHE_NEW_EXPIRY (120*NSEC_PER_SEC)
+   /* keep new things pending confirmation for 120 seconds */
 
 struct cache_detail {
struct module * owner;
@@ -102,7 +103,7 @@ struct cache_detail {
 * than this.
 */
struct list_headothers;
-   time64_tnextcheck;
+   ktime_t nextcheck;
int entries;
 
/* fields for communication over channel */
@@ -159,11 +160,9 @@ static inline time64_t seconds_since_boot(void)
return ktime_get_real_seconds() - boot.tv_sec;
 }
 
-static inline time64_t convert_to_wallclock(time64_t sinceboot)
+static inline time64_t convert_to_wallclock(ktime_t sinceboot)
 {
-   struct timespec64 boot;
-   getboottime64();
-   return boot.tv_sec + sinceboot;
+   return ktime_boot_to_real(sinceboot) / NSEC_PER_SEC;
 }
 
 extern const struct file_operations cache_file_operations_pipefs;
@@ -298,17 +297,15 @@ static inline int get_time(char **bpp, time64_t *time)
return 0;
 }
 
-static inline time64_t get_expiry(char **bpp)
+static inline ktime_t get_expiry(char **bp

Re: [PATCH v2 2/4] time: make getboottime64 aware of time namespace

2020-10-12 Thread J. Bruce Fields
On Fri, Oct 09, 2020 at 10:08:15PM +0200, Thomas Gleixner wrote:
> On Fri, Oct 09 2020 at 09:55, J. Bruce Fields wrote:
> > Looking at how it's used in net/sunrpc/cache.c  All it's doing is
> > comparing times which have all been calculated relative to the time
> > returned by getboottime64().  So it doesn't really matter what
> > getboottime64() is, as long as it's always the same.
> >
> > So, I don't think this should change behavior of the sunrpc code at all.
> 
> You wish. That's clearly wrong because that code is not guaranteed to
> always run in a context which belongs to the root time namespace.

Argh, right, thanks.

> AFAICT, this stuff can run in softirq context which is context stealing
> and the interrupted task can belong to a different time name space.

Some of it runs in the context of a process doing IO to proc, some from
kthreads.  So, anyway, yes, it's not consistent in the way we'd need.

> In fact the whole thing can be simplified. You can just use time in
> nanoseconds retrieved via ktime_get_coarse_boottime() which does not
> read the clocksource and advances once per tick and does not contain a
> divison and is definitely faster than seconds_since_boot()
> 
> The expiry time is initialized via get_expiry() which does:
> 
>getboottime64();
>return rv - boot.tv_sec; 
> 
> The expiry value 'rv' which is read out of the buffer is wall time in
> seconds. So all you need is are function which convert real to boottime
> and vice versa. That's trivial to implement and again faster than
> getboottime64(). Something like this:
> 
> ktime_t ktime_real_to_boot(ktime_t real)
> {
> struct timekeeper *tk = _core.timekeeper;
> ktime_t mono = ktime_sub(real, tk->offs_real);
>   
> return ktime_add(mono, tk->offs_boot);
> }
> 
> so the above becomes:
> 
>return ktime_real_to_boot(rv * NSEC_PER_SEC);
> 
> which is still faster than a division.
> 
> The nanoseconds value after converting back to wall clock will need a
> division to get seconds since the epoch, but that's not an issue because
> that backward conversion already has one today.
> 
> You'd obviously need to fixup CACHE_NEW_EXPIRY and the other place which
> add's '1' to the expiry value and some janitoring of function names and
> variable types, but no real big surgery AFAICT.

I'll give it a shot.

Thanks so much for taking a careful look at this.

--b.


Re: [PATCH] nfsd: remove unneeded break

2020-10-12 Thread J. Bruce Fields
On Sun, Oct 11, 2020 at 08:51:55AM -0700, t...@redhat.com wrote:
> From: Tom Rix 
> 
> Because every path through nfs4_find_file()'s
> switch does an explicit return, the break is not needed.

OK, applying.--b.

> 
> Signed-off-by: Tom Rix 
> ---
>  fs/nfsd/nfs4state.c | 1 -
>  1 file changed, 1 deletion(-)
> 
> diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
> index c09a2a4281ec..741f64672d96 100644
> --- a/fs/nfsd/nfs4state.c
> +++ b/fs/nfsd/nfs4state.c
> @@ -5722,7 +5722,6 @@ nfs4_find_file(struct nfs4_stid *s, int flags)
>   return find_readable_file(s->sc_file);
>   else
>   return find_writeable_file(s->sc_file);
> - break;
>   }
>  
>   return NULL;
> -- 
> 2.18.1


Re: [PATCH v2 2/4] time: make getboottime64 aware of time namespace

2020-10-09 Thread J. Bruce Fields
On Fri, Oct 09, 2020 at 03:28:15PM +0200, Christian Brauner wrote:
> On Thu, Oct 08, 2020 at 07:39:42AM +0200, Michael Weiß wrote:
> > getboottime64() provides the time stamp of system boot. In case of
> > time namespaces,

Huh, I didn't know there were time namespaces.

> > the offset to the boot time stamp was not applied
> > earlier. However, getboottime64 is used e.g., in /proc/stat to print
> > the system boot time to userspace. In container runtimes which utilize
> > time namespaces to virtualize boottime of a container, this leaks
> > information about the host system boot time.
> > 
> > Therefore, we make getboottime64() to respect the time namespace offset
> > for boottime by subtracting the boottime offset.
> > 
> > Signed-off-by: Michael Weiß 
> > ---
> >  kernel/time/timekeeping.c | 3 +++
> >  1 file changed, 3 insertions(+)
> > 
> > diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
> > index 4c47f388a83f..67530cdb389e 100644
> > --- a/kernel/time/timekeeping.c
> > +++ b/kernel/time/timekeeping.c
> > @@ -17,6 +17,7 @@
> >  #include 
> >  #include 
> >  #include 
> > +#include 
> >  #include 
> >  #include 
> >  #include 
> > @@ -2154,6 +2155,8 @@ void getboottime64(struct timespec64 *ts)
> >  {
> > struct timekeeper *tk = _core.timekeeper;
> > ktime_t t = ktime_sub(tk->offs_real, tk->offs_boot);
> > +   /* shift boot time stamp according to the timens offset */
> > +   t = timens_ktime_to_host(CLOCK_BOOTTIME, t);
> 
> Note that getbootime64() is mostly used in net/sunrpc and I don't know
> if this change has any security implications for them.
> 
> Hey, Trond, Anna, Bruce, and Chuck this virtualizes boottime according
> to the time namespace of the caller, i.e. a container can e.g. reset
> it's boottime when started. This is already possible. The series here
> fixes a bug where /proc/stat's btime field is not virtualized but since
> this changes getboottime64() this would also apply to sunrpc's
> timekeeping. Is that ok or does sunrpc rely on the hosts's boot time,
> i.e. the time in the initial time namespace?

Looking at how it's used in net/sunrpc/cache.c  All it's doing is
comparing times which have all been calculated relative to the time
returned by getboottime64().  So it doesn't really matter what
getboottime64() is, as long as it's always the same.

So, I don't think this should change behavior of the sunrpc code at all.

--b.


Re: general protection fault in cache_clean

2020-09-16 Thread J. Bruce Fields
On Tue, Sep 15, 2020 at 01:04:20AM -0700, syzbot wrote:
> syzbot found the following issue on:
> 
> HEAD commit:581cb3a2 Merge tag 'f2fs-for-5.9-rc5' of git://git.kernel...
> git tree:   upstream
> console output: https://syzkaller.appspot.com/x/log.txt?x=11f5c01190
> kernel config:  https://syzkaller.appspot.com/x/.config?x=a9075b36a6ae26c9
> dashboard link: https://syzkaller.appspot.com/bug?extid=1594adb1b44e354153d8
> compiler:   gcc (GCC) 10.1.0-syz 20200507
> 
> Unfortunately, I don't have any reproducer for this issue yet.
> 
> IMPORTANT: if you fix the issue, please add the following tag to the commit:
> Reported-by: syzbot+1594adb1b44e35415...@syzkaller.appspotmail.com
> 
> general protection fault, probably for non-canonical address 
> 0xdc0012e34a9a:  [#1] PREEMPT SMP KASAN
> KASAN: probably user-memory-access in range 
> [0x971a54d0-0x971a54d7]
> CPU: 1 PID: 19990 Comm: kworker/1:11 Not tainted 5.9.0-rc4-syzkaller #0
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS 
> Google 01/01/2011
> Workqueue: events_power_efficient do_cache_clean
> RIP: 0010:cache_clean+0x119/0x7f0 net/sunrpc/cache.c:444

That's in cache_clean():
spin_lock(_list_lock);
...
current_detail = list_entry(next, struct cache_detail, others)
444:if (current_detail->nextcheck > seconds_since_boot())

It suggests cache_list or current_detail (both globals) are corrupted
somehow.

Those are manipulated only by cache_clean() and
sunrpc_{init,destroy}_cache_detail(), always under the cache_list_lock.

All the callers have to do to get this right is make sure the
cache_detail they pass in is allocated before calling
sunrpc_init_cache_detail() and not freed till after calling
sunrpc_destroy_cache_detail().  I think they all get that right.

So I'm assuming this is a random memory scribble from somewhere else or
something, unless it pops up again

(The one thing I'm a little unsure of here is the
list_empty(_list) checks used to decide when to stop the
cache_cleaner.  But that's a separate problem, if it is a problem.)

--b.


> Code: 81 fb 20 eb 94 8a 0f 84 b8 00 00 00 e8 80 df 33 fa 48 8d 83 40 ff ff ff 
> 48 8d 7b 10 48 89 05 8e 8e 13 06 48 89 f8 48 c1 e8 03 <42> 80 3c 28 00 0f 85 
> e0 05 00 00 48 8d 6c 24 38 4c 8b 63 10 48 89
> RSP: 0018:c90008e1fc48 EFLAGS: 00010206
> RAX: 12e34a9a RBX: 971a54c0 RCX: 87406dbb
> RDX: 88804358a000 RSI: 87406e00 RDI: 971a54d0
> RBP: 0100 R08: 0001 R09: 0003
> R10: 0100 R11:  R12: 0100
> R13: dc00 R14: 88803451b200 R15: 8880ae735600
> FS:  () GS:8880ae70() knlGS:
> CS:  0010 DS:  ES:  CR0: 80050033
> CR2: 004ef310 CR3: 9ca1b000 CR4: 001526e0
> DR0:  DR1:  DR2: 
> DR3:  DR6: fffe0ff0 DR7: 0400
> Call Trace:
>  do_cache_clean+0xd/0xd0 net/sunrpc/cache.c:502
>  process_one_work+0x94c/0x1670 kernel/workqueue.c:2269
>  worker_thread+0x64c/0x1120 kernel/workqueue.c:2415
>  kthread+0x3b5/0x4a0 kernel/kthread.c:292
>  ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:294
> Modules linked in:
> ---[ end trace 4c54bbd0e20d734b ]---
> RIP: 0010:cache_clean+0x119/0x7f0 net/sunrpc/cache.c:444
> Code: 81 fb 20 eb 94 8a 0f 84 b8 00 00 00 e8 80 df 33 fa 48 8d 83 40 ff ff ff 
> 48 8d 7b 10 48 89 05 8e 8e 13 06 48 89 f8 48 c1 e8 03 <42> 80 3c 28 00 0f 85 
> e0 05 00 00 48 8d 6c 24 38 4c 8b 63 10 48 89
> RSP: 0018:c90008e1fc48 EFLAGS: 00010206
> RAX: 12e34a9a RBX: 971a54c0 RCX: 87406dbb
> RDX: 88804358a000 RSI: 87406e00 RDI: 971a54d0
> RBP: 0100 R08: 0001 R09: 0003
> R10: 0100 R11:  R12: 0100
> R13: dc00 R14: 88803451b200 R15: 8880ae735600
> FS:  () GS:8880ae70() knlGS:
> CS:  0010 DS:  ES:  CR0: 80050033
> CR2: 004ef310 CR3: 9ca1b000 CR4: 001526e0
> DR0:  DR1:  DR2: 
> DR3:  DR6: fffe0ff0 DR7: 0400
> 
> 
> ---
> This report is generated by a bot. It may contain errors.
> See https://goo.gl/tpsmEJ for more information about syzbot.
> syzbot engineers can be reached at syzkal...@googlegroups.com.
> 
> syzbot will keep track of this issue. See:
> https://goo.gl/tpsmEJ#status for how to communicate with syzbot.


Re: [PATCH] nfs: Fix security label length not being reset

2020-09-15 Thread J. Bruce Fields
On Mon, Sep 14, 2020 at 10:49:57AM -0500, Jeffrey Mitchell wrote:
> nfs_readdir_page_filler() iterates over entries in a directory, reusing
> the same security label buffer, but does not reset the buffer's length.
> This causes decode_attr_security_label() to return -ERANGE if an entry's
> security label is longer than the previous one's. This error, in
> nfs4_decode_dirent(), only gets passed up as -EAGAIN, which causes another
> failed attempt to copy into the buffer. The second error is ignored and
> the remaining entries do not show up in ls, specifically the getdents64()
> syscall.
> 
> Reproduce by creating multiple files in NFS and giving one of the later
> files a longer security label. ls will not see that file nor any that are
> added afterwards, though they will exist on the backend.

Please include these paragraphs in the changelog.

--b.

> 
> - Jeffrey
> 
> Jeffrey Mitchell (1):
>   nfs: Fix security label length not being reset
> 
>  fs/nfs/dir.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> -- 
> 2.25.1


Re: [PATCH] silence nfscache allocation warnings with kvzalloc

2020-09-14 Thread J. Bruce Fields
Applying, thanks.--b.

On Mon, Sep 14, 2020 at 01:07:19PM -0400, Rik van Riel wrote:
> silence nfscache allocation warnings with kvzalloc
> 
> Currently nfsd_reply_cache_init attempts hash table allocation through
> kmalloc, and manually falls back to vzalloc if that fails. This makes
> the code a little larger than needed, and creates a significant amount
> of serial console spam if you have enough systems.
> 
> Switching to kvzalloc gets rid of the allocation warnings, and makes
> the code a little cleaner too as a side effect.
> 
> Freeing of nn->drc_hashtbl is already done using kvfree currently.
> 
> Signed-off-by: Rik van Riel 
> ---
> diff --git a/fs/nfsd/nfscache.c b/fs/nfsd/nfscache.c
> index 96352ab7bd81..5125b5ef25b6 100644
> --- a/fs/nfsd/nfscache.c
> +++ b/fs/nfsd/nfscache.c
> @@ -164,14 +164,10 @@ int nfsd_reply_cache_init(struct nfsd_net *nn)
>   if (!nn->drc_slab)
>   goto out_shrinker;
>  
> - nn->drc_hashtbl = kcalloc(hashsize,
> - sizeof(*nn->drc_hashtbl), GFP_KERNEL);
> - if (!nn->drc_hashtbl) {
> - nn->drc_hashtbl = vzalloc(array_size(hashsize,
> -  sizeof(*nn->drc_hashtbl)));
> - if (!nn->drc_hashtbl)
> - goto out_slab;
> - }
> + nn->drc_hashtbl = kvzalloc(array_size(hashsize,
> +sizeof(*nn->drc_hashtbl)), GFP_KERNEL);
> + if (!nn->drc_hashtbl)
> + goto out_slab;
>  
>   for (i = 0; i < hashsize; i++) {
>   INIT_LIST_HEAD(>drc_hashtbl[i].lru_head);


Re: [PATCH] gss_krb5: Fix memleak in krb5_make_rc4_seq_num

2020-08-29 Thread J. Bruce Fields
This code is rarely if ever used, and there are pending patches to
remove it completely, so I don't think it's worth trying to fix a rare
memory leak at this point.

--b.

On Thu, Aug 27, 2020 at 04:02:50PM +0800, Dinghao Liu wrote:
> When kmalloc() fails, cipher should be freed
> just like when krb5_rc4_setup_seq_key() fails.
> 
> Fixes: e7afe6c1d486b ("sunrpc: fix 4 more call sites that were using stack 
> memory with a scatterlist")
> Signed-off-by: Dinghao Liu 
> ---
>  net/sunrpc/auth_gss/gss_krb5_seqnum.c | 6 --
>  1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/net/sunrpc/auth_gss/gss_krb5_seqnum.c 
> b/net/sunrpc/auth_gss/gss_krb5_seqnum.c
> index 507105127095..88ca58d11082 100644
> --- a/net/sunrpc/auth_gss/gss_krb5_seqnum.c
> +++ b/net/sunrpc/auth_gss/gss_krb5_seqnum.c
> @@ -53,8 +53,10 @@ krb5_make_rc4_seq_num(struct krb5_ctx *kctx, int 
> direction, s32 seqnum,
>   return PTR_ERR(cipher);
>  
>   plain = kmalloc(8, GFP_NOFS);
> - if (!plain)
> - return -ENOMEM;
> + if (!plain) {
> + code = -ENOMEM;
> + goto out;
> + }
>  
>   plain[0] = (unsigned char) ((seqnum >> 24) & 0xff);
>   plain[1] = (unsigned char) ((seqnum >> 16) & 0xff);
> -- 
> 2.17.1


Re: [PATCH 1/2] nfsd: Remove unnecessary assignment in nfs4xdr.c

2020-08-26 Thread J. Bruce Fields
On Fri, Aug 21, 2020 at 12:37:45AM +0100, Alex Dewar wrote:
> On Wed, Aug 12, 2020 at 08:36:31PM +, Frank van der Linden wrote:
> > On Wed, Aug 12, 2020 at 03:12:51PM +0100, Alex Dewar wrote:
> > > 
> > > In nfsd4_encode_listxattrs(), the variable p is assigned to at one point
> > > but this value is never used before p is reassigned. Fix this.
> > > 
> > > Addresses-Coverity: ("Unused value")
> > > Signed-off-by: Alex Dewar 
> > > ---
> > >  fs/nfsd/nfs4xdr.c | 2 +-
> > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > 
> > > diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
> > > index 259d5ad0e3f47..1a0341fd80f9a 100644
> > > --- a/fs/nfsd/nfs4xdr.c
> > > +++ b/fs/nfsd/nfs4xdr.c
> > > @@ -4859,7 +4859,7 @@ nfsd4_encode_listxattrs(struct nfsd4_compoundres 
> > > *resp, __be32 nfserr,
> > > goto out;
> > > }
> > > 
> > > -   p = xdr_encode_opaque(p, sp, slen);
> > > +   xdr_encode_opaque(p, sp, slen);
> > > 
> > > xdrleft -= xdrlen;
> > > count++;
> > > --
> > > 2.28.0
> > > 
> > 
> > Yep, I guess my linting missed that, thanks for the fix.
> > 
> > - Frank
> 
> Ping? The second patch in this series had a mistake in it, but I think
> this one's still good to go :-)

Thanks, applied for 5.10.--b.


Re: [PATCH] nfsd: Fix typo in comment

2020-08-20 Thread J. Bruce Fields
Applying, thanks.--b.

On Mon, Aug 17, 2020 at 06:51:26PM +0100, Alex Dewar wrote:
> Missing "is".
> 
> Signed-off-by: Alex Dewar 
> ---
> Ahh I see. Is this better?
> ---
>  fs/nfsd/nfs4xdr.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
> index 259d5ad0e3f47..309a6d5f895ae 100644
> --- a/fs/nfsd/nfs4xdr.c
> +++ b/fs/nfsd/nfs4xdr.c
> @@ -4828,7 +4828,7 @@ nfsd4_encode_listxattrs(struct nfsd4_compoundres *resp, 
> __be32 nfserr,
>   slen = strlen(sp);
>  
>   /*
> -  * Check if this a user. attribute, skip it if not.
> +  * Check if this is a user. attribute, skip it if not.
>*/
>   if (strncmp(sp, XATTR_USER_PREFIX, XATTR_USER_PREFIX_LEN))
>   goto contloop;
> -- 
> 2.28.0


[GIT PULL] nfsd bugfix for 5.8

2020-07-24 Thread J. Bruce Fields
Please pull:

  git://linux-nfs.org/~bfields/linux.git tags/nfsd-5.8-2

This is just one fix for a NULL dereference if someone happens to read
/proc/fs/nfsd/client/../state at the wrong moment.

--b.

J. Bruce Fields (1):
  nfsd4: fix NULL dereference in nfsd/clients display code

 fs/nfsd/nfs4state.c | 20 +++-
 1 file changed, 19 insertions(+), 1 deletion(-)


Re: [PATCH 5.4 35/65] nfsd: clients dont need to break their own delegations

2020-07-07 Thread J. Bruce Fields
On Tue, Jul 07, 2020 at 01:29:30PM -0400, Sasha Levin wrote:
> On Tue, Jul 07, 2020 at 01:20:51PM -0400, Sasha Levin wrote:
> > On Tue, Jul 07, 2020 at 11:31:22AM -0400, J. Bruce Fields wrote:
> > > NACK.
> > > 
> > > (How did this one even end up headed for stable?  It wasn't cc'd to
> > 
> > It came in when I was looking at the later nfs patches in this series
> > and figured it is a fix on its own.
> > 
> > > stable, it's not a bugfix, and it's not a small patch.)
> > 
> > If its not a bugfix, why did it go in -rc4 rather than waiting for the
> > merge window?

They went in during the merge window, unless I'm very confused.

> Sorry, my bad, I failed at shuffling patches around. I'll drop these.

OK, thanks.

--b.



Re: [PATCH 5.7 070/112] kthread: save thread function

2020-07-07 Thread J. Bruce Fields
NACK to this and following patch.--b.

On Tue, Jul 07, 2020 at 05:17:15PM +0200, Greg Kroah-Hartman wrote:
> From: J. Bruce Fields 
> 
> [ Upstream commit 52782c92ac85c4e393eb4a903a62e6c24afa633f ]
> 
> It's handy to keep the kthread_fn just as a unique cookie to identify
> classes of kthreads.  E.g. if you can verify that a given task is
> running your thread_fn, then you may know what sort of type kthread_data
> points to.
> 
> We'll use this in nfsd to pass some information into the vfs.  Note it
> will need kthread_data() exported too.
> 
> Original-patch-by: Tejun Heo 
> Signed-off-by: J. Bruce Fields 
> Signed-off-by: Sasha Levin 
> ---
>  include/linux/kthread.h |  1 +
>  kernel/kthread.c| 17 +
>  2 files changed, 18 insertions(+)
> 
> diff --git a/include/linux/kthread.h b/include/linux/kthread.h
> index 8bbcaad7ef0f4..c2a274b79c429 100644
> --- a/include/linux/kthread.h
> +++ b/include/linux/kthread.h
> @@ -57,6 +57,7 @@ bool kthread_should_stop(void);
>  bool kthread_should_park(void);
>  bool __kthread_should_park(struct task_struct *k);
>  bool kthread_freezable_should_stop(bool *was_frozen);
> +void *kthread_func(struct task_struct *k);
>  void *kthread_data(struct task_struct *k);
>  void *kthread_probe_data(struct task_struct *k);
>  int kthread_park(struct task_struct *k);
> diff --git a/kernel/kthread.c b/kernel/kthread.c
> index bfbfa481be3a5..b84fc7eec0358 100644
> --- a/kernel/kthread.c
> +++ b/kernel/kthread.c
> @@ -46,6 +46,7 @@ struct kthread_create_info
>  struct kthread {
>   unsigned long flags;
>   unsigned int cpu;
> + int (*threadfn)(void *);
>   void *data;
>   struct completion parked;
>   struct completion exited;
> @@ -152,6 +153,20 @@ bool kthread_freezable_should_stop(bool *was_frozen)
>  }
>  EXPORT_SYMBOL_GPL(kthread_freezable_should_stop);
>  
> +/**
> + * kthread_func - return the function specified on kthread creation
> + * @task: kthread task in question
> + *
> + * Returns NULL if the task is not a kthread.
> + */
> +void *kthread_func(struct task_struct *task)
> +{
> + if (task->flags & PF_KTHREAD)
> + return to_kthread(task)->threadfn;
> + return NULL;
> +}
> +EXPORT_SYMBOL_GPL(kthread_func);
> +
>  /**
>   * kthread_data - return data value specified on kthread creation
>   * @task: kthread task in question
> @@ -164,6 +179,7 @@ void *kthread_data(struct task_struct *task)
>  {
>   return to_kthread(task)->data;
>  }
> +EXPORT_SYMBOL_GPL(kthread_data);
>  
>  /**
>   * kthread_probe_data - speculative version of kthread_data()
> @@ -244,6 +260,7 @@ static int kthread(void *_create)
>   do_exit(-ENOMEM);
>   }
>  
> + self->threadfn = threadfn;
>   self->data = data;
>   init_completion(>exited);
>   init_completion(>parked);
> -- 
> 2.25.1
> 
> 
> 



Re: [PATCH 5.4 34/65] kthread: save thread function

2020-07-07 Thread J. Bruce Fields
NACK.--b.

On Tue, Jul 07, 2020 at 05:17:13PM +0200, Greg Kroah-Hartman wrote:
> From: J. Bruce Fields 
> 
> [ Upstream commit 52782c92ac85c4e393eb4a903a62e6c24afa633f ]
> 
> It's handy to keep the kthread_fn just as a unique cookie to identify
> classes of kthreads.  E.g. if you can verify that a given task is
> running your thread_fn, then you may know what sort of type kthread_data
> points to.
> 
> We'll use this in nfsd to pass some information into the vfs.  Note it
> will need kthread_data() exported too.
> 
> Original-patch-by: Tejun Heo 
> Signed-off-by: J. Bruce Fields 
> Signed-off-by: Sasha Levin 
> ---
>  include/linux/kthread.h |  1 +
>  kernel/kthread.c| 17 +
>  2 files changed, 18 insertions(+)
> 
> diff --git a/include/linux/kthread.h b/include/linux/kthread.h
> index 0f9da966934e2..59bbc63ff8637 100644
> --- a/include/linux/kthread.h
> +++ b/include/linux/kthread.h
> @@ -57,6 +57,7 @@ bool kthread_should_stop(void);
>  bool kthread_should_park(void);
>  bool __kthread_should_park(struct task_struct *k);
>  bool kthread_freezable_should_stop(bool *was_frozen);
> +void *kthread_func(struct task_struct *k);
>  void *kthread_data(struct task_struct *k);
>  void *kthread_probe_data(struct task_struct *k);
>  int kthread_park(struct task_struct *k);
> diff --git a/kernel/kthread.c b/kernel/kthread.c
> index b262f47046ca4..543dff6b576c7 100644
> --- a/kernel/kthread.c
> +++ b/kernel/kthread.c
> @@ -46,6 +46,7 @@ struct kthread_create_info
>  struct kthread {
>   unsigned long flags;
>   unsigned int cpu;
> + int (*threadfn)(void *);
>   void *data;
>   struct completion parked;
>   struct completion exited;
> @@ -152,6 +153,20 @@ bool kthread_freezable_should_stop(bool *was_frozen)
>  }
>  EXPORT_SYMBOL_GPL(kthread_freezable_should_stop);
>  
> +/**
> + * kthread_func - return the function specified on kthread creation
> + * @task: kthread task in question
> + *
> + * Returns NULL if the task is not a kthread.
> + */
> +void *kthread_func(struct task_struct *task)
> +{
> + if (task->flags & PF_KTHREAD)
> + return to_kthread(task)->threadfn;
> + return NULL;
> +}
> +EXPORT_SYMBOL_GPL(kthread_func);
> +
>  /**
>   * kthread_data - return data value specified on kthread creation
>   * @task: kthread task in question
> @@ -164,6 +179,7 @@ void *kthread_data(struct task_struct *task)
>  {
>   return to_kthread(task)->data;
>  }
> +EXPORT_SYMBOL_GPL(kthread_data);
>  
>  /**
>   * kthread_probe_data - speculative version of kthread_data()
> @@ -237,6 +253,7 @@ static int kthread(void *_create)
>   do_exit(-ENOMEM);
>   }
>  
> + self->threadfn = threadfn;
>   self->data = data;
>   init_completion(>exited);
>   init_completion(>parked);
> -- 
> 2.25.1
> 
> 
> 



Re: [PATCH 5.4 35/65] nfsd: clients dont need to break their own delegations

2020-07-07 Thread J. Bruce Fields
NACK.

(How did this one even end up headed for stable?  It wasn't cc'd to
stable, it's not a bugfix, and it's not a small patch.)

--b.

On Tue, Jul 07, 2020 at 05:17:14PM +0200, Greg Kroah-Hartman wrote:
> From: J. Bruce Fields 
> 
> [ Upstream commit 28df3d1539de5090f7916f6fff03891b67f366f4 ]
> 
> We currently revoke read delegations on any write open or any operation
> that modifies file data or metadata (including rename, link, and
> unlink).  But if the delegation in question is the only read delegation
> and is held by the client performing the operation, that's not really
> necessary.
> 
> It's not always possible to prevent this in the NFSv4.0 case, because
> there's not always a way to determine which client an NFSv4.0 delegation
> came from.  (In theory we could try to guess this from the transport
> layer, e.g., by assuming all traffic on a given TCP connection comes
> from the same client.  But that's not really correct.)
> 
> In the NFSv4.1 case the session layer always tells us the client.
> 
> This patch should remove such self-conflicts in all cases where we can
> reliably determine the client from the compound.
> 
> To do that we need to track "who" is performing a given (possibly
> lease-breaking) file operation.  We're doing that by storing the
> information in the svc_rqst and using kthread_data() to map the current
> task back to a svc_rqst.
> 
> Signed-off-by: J. Bruce Fields 
> Signed-off-by: Sasha Levin 
> ---
>  Documentation/filesystems/locking.rst |  2 ++
>  fs/locks.c|  3 +++
>  fs/nfsd/nfs4proc.c|  2 ++
>  fs/nfsd/nfs4state.c   | 14 ++
>  fs/nfsd/nfsd.h|  2 ++
>  fs/nfsd/nfssvc.c  |  6 ++
>  include/linux/fs.h|  1 +
>  include/linux/sunrpc/svc.h|  1 +
>  8 files changed, 31 insertions(+)
> 
> diff --git a/Documentation/filesystems/locking.rst 
> b/Documentation/filesystems/locking.rst
> index fc3a0704553cf..b5f8d15a30fb7 100644
> --- a/Documentation/filesystems/locking.rst
> +++ b/Documentation/filesystems/locking.rst
> @@ -425,6 +425,7 @@ prototypes::
>   int (*lm_grant)(struct file_lock *, struct file_lock *, int);
>   void (*lm_break)(struct file_lock *); /* break_lease callback */
>   int (*lm_change)(struct file_lock **, int);
> + bool (*lm_breaker_owns_lease)(struct file_lock *);
>  
>  locking rules:
>  
> @@ -435,6 +436,7 @@ lm_notify:yes yes 
> no
>  lm_grant:no  no  no
>  lm_break:yes no  no
>  lm_changeyes no  no
> +lm_breaker_owns_lease:   no  no  no
>  ==   =   =   =
>  
>  buffer_head
> diff --git a/fs/locks.c b/fs/locks.c
> index b8a31c1c4fff3..a3f186846e93e 100644
> --- a/fs/locks.c
> +++ b/fs/locks.c
> @@ -1557,6 +1557,9 @@ static bool leases_conflict(struct file_lock *lease, 
> struct file_lock *breaker)
>  {
>   bool rc;
>  
> + if (lease->fl_lmops->lm_breaker_owns_lease
> + && lease->fl_lmops->lm_breaker_owns_lease(lease))
> + return false;
>   if ((breaker->fl_flags & FL_LAYOUT) != (lease->fl_flags & FL_LAYOUT)) {
>   rc = false;
>   goto trace;
> diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
> index 4798667af647c..96fa2837d3cfb 100644
> --- a/fs/nfsd/nfs4proc.c
> +++ b/fs/nfsd/nfs4proc.c
> @@ -1961,6 +1961,8 @@ nfsd4_proc_compound(struct svc_rqst *rqstp)
>   goto encode_op;
>   }
>  
> + rqstp->rq_lease_breaker = (void **)>clp;
> +
>   trace_nfsd_compound(rqstp, args->opcnt);
>   while (!status && resp->opcnt < args->opcnt) {
>   op = >ops[resp->opcnt++];
> diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
> index 8650a97e2ba96..1e8f5e281bb53 100644
> --- a/fs/nfsd/nfs4state.c
> +++ b/fs/nfsd/nfs4state.c
> @@ -4464,6 +4464,19 @@ nfsd_break_deleg_cb(struct file_lock *fl)
>   return ret;
>  }
>  
> +static bool nfsd_breaker_owns_lease(struct file_lock *fl)
> +{
> + struct nfs4_delegation *dl = fl->fl_owner;
> + struct svc_rqst *rqst;
> + struct nfs4_client *clp;
> +
> + if (!i_am_nfsd())
> + return NULL;
> + rqst = kthread_data(current);
> + clp = *(rqst->rq_lease_breaker);
> + return dl->dl_stid.sc_client == clp;
> +}
> +
>  static int
>  nfsd_ch

Re: [RFC PATCH 3/7] SUNRPC: remove RC4-HMAC-MD5 support from KerberosV

2020-07-02 Thread J. Bruce Fields
Acked-by: J. Bruce Fields 

On Thu, Jul 02, 2020 at 12:19:43PM +0200, Ard Biesheuvel wrote:
> The RC4-HMAC-MD5 KerberosV algorithm is based on RFC 4757 [0], which
> was specifically issued for interoperability with Windows 2000, but was
> never intended to receive the same level of support. The RFC says
> 
>   The IETF Kerberos community supports publishing this specification as
>   an informational document in order to describe this widely
>   implemented technology.  However, while these encryption types
>   provide the operations necessary to implement the base Kerberos
>   specification [RFC4120], they do not provide all the required
>   operations in the Kerberos cryptography framework [RFC3961].  As a
>   result, it is not generally possible to implement potential
>   extensions to Kerberos using these encryption types.  The Kerberos
>   encryption type negotiation mechanism [RFC4537] provides one approach
>   for using such extensions even when a Kerberos infrastructure uses
>   long-term RC4 keys.  Because this specification does not implement
>   operations required by RFC 3961 and because of security concerns with
>   the use of RC4 and MD4 discussed in Section 8, this specification is
>   not appropriate for publication on the standards track.
> 
>   The RC4-HMAC encryption types are used to ease upgrade of existing
>   Windows NT environments, provide strong cryptography (128-bit key
>   lengths), and provide exportable (meet United States government
>   export restriction requirements) encryption.  This document describes
>   the implementation of those encryption types.
> 
> Furthermore, this RFC was re-classified as 'historic' by RFC 8429 [1] in
> 2018, stating that 'none of the encryption types it specifies should be
> used'
> 
> Note that other outdated algorithms are left in place (some of which are
> guarded by CONFIG_SUNRPC_DISABLE_INSECURE_ENCTYPES), so this should only
> adversely affect interoperability with Windows NT/2000 systems that have
> not received any updates since 2008 (but are connected to a network
> nonetheless)
> 
> [0] https://tools.ietf.org/html/rfc4757
> [1] https://tools.ietf.org/html/rfc8429
> 
> Signed-off-by: Ard Biesheuvel 
> ---
>  include/linux/sunrpc/gss_krb5.h  |  11 -
>  include/linux/sunrpc/gss_krb5_enctypes.h |   9 +-
>  net/sunrpc/Kconfig   |   1 -
>  net/sunrpc/auth_gss/gss_krb5_crypto.c| 276 
>  net/sunrpc/auth_gss/gss_krb5_mech.c  |  95 ---
>  net/sunrpc/auth_gss/gss_krb5_seal.c  |   1 -
>  net/sunrpc/auth_gss/gss_krb5_seqnum.c|  87 --
>  net/sunrpc/auth_gss/gss_krb5_unseal.c|   1 -
>  net/sunrpc/auth_gss/gss_krb5_wrap.c  |  65 +
>  9 files changed, 16 insertions(+), 530 deletions(-)
> 
> diff --git a/include/linux/sunrpc/gss_krb5.h b/include/linux/sunrpc/gss_krb5.h
> index e8f8ffe7448b..91f43d86879d 100644
> --- a/include/linux/sunrpc/gss_krb5.h
> +++ b/include/linux/sunrpc/gss_krb5.h
> @@ -141,14 +141,12 @@ enum sgn_alg {
>   SGN_ALG_MD2_5 = 0x0001,
>   SGN_ALG_DES_MAC = 0x0002,
>   SGN_ALG_3 = 0x0003, /* not published */
> - SGN_ALG_HMAC_MD5 = 0x0011,  /* microsoft w2k; no support */
>   SGN_ALG_HMAC_SHA1_DES3_KD = 0x0004
>  };
>  enum seal_alg {
>   SEAL_ALG_NONE = 0x,
>   SEAL_ALG_DES = 0x,
>   SEAL_ALG_1 = 0x0001,/* not published */
> - SEAL_ALG_MICROSOFT_RC4 = 0x0010,/* microsoft w2k; no support */
>   SEAL_ALG_DES3KD = 0x0002
>  };
>  
> @@ -316,14 +314,5 @@ gss_krb5_aes_decrypt(struct krb5_ctx *kctx, u32 offset, 
> u32 len,
>struct xdr_buf *buf, u32 *plainoffset,
>u32 *plainlen);
>  
> -int
> -krb5_rc4_setup_seq_key(struct krb5_ctx *kctx,
> -struct crypto_sync_skcipher *cipher,
> -unsigned char *cksum);
> -
> -int
> -krb5_rc4_setup_enc_key(struct krb5_ctx *kctx,
> -struct crypto_sync_skcipher *cipher,
> -s32 seqnum);
>  void
>  gss_krb5_make_confounder(char *p, u32 conflen);
> diff --git a/include/linux/sunrpc/gss_krb5_enctypes.h 
> b/include/linux/sunrpc/gss_krb5_enctypes.h
> index 981c89cef19d..87eea679d750 100644
> --- a/include/linux/sunrpc/gss_krb5_enctypes.h
> +++ b/include/linux/sunrpc/gss_krb5_enctypes.h
> @@ -13,15 +13,13 @@
>  #ifdef CONFIG_SUNRPC_DISABLE_INSECURE_ENCTYPES
>  
>  /*
> - * NB: This list includes encryption types that were deprecated
> - * by RFC 8429 (DES3_CBC_SHA1 and ARCFOUR_HMAC).
> + * NB: This list includes DES3_CBC_SHA1, which was deprecated by RFC 8429.
>   *
>   * ENCTYPE_AES256_CTS_HMAC_SHA1_96
>   * ENCTYPE_AES128_

[GIT PULL] nfsd bugfixes for 5.8

2020-07-02 Thread J. Bruce Fields
Please pull

  git://linux-nfs.org/~bfields/linux.git tags/nfsd-5.8-1

Fixes for a umask bug on exported filesystems lacking ACL support, a
leak and a module unloading bug in the /proc/fs/nfsd/clients/ code, and
a compile warning.

--b.


Christophe Leroy (1):
  SUNRPC: Add missing definition of ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE

J. Bruce Fields (3):
  nfsd: apply umask on fs without ACL support
  nfsd4: fix nfsdfs reference count loop
  nfsd: fix nfsdfs inode reference count leak

 fs/nfsd/nfs4state.c  |  8 +++-
 fs/nfsd/nfsctl.c | 23 +--
 fs/nfsd/nfsd.h   |  3 +++
 fs/nfsd/vfs.c|  6 ++
 net/sunrpc/svcsock.c |  1 +
 5 files changed, 30 insertions(+), 11 deletions(-)


Re: linux-next: Fixes tag needs some work in the nfsd tree

2020-06-29 Thread J. Bruce Fields
On Sat, Jun 27, 2020 at 09:03:17AM +1000, Stephen Rothwell wrote:
> Hi all,
> 
> In commit
> 
>   886c4fe8bdff ("nfsd4: fix nfsdfs reference count loop")
> 
> Fixes tag
> 
>   Fixes: 2c830dd720 ("nfsd: persist nfsd filesystem across mounts")
> 
> has these problem(s):
> 
>   - SHA1 should be at least 12 digits long

OK, should be fixed.--b.

> Can be fixed by setting core.abbrev to 12 (or more) or (for git v2.11
> or later) just making sure it is not set (or set to "auto").
> 
> -- 
> Cheers,
> Stephen Rothwell




Re: BUG: unable to handle kernel paging request in rb_erase

2020-06-26 Thread J. Bruce Fields
On Fri, Jun 26, 2020 at 12:32:42PM +0200, Dmitry Vyukov wrote:
> So far this crash happened only once:
> https://syzkaller.appspot.com/bug?extid=0e37e9d19bded16b8ab9
> 
> For continuous fuzzing on syzbot it usually means either (1) it's a
> super narrow race or (2) it's a previous unnoticed memory corruption.
> 
> Simpler bugs usually have much higher hit counts:
> https://syzkaller.appspot.com/upstream
> https://syzkaller.appspot.com/upstream/fixed
> 
> If you did a reasonable looking for any obvious bugs in the code that
> would lead to such failure, it can make sense to postpone any
> additional actions until we have more info.
> If no info comes, at some point syzbot will auto-obsolete it, and then
> then we can assume it was (2).

OK, thanks.

It's a big heavily used data structure, if there was random memory
corruption then I guess this wouldn't be a surprising way for it to show
up.

--b.


Re: Re: [PATCH] nfsd: fix kernel crash when load nfsd in docker

2020-06-26 Thread J. Bruce Fields
On Fri, Jun 26, 2020 at 08:45:23PM +0800, Luo Xiaogang wrote:
> At 2020-06-24 09:29:01, "J. Bruce Fields"  wrote:
> >On Mon, Jun 15, 2020 at 03:12:11PM +0800, Luo Xiaogang wrote:
> >> We load nfsd module in the docker container, kernel crash as following.
> >> 
> >> The 'current->nsproxy->net_ns->gen->ptr[nfsd_net_id]' is overflow in the
> >> nfsd_init_net.
> >> 
> >> We should use the net_ns which is being init in the nfsd_init_net,
> >> not the 'current->nsproxy->net_ns'.
> >
> >Thanks!  Actually, I think my problem was that net init and exit are
> >just the wrong place to be doing this--I moved them to nfsd start/stop
> >instead.
> >
> >And then that exposed the fact that I had an inode leak.
> >
> >Do the following two patches help?
> 
> Just test it on Ubuntu 18.04 + Docker 19.03.6, and the docker image is 
> ubuntu:18.04.
> 
> Your patchset helps, here is my reported-and-tested-by, Thanks very much.
> 
> Reported-and-Tested-by:  Luo Xiaogang 

Thank you!

--b.

> 
> 
> >--b.
> >
> >From 16f954bd5c481596a63271a91963bf260e2f3f46 Mon Sep 17 00:00:00 2001
> >From: "J. Bruce Fields" 
> >Date: Tue, 23 Jun 2020 16:00:33 -0400
> >Subject: [PATCH 1/2] nfsd4: fix nfsdfs reference count loop
> >
> >We don't drop the reference on the nfsdfs filesystem with
> >mntput(nn->nfsd_mnt) until nfsd_exit_net(), but that won't be called
> >until the nfsd module's unloaded, and we can't unload the module as long
> >as there's a reference on nfsdfs.  So this prevents module unloading.
> >
> >Signed-off-by: J. Bruce Fields 
> >---
> > fs/nfsd/nfs4state.c |  8 +++-
> > fs/nfsd/nfsctl.c| 22 --
> > fs/nfsd/nfsd.h  |  3 +++
> > 3 files changed, 22 insertions(+), 11 deletions(-)
> >
> >diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
> >index bb3d2c32664a..cce2510b2cca 100644
> >--- a/fs/nfsd/nfs4state.c
> >+++ b/fs/nfsd/nfs4state.c
> >@@ -7912,9 +7912,14 @@ nfs4_state_start_net(struct net *net)
> > struct nfsd_net *nn = net_generic(net, nfsd_net_id);
> > int ret;
> > 
> >-ret = nfs4_state_create_net(net);
> >+ret = get_nfsdfs(net);
> > if (ret)
> > return ret;
> >+ret = nfs4_state_create_net(net);
> >+if (ret) {
> >+mntput(nn->nfsd_mnt);
> >+return ret;
> >+}
> > locks_start_grace(net, >nfsd4_manager);
> > nfsd4_client_tracking_init(net);
> > if (nn->track_reclaim_completes && nn->reclaim_str_hashtbl_size == 0)
> >@@ -7984,6 +7989,7 @@ nfs4_state_shutdown_net(struct net *net)
> > 
> > nfsd4_client_tracking_exit(net);
> > nfs4_state_destroy_net(net);
> >+mntput(nn->nfsd_mnt);
> > }
> > 
> > void
> >diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c
> >index b68e96681522..cf98a81ca1ea 100644
> >--- a/fs/nfsd/nfsctl.c
> >+++ b/fs/nfsd/nfsctl.c
> >@@ -1424,6 +1424,18 @@ static struct file_system_type nfsd_fs_type = {
> > };
> > MODULE_ALIAS_FS("nfsd");
> > 
> >+int get_nfsdfs(struct net *net)
> >+{
> >+struct nfsd_net *nn = net_generic(net, nfsd_net_id);
> >+struct vfsmount *mnt;
> >+
> >+mnt =  vfs_kern_mount(_fs_type, SB_KERNMOUNT, "nfsd", NULL);
> >+if (IS_ERR(mnt))
> >+return PTR_ERR(mnt);
> >+nn->nfsd_mnt = mnt;
> >+return 0;
> >+}
> >+
> > #ifdef CONFIG_PROC_FS
> > static int create_proc_exports_entry(void)
> > {
> >@@ -1451,7 +1463,6 @@ unsigned int nfsd_net_id;
> > static __net_init int nfsd_init_net(struct net *net)
> > {
> > int retval;
> >-struct vfsmount *mnt;
> > struct nfsd_net *nn = net_generic(net, nfsd_net_id);
> > 
> > retval = nfsd_export_init(net);
> >@@ -1478,16 +1489,8 @@ static __net_init int nfsd_init_net(struct net *net)
> > init_waitqueue_head(>ntf_wq);
> > seqlock_init(>boot_lock);
> > 
> >-mnt =  vfs_kern_mount(_fs_type, SB_KERNMOUNT, "nfsd", NULL);
> >-if (IS_ERR(mnt)) {
> >-retval = PTR_ERR(mnt);
> >-goto out_mount_err;
> >-}
> >-nn->nfsd_mnt = mnt;
> > return 0;
> > 
> >-out_mount_err:
> >-nfsd_reply_cache_shutdown(nn);
> > out_drc_error:
> > nfsd_idmap_shutdown(net);
> > out_idmap_error:
> >@@ -1500,7 +1503,6 @@ static __net_exit void nfsd_exit_

Re: BUG: unable to handle kernel paging request in rb_erase

2020-06-25 Thread J. Bruce Fields
On Thu, Jun 04, 2020 at 05:58:12PM -0400, J. Bruce Fields wrote:
> On Thu, Jun 04, 2020 at 11:53:59AM +0800, Hillf Danton wrote:
> > 
> > On Wed, 3 Jun 2020 12:48:49 -0400 J. Bruce Fields wrote:
> > > On Wed, Jun 03, 2020 at 10:43:26AM -0400, J. Bruce Fields wrote:
> > > > On Wed, Jun 03, 2020 at 12:34:35PM +0800, Hillf Danton wrote:
> > > > > 
> > > > > On Tue, 2 Jun 2020 17:55:17 -0400 "J. Bruce Fields" wrote:
> > > > > > 
> > > > > > As far as I know, this one's still unresolved.  I can't see the bug 
> > > > > > from
> > > > > > code inspection, and we don't have a reproducer.  If anyone else 
> > > > > > sees
> > > > > > this or has an idea what might be going wrong, I'd be 
> > > > > > interested.--b.
> > > > > 
> > > > > It's a PF reported in the syz-executor.3 context (PID: 8682 on CPU:1),
> > > > > meanwhile there's another at 
> > > > > 
> > > > >  https://lore.kernel.org/lkml/20200603011425.ga13...@fieldses.org/T/#t
> > > > >  Reported-by: syzbot+a29df412692980277...@syzkaller.appspotmail.com
> > > > > 
> > > > > in the kworker context, and one of the quick questions is, is it 
> > > > > needed
> > > > > to serialize the two players, say, using a mutex?
> > > > 
> > > > nfsd_reply_cache_shutdown() doesn't take any locks.  All the data
> > > > structures it's tearing down are per-network-namespace, and it's assumed
> > > > all the users of that structure are gone by the time we get here.
> > > > 
> > > > I wonder if that assumption's correct.  Looking at nfsd_exit_net()
> > 
> > IIUC it's correct for the kworker case where the ns in question is on
> > the cleanup list, and for the syscall as well because the report is
> > triggered in the error path, IOW the new ns is not yet visible to the
> > kworker ATM.
> 
> Sorry, I'm not familiar with the namespace code and I'm not following
> you.
> 
> I'm trying to figure out what prevents the network namespace exit method
> being called while nfsd is still processing an rpc call from that
> network namespace.

Looking at this some more:

Each server socket (struct svc_xprt) holds a reference on the struct net
that's not released until svc_xprt_free().

The svc_xprt is itself referenced as long as an rpc for that socket is
being processed, the referenced released in svc_xprt_release().  Which
isn't called until the rpc is processed and the reply sent.

So, assuming nfsd_exit_net() can't be called while we're still holding
references to the struct net, there can't still be any reply cache
processing going on when nfsd_reply_cache_shutdown() is called.

So, still a mystery to me how this is happening.

--b.

> 
> 
> > ---
> > v2: Use linux/highmem.h instead of asm/cacheflush.sh
> > Signed-off-by: Christophe Leroy 
> > ---
> > net/sunrpc/svcsock.c | 1 +
> > 1 file changed, 1 insertion(+)
> > 
> > diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
> > index 5c4ec9386f81..c537272f9c7e 100644
> > --- a/net/sunrpc/svcsock.c
> > +++ b/net/sunrpc/svcsock.c
> > @@ -44,6 +44,7 @@
> > #include 
> > #include 
> > #include 
> > +#include 
> > #include 
> > 
> > #include 
> > -- 
> > 2.25.0
> > 
> 
> --
> Chuck Lever
> 
> 

> 
> --b.
> 
> > 
> > Then we can not draw a race between the two parties, and the two reports
> > are not related... but of independent glitches.
> > 
> > > > 
> > > > nfsd_reply_cache_shutdown() is one of the first things we do, so I think
> > > > we're depending on the assumption that the interfaces in that network
> > > > namespace, and anything referencing associated sockets (in particular,
> > > > any associated in-progress rpc's), must be gone before our net exit
> > > > method is called.
> > > > 
> > > > I wonder if that's a good assumption.
> > > 
> > > I think that assumption must be the problem.
> > > 
> > > That would explain why the crashes are happening in nfsd_exit_net as
> > > opposed to somewhere else, and why we're only seeing them since
> > > 3ba75830ce17 "nfsd4: drc containerization".
> > > 
> > > I wonder what *is* safe to assume when the net exit method is called?
> > > 
> > > --b.
> > > 
> > > > 
> > > > --b.
> > > >

Re: [PATCH] nfsd: fix kernel crash when load nfsd in docker

2020-06-23 Thread J. Bruce Fields
On Mon, Jun 15, 2020 at 03:12:11PM +0800, Luo Xiaogang wrote:
> We load nfsd module in the docker container, kernel crash as following.
> 
> The 'current->nsproxy->net_ns->gen->ptr[nfsd_net_id]' is overflow in the
> nfsd_init_net.
> 
> We should use the net_ns which is being init in the nfsd_init_net,
> not the 'current->nsproxy->net_ns'.

Thanks!  Actually, I think my problem was that net init and exit are
just the wrong place to be doing this--I moved them to nfsd start/stop
instead.

And then that exposed the fact that I had an inode leak.

Do the following two patches help?

--b.

>From 16f954bd5c481596a63271a91963bf260e2f3f46 Mon Sep 17 00:00:00 2001
From: "J. Bruce Fields" 
Date: Tue, 23 Jun 2020 16:00:33 -0400
Subject: [PATCH 1/2] nfsd4: fix nfsdfs reference count loop

We don't drop the reference on the nfsdfs filesystem with
mntput(nn->nfsd_mnt) until nfsd_exit_net(), but that won't be called
until the nfsd module's unloaded, and we can't unload the module as long
as there's a reference on nfsdfs.  So this prevents module unloading.

Signed-off-by: J. Bruce Fields 
---
 fs/nfsd/nfs4state.c |  8 +++-
 fs/nfsd/nfsctl.c| 22 --
 fs/nfsd/nfsd.h  |  3 +++
 3 files changed, 22 insertions(+), 11 deletions(-)

diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index bb3d2c32664a..cce2510b2cca 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -7912,9 +7912,14 @@ nfs4_state_start_net(struct net *net)
struct nfsd_net *nn = net_generic(net, nfsd_net_id);
int ret;
 
-   ret = nfs4_state_create_net(net);
+   ret = get_nfsdfs(net);
if (ret)
return ret;
+   ret = nfs4_state_create_net(net);
+   if (ret) {
+   mntput(nn->nfsd_mnt);
+   return ret;
+   }
locks_start_grace(net, >nfsd4_manager);
nfsd4_client_tracking_init(net);
if (nn->track_reclaim_completes && nn->reclaim_str_hashtbl_size == 0)
@@ -7984,6 +7989,7 @@ nfs4_state_shutdown_net(struct net *net)
 
nfsd4_client_tracking_exit(net);
nfs4_state_destroy_net(net);
+   mntput(nn->nfsd_mnt);
 }
 
 void
diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c
index b68e96681522..cf98a81ca1ea 100644
--- a/fs/nfsd/nfsctl.c
+++ b/fs/nfsd/nfsctl.c
@@ -1424,6 +1424,18 @@ static struct file_system_type nfsd_fs_type = {
 };
 MODULE_ALIAS_FS("nfsd");
 
+int get_nfsdfs(struct net *net)
+{
+   struct nfsd_net *nn = net_generic(net, nfsd_net_id);
+   struct vfsmount *mnt;
+
+   mnt =  vfs_kern_mount(_fs_type, SB_KERNMOUNT, "nfsd", NULL);
+   if (IS_ERR(mnt))
+   return PTR_ERR(mnt);
+   nn->nfsd_mnt = mnt;
+   return 0;
+}
+
 #ifdef CONFIG_PROC_FS
 static int create_proc_exports_entry(void)
 {
@@ -1451,7 +1463,6 @@ unsigned int nfsd_net_id;
 static __net_init int nfsd_init_net(struct net *net)
 {
int retval;
-   struct vfsmount *mnt;
struct nfsd_net *nn = net_generic(net, nfsd_net_id);
 
retval = nfsd_export_init(net);
@@ -1478,16 +1489,8 @@ static __net_init int nfsd_init_net(struct net *net)
init_waitqueue_head(>ntf_wq);
seqlock_init(>boot_lock);
 
-   mnt =  vfs_kern_mount(_fs_type, SB_KERNMOUNT, "nfsd", NULL);
-   if (IS_ERR(mnt)) {
-   retval = PTR_ERR(mnt);
-   goto out_mount_err;
-   }
-   nn->nfsd_mnt = mnt;
return 0;
 
-out_mount_err:
-   nfsd_reply_cache_shutdown(nn);
 out_drc_error:
nfsd_idmap_shutdown(net);
 out_idmap_error:
@@ -1500,7 +1503,6 @@ static __net_exit void nfsd_exit_net(struct net *net)
 {
struct nfsd_net *nn = net_generic(net, nfsd_net_id);
 
-   mntput(nn->nfsd_mnt);
nfsd_reply_cache_shutdown(nn);
nfsd_idmap_shutdown(net);
nfsd_export_shutdown(net);
diff --git a/fs/nfsd/nfsd.h b/fs/nfsd/nfsd.h
index 36cdd81b6688..57c832d1b30f 100644
--- a/fs/nfsd/nfsd.h
+++ b/fs/nfsd/nfsd.h
@@ -90,6 +90,8 @@ void  nfsd_destroy(struct net *net);
 
 bool   i_am_nfsd(void);
 
+int get_nfsdfs(struct net *);
+
 struct nfsdfs_client {
struct kref cl_ref;
void (*cl_release)(struct kref *kref);
@@ -100,6 +102,7 @@ struct dentry *nfsd_client_mkdir(struct nfsd_net *nn,
struct nfsdfs_client *ncl, u32 id, const struct tree_descr *);
 void nfsd_client_rmdir(struct dentry *dentry);
 
+
 #if defined(CONFIG_NFSD_V2_ACL) || defined(CONFIG_NFSD_V3_ACL)
 #ifdef CONFIG_NFSD_V2_ACL
 extern const struct svc_version nfsd_acl_version2;
-- 
2.26.2


>From 51de3b460b39e862f7dcfd4d600e8de0afe73e29 Mon Sep 17 00:00:00 2001
From: "J. Bruce Fields" 
Date: Tue, 23 Jun 2020 21:01:19 -0400
Subject: [PATCH 2/2] nfsd: fix nfsdfs inode reference count leak

I don't understand this code well, but  I'm seeing a warning about a
still-referenced inode on unmount, and every other s

Re: [PATCH -next] sunrpc: use kmemdup_nul() in gssp_stringify()

2020-06-08 Thread J. Bruce Fields
Thanks, applied.--b.

On Fri, May 08, 2020 at 08:40:00PM +0800, Chen Zhou wrote:
> It is more efficient to use kmemdup_nul() if the size is known exactly
> .
> 
> According to doc:
> "Note: Use kmemdup_nul() instead if the size is known exactly."
> 
> Signed-off-by: Chen Zhou 
> ---
>  net/sunrpc/auth_gss/gss_rpc_upcall.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/net/sunrpc/auth_gss/gss_rpc_upcall.c 
> b/net/sunrpc/auth_gss/gss_rpc_upcall.c
> index 0349f455a862..af9c7f43859c 100644
> --- a/net/sunrpc/auth_gss/gss_rpc_upcall.c
> +++ b/net/sunrpc/auth_gss/gss_rpc_upcall.c
> @@ -223,7 +223,7 @@ static int gssp_alloc_receive_pages(struct 
> gssx_arg_accept_sec_context *arg)
>  
>  static char *gssp_stringify(struct xdr_netobj *netobj)
>  {
> - return kstrndup(netobj->data, netobj->len, GFP_KERNEL);
> + return kmemdup_nul(netobj->data, netobj->len, GFP_KERNEL);
>  }
>  
>  static void gssp_hostbased_service(char **principal)
> -- 
> 2.20.1


Re: BUG: unable to handle kernel paging request in rb_erase

2020-06-04 Thread J. Bruce Fields
On Thu, Jun 04, 2020 at 11:53:59AM +0800, Hillf Danton wrote:
> 
> On Wed, 3 Jun 2020 12:48:49 -0400 J. Bruce Fields wrote:
> > On Wed, Jun 03, 2020 at 10:43:26AM -0400, J. Bruce Fields wrote:
> > > On Wed, Jun 03, 2020 at 12:34:35PM +0800, Hillf Danton wrote:
> > > > 
> > > > On Tue, 2 Jun 2020 17:55:17 -0400 "J. Bruce Fields" wrote:
> > > > > 
> > > > > As far as I know, this one's still unresolved.  I can't see the bug 
> > > > > from
> > > > > code inspection, and we don't have a reproducer.  If anyone else sees
> > > > > this or has an idea what might be going wrong, I'd be interested.--b.
> > > > 
> > > > It's a PF reported in the syz-executor.3 context (PID: 8682 on CPU:1),
> > > > meanwhile there's another at 
> > > > 
> > > >  https://lore.kernel.org/lkml/20200603011425.ga13...@fieldses.org/T/#t
> > > >  Reported-by: syzbot+a29df412692980277...@syzkaller.appspotmail.com
> > > > 
> > > > in the kworker context, and one of the quick questions is, is it needed
> > > > to serialize the two players, say, using a mutex?
> > > 
> > > nfsd_reply_cache_shutdown() doesn't take any locks.  All the data
> > > structures it's tearing down are per-network-namespace, and it's assumed
> > > all the users of that structure are gone by the time we get here.
> > > 
> > > I wonder if that assumption's correct.  Looking at nfsd_exit_net()
> 
> IIUC it's correct for the kworker case where the ns in question is on
> the cleanup list, and for the syscall as well because the report is
> triggered in the error path, IOW the new ns is not yet visible to the
> kworker ATM.

Sorry, I'm not familiar with the namespace code and I'm not following
you.

I'm trying to figure out what prevents the network namespace exit method
being called while nfsd is still processing an rpc call from that
network namespace.

--b.

> 
> Then we can not draw a race between the two parties, and the two reports
> are not related... but of independent glitches.
> 
> > > 
> > > nfsd_reply_cache_shutdown() is one of the first things we do, so I think
> > > we're depending on the assumption that the interfaces in that network
> > > namespace, and anything referencing associated sockets (in particular,
> > > any associated in-progress rpc's), must be gone before our net exit
> > > method is called.
> > > 
> > > I wonder if that's a good assumption.
> > 
> > I think that assumption must be the problem.
> > 
> > That would explain why the crashes are happening in nfsd_exit_net as
> > opposed to somewhere else, and why we're only seeing them since
> > 3ba75830ce17 "nfsd4: drc containerization".
> > 
> > I wonder what *is* safe to assume when the net exit method is called?
> > 
> > --b.
> > 
> > > 
> > > --b.
> > > 
> > > > 
> > > > 
> > > > > On Sun, May 17, 2020 at 11:59:12PM -0700, syzbot wrote:
> > > > > > Hello,
> > > > > > 
> > > > > > syzbot found the following crash on:
> > > > > > 
> > > > > > HEAD commit:9b1f2cbd Merge tag 'clk-fixes-for-linus' of 
> > > > > > git://git.kern..
> > > > > > git tree:   upstream
> > > > > > console output: 
> > > > > > https://syzkaller.appspot.com/x/log.txt?x=15dfdeaa10
> > > > > > kernel config:  
> > > > > > https://syzkaller.appspot.com/x/.config?x=c14212794ed9ad24
> > > > > > dashboard link: 
> > > > > > https://syzkaller.appspot.com/bug?extid=0e37e9d19bded16b8ab9
> > > > > > compiler:   gcc (GCC) 9.0.0 20181231 (experimental)
> > > > > > 
> > > > > > Unfortunately, I don't have any reproducer for this crash yet.
> > > > > > 
> > > > > > IMPORTANT: if you fix the bug, please add the following tag to the 
> > > > > > commit:
> > > > > > Reported-by: syzbot+0e37e9d19bded16b8...@syzkaller.appspotmail.com
> > > > > > 
> > > > > > BUG: unable to handle page fault for address: 8870
> > > > > > #PF: supervisor read access in kernel mode
> > > > > > #PF: error_code(0x) - not-present page
> > > > > > PGD 0 P4D 0 
> > > > > > Oops:  [#1] PREEMPT SMP KASAN
> > > > > > CPU: 1 PI

Re: BUG: unable to handle kernel paging request in rb_erase

2020-06-03 Thread J. Bruce Fields
On Wed, Jun 03, 2020 at 10:43:26AM -0400, J. Bruce Fields wrote:
> On Wed, Jun 03, 2020 at 12:34:35PM +0800, Hillf Danton wrote:
> > 
> > On Tue, 2 Jun 2020 17:55:17 -0400 "J. Bruce Fields" wrote:
> > > 
> > > As far as I know, this one's still unresolved.  I can't see the bug from
> > > code inspection, and we don't have a reproducer.  If anyone else sees
> > > this or has an idea what might be going wrong, I'd be interested.--b.
> > 
> > It's a PF reported in the syz-executor.3 context (PID: 8682 on CPU:1),
> > meanwhile there's another at 
> > 
> >  https://lore.kernel.org/lkml/20200603011425.ga13...@fieldses.org/T/#t
> >  Reported-by: syzbot+a29df412692980277...@syzkaller.appspotmail.com
> > 
> > in the kworker context, and one of the quick questions is, is it needed
> > to serialize the two players, say, using a mutex?
> 
> nfsd_reply_cache_shutdown() doesn't take any locks.  All the data
> structures it's tearing down are per-network-namespace, and it's assumed
> all the users of that structure are gone by the time we get here.
> 
> I wonder if that assumption's correct.  Looking at nfsd_exit_net()
> 
> nfsd_reply_cache_shutdown() is one of the first things we do, so I think
> we're depending on the assumption that the interfaces in that network
> namespace, and anything referencing associated sockets (in particular,
> any associated in-progress rpc's), must be gone before our net exit
> method is called.
> 
> I wonder if that's a good assumption.

I think that assumption must be the problem.

That would explain why the crashes are happening in nfsd_exit_net as
opposed to somewhere else, and why we're only seeing them since
3ba75830ce17 "nfsd4: drc containerization".

I wonder what *is* safe to assume when the net exit method is called?

--b.

> 
> --b.
> 
> > 
> > 
> > > On Sun, May 17, 2020 at 11:59:12PM -0700, syzbot wrote:
> > > > Hello,
> > > > 
> > > > syzbot found the following crash on:
> > > > 
> > > > HEAD commit:9b1f2cbd Merge tag 'clk-fixes-for-linus' of 
> > > > git://git.kern..
> > > > git tree:   upstream
> > > > console output: https://syzkaller.appspot.com/x/log.txt?x=15dfdeaa10
> > > > kernel config:  
> > > > https://syzkaller.appspot.com/x/.config?x=c14212794ed9ad24
> > > > dashboard link: 
> > > > https://syzkaller.appspot.com/bug?extid=0e37e9d19bded16b8ab9
> > > > compiler:   gcc (GCC) 9.0.0 20181231 (experimental)
> > > > 
> > > > Unfortunately, I don't have any reproducer for this crash yet.
> > > > 
> > > > IMPORTANT: if you fix the bug, please add the following tag to the 
> > > > commit:
> > > > Reported-by: syzbot+0e37e9d19bded16b8...@syzkaller.appspotmail.com
> > > > 
> > > > BUG: unable to handle page fault for address: 8870
> > > > #PF: supervisor read access in kernel mode
> > > > #PF: error_code(0x) - not-present page
> > > > PGD 0 P4D 0 
> > > > Oops:  [#1] PREEMPT SMP KASAN
> > > > CPU: 1 PID: 8682 Comm: syz-executor.3 Not tainted 5.7.0-rc5-syzkaller #0
> > > > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS 
> > > > Google 01/01/2011
> > > > RIP: 0010:__rb_erase_augmented include/linux/rbtree_augmented.h:201 
> > > > [inline]
> > > > RIP: 0010:rb_erase+0x37/0x18d0 lib/rbtree.c:443
> > > > Code: 89 f7 41 56 41 55 49 89 fd 48 83 c7 08 48 89 fa 41 54 48 c1 ea 03 
> > > > 55 53 48 83 ec 18 80 3c 02 00 0f 85 89 10 00 00 49 8d 7d 10 <4d> 8b 75 
> > > > 08 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 80
> > > > RSP: 0018:c900178ffb58 EFLAGS: 00010246
> > > > RAX: dc00 RBX: 8880354d RCX: c9000fb6d000
> > > > RDX: 110e RSI: 88800011dfe0 RDI: 8878
> > > > RBP: 887fffb0 R08: 888057284280 R09: fbfff185d12e
> > > > R10: 8c2e896f R11: fbfff185d12d R12: 88800011dfe0
> > > > R13: 887fffe8 R14: 0001dfe0 R15: 88800011dfe0
> > > > FS:  7fa002d21700() GS:8880ae70() 
> > > > knlGS:
> > > > CS:  0010 DS:  ES:  CR0: 80050033
> > > > CR2: 8870 CR3: a2164000 CR4: 001426e0
> > > > DR0:  DR1:  DR2: 
> > > > DR3:  DR6: fffe0ff

Re: BUG: unable to handle kernel paging request in rb_erase

2020-06-03 Thread J. Bruce Fields
On Wed, Jun 03, 2020 at 12:34:35PM +0800, Hillf Danton wrote:
> 
> On Tue, 2 Jun 2020 17:55:17 -0400 "J. Bruce Fields" wrote:
> > 
> > As far as I know, this one's still unresolved.  I can't see the bug from
> > code inspection, and we don't have a reproducer.  If anyone else sees
> > this or has an idea what might be going wrong, I'd be interested.--b.
> 
> It's a PF reported in the syz-executor.3 context (PID: 8682 on CPU:1),
> meanwhile there's another at 
> 
>  https://lore.kernel.org/lkml/20200603011425.ga13...@fieldses.org/T/#t
>  Reported-by: syzbot+a29df412692980277...@syzkaller.appspotmail.com
> 
> in the kworker context, and one of the quick questions is, is it needed
> to serialize the two players, say, using a mutex?

nfsd_reply_cache_shutdown() doesn't take any locks.  All the data
structures it's tearing down are per-network-namespace, and it's assumed
all the users of that structure are gone by the time we get here.

I wonder if that assumption's correct.  Looking at nfsd_exit_net()

nfsd_reply_cache_shutdown() is one of the first things we do, so I think
we're depending on the assumption that the interfaces in that network
namespace, and anything referencing associated sockets (in particular,
any associated in-progress rpc's), must be gone before our net exit
method is called.

I wonder if that's a good assumption.

--b.

> 
> 
> > On Sun, May 17, 2020 at 11:59:12PM -0700, syzbot wrote:
> > > Hello,
> > > 
> > > syzbot found the following crash on:
> > > 
> > > HEAD commit:9b1f2cbd Merge tag 'clk-fixes-for-linus' of 
> > > git://git.kern..
> > > git tree:   upstream
> > > console output: https://syzkaller.appspot.com/x/log.txt?x=15dfdeaa10
> > > kernel config:  https://syzkaller.appspot.com/x/.config?x=c14212794ed9ad24
> > > dashboard link: 
> > > https://syzkaller.appspot.com/bug?extid=0e37e9d19bded16b8ab9
> > > compiler:   gcc (GCC) 9.0.0 20181231 (experimental)
> > > 
> > > Unfortunately, I don't have any reproducer for this crash yet.
> > > 
> > > IMPORTANT: if you fix the bug, please add the following tag to the commit:
> > > Reported-by: syzbot+0e37e9d19bded16b8...@syzkaller.appspotmail.com
> > > 
> > > BUG: unable to handle page fault for address: 8870
> > > #PF: supervisor read access in kernel mode
> > > #PF: error_code(0x) - not-present page
> > > PGD 0 P4D 0 
> > > Oops:  [#1] PREEMPT SMP KASAN
> > > CPU: 1 PID: 8682 Comm: syz-executor.3 Not tainted 5.7.0-rc5-syzkaller #0
> > > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS 
> > > Google 01/01/2011
> > > RIP: 0010:__rb_erase_augmented include/linux/rbtree_augmented.h:201 
> > > [inline]
> > > RIP: 0010:rb_erase+0x37/0x18d0 lib/rbtree.c:443
> > > Code: 89 f7 41 56 41 55 49 89 fd 48 83 c7 08 48 89 fa 41 54 48 c1 ea 03 
> > > 55 53 48 83 ec 18 80 3c 02 00 0f 85 89 10 00 00 49 8d 7d 10 <4d> 8b 75 08 
> > > 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 80
> > > RSP: 0018:c900178ffb58 EFLAGS: 00010246
> > > RAX: dc00 RBX: 8880354d RCX: c9000fb6d000
> > > RDX: 110e RSI: 88800011dfe0 RDI: 8878
> > > RBP: 887fffb0 R08: 888057284280 R09: fbfff185d12e
> > > R10: 8c2e896f R11: fbfff185d12d R12: 88800011dfe0
> > > R13: 887fffe8 R14: 0001dfe0 R15: 88800011dfe0
> > > FS:  7fa002d21700() GS:8880ae70() 
> > > knlGS:
> > > CS:  0010 DS:  ES:  CR0: 80050033
> > > CR2: 8870 CR3: a2164000 CR4: 001426e0
> > > DR0:  DR1:  DR2: 
> > > DR3:  DR6: fffe0ff0 DR7: 0400
> > > Call Trace:
> > >  nfsd_reply_cache_free_locked+0x198/0x380 fs/nfsd/nfscache.c:127
> > >  nfsd_reply_cache_shutdown+0x150/0x350 fs/nfsd/nfscache.c:203
> > >  nfsd_exit_net+0x189/0x4c0 fs/nfsd/nfsctl.c:1504
> > >  ops_exit_list.isra.0+0xa8/0x150 net/core/net_namespace.c:186
> > >  setup_net+0x50c/0x860 net/core/net_namespace.c:364
> > >  copy_net_ns+0x293/0x590 net/core/net_namespace.c:482
> > >  create_new_namespaces+0x3fb/0xb30 kernel/nsproxy.c:108
> > >  unshare_nsproxy_namespaces+0xbd/0x1f0 kernel/nsproxy.c:229
> > >  ksys_unshare+0x43d/0x8e0 kernel/fork.c:2970
> > >  __do_sys_unshare kernel/fork.c:3038 [inline]
> > >  __se_sys_unshare kernel/fork.c:3036 [inline]
> > >  __x64_sys_unshare+0x2d/0x40 kernel/fork.c:3036
> > >  do_syscall_64+0xf6/0x7d0 arch/x86/entry/common.c:295
> > >  entry_SYSCALL_64_after_hwframe+0x49/0xb3


Re: general protection fault in nfsd_reply_cache_free_locked

2020-06-02 Thread J. Bruce Fields
On Mon, May 11, 2020 at 11:55:16PM -0700, syzbot wrote:
> Hello,
> 
> syzbot found the following crash on:

This is like


https://lore.kernel.org/linux-nfs/5016dd05a5e6b...@google.com/

in that we're discovering the drc is corrupt while destroying it.

I don't see the problem yet.

--b.

> 
> HEAD commit:6e7f2eac Merge tag 'arm64-fixes' of git://git.kernel.org/p..
> git tree:   upstream
> console output: https://syzkaller.appspot.com/x/log.txt?x=1456703410
> kernel config:  https://syzkaller.appspot.com/x/.config?x=b0212dbee046bc1f
> dashboard link: https://syzkaller.appspot.com/bug?extid=a29df412692980277f9d
> compiler:   gcc (GCC) 9.0.0 20181231 (experimental)
> 
> Unfortunately, I don't have any reproducer for this crash yet.
> 
> IMPORTANT: if you fix the bug, please add the following tag to the commit:
> Reported-by: syzbot+a29df412692980277...@syzkaller.appspotmail.com
> 
> general protection fault, probably for non-canonical address 
> 0xdc02:  [#1] PREEMPT SMP KASAN
> KASAN: null-ptr-deref in range [0x0010-0x0017]
> CPU: 0 PID: 27932 Comm: kworker/u4:4 Not tainted 5.7.0-rc4-syzkaller #0
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS 
> Google 01/01/2011
> Workqueue: netns cleanup_net
> RIP: 0010:nfsd_reply_cache_free_locked+0x2d/0x380 fs/nfsd/nfscache.c:122
> Code: 56 41 55 41 54 49 89 fc 55 48 89 f5 53 48 89 d3 e8 08 c0 2f ff 48 8d 7d 
> 61 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 <0f> b6 04 02 48 89 fa 
> 83 e2 07 38 d0 7f 08 84 c0 0f 85 a7 02 00 00
> RSP: 0018:c90008bb7b70 EFLAGS: 00010202
> RAX: dc00 RBX: 88826000 RCX: dc00
> RDX: 0002 RSI: 82436ea8 RDI: 0011
> RBP: ffb0 R08: 888093792400 R09: fbfff185cf3e
> R10: 8c2e79ef R11: fbfff185cf3d R12: 88800010
> R13: 88800018 R14:  R15: 88800010
> FS:  () GS:8880ae60() knlGS:
> CS:  0010 DS:  ES:  CR0: 80050033
> CR2: 001b2c531000 CR3: 685a2000 CR4: 001426f0
> DR0:  DR1:  DR2: 
> DR3:  DR6: fffe0ff0 DR7: 0400
> Call Trace:
>  nfsd_reply_cache_shutdown+0x150/0x350 fs/nfsd/nfscache.c:203
>  nfsd_exit_net+0x189/0x4c0 fs/nfsd/nfsctl.c:1504
>  ops_exit_list.isra.0+0xa8/0x150 net/core/net_namespace.c:186
>  cleanup_net+0x511/0xa50 net/core/net_namespace.c:603
>  process_one_work+0x965/0x16a0 kernel/workqueue.c:2268
>  worker_thread+0x96/0xe20 kernel/workqueue.c:2414
>  kthread+0x388/0x470 kernel/kthread.c:268
>  ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352
> Modules linked in:
> ---[ end trace 54f06072fc6a1afa ]---
> RIP: 0010:nfsd_reply_cache_free_locked+0x2d/0x380 fs/nfsd/nfscache.c:122
> Code: 56 41 55 41 54 49 89 fc 55 48 89 f5 53 48 89 d3 e8 08 c0 2f ff 48 8d 7d 
> 61 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 <0f> b6 04 02 48 89 fa 
> 83 e2 07 38 d0 7f 08 84 c0 0f 85 a7 02 00 00
> RSP: 0018:c90008bb7b70 EFLAGS: 00010202
> RAX: dc00 RBX: 88826000 RCX: dc00
> RDX: 0002 RSI: 82436ea8 RDI: 0011
> RBP: ffb0 R08: 888093792400 R09: fbfff185cf3e
> R10: 8c2e79ef R11: fbfff185cf3d R12: 88800010
> R13: 88800018 R14:  R15: 88800010
> FS:  () GS:8880ae60() knlGS:
> CS:  0010 DS:  ES:  CR0: 80050033
> CR2: 001b2c531000 CR3: 94cfb000 CR4: 001426f0
> DR0:  DR1:  DR2: 
> DR3:  DR6: fffe0ff0 DR7: 0400
> 
> 
> ---
> This bug is generated by a bot. It may contain errors.
> See https://goo.gl/tpsmEJ for more information about syzbot.
> syzbot engineers can be reached at syzkal...@googlegroups.com.
> 
> syzbot will keep track of this bug report. See:
> https://goo.gl/tpsmEJ#status for how to communicate with syzbot.


Re: BUG: unable to handle kernel paging request in rb_erase

2020-06-02 Thread J. Bruce Fields
As far as I know, this one's still unresolved.  I can't see the bug from
code inspection, and we don't have a reproducer.  If anyone else sees
this or has an idea what might be going wrong, I'd be interested.--b.

On Sun, May 17, 2020 at 11:59:12PM -0700, syzbot wrote:
> Hello,
> 
> syzbot found the following crash on:
> 
> HEAD commit:9b1f2cbd Merge tag 'clk-fixes-for-linus' of git://git.kern..
> git tree:   upstream
> console output: https://syzkaller.appspot.com/x/log.txt?x=15dfdeaa10
> kernel config:  https://syzkaller.appspot.com/x/.config?x=c14212794ed9ad24
> dashboard link: https://syzkaller.appspot.com/bug?extid=0e37e9d19bded16b8ab9
> compiler:   gcc (GCC) 9.0.0 20181231 (experimental)
> 
> Unfortunately, I don't have any reproducer for this crash yet.
> 
> IMPORTANT: if you fix the bug, please add the following tag to the commit:
> Reported-by: syzbot+0e37e9d19bded16b8...@syzkaller.appspotmail.com
> 
> BUG: unable to handle page fault for address: 8870
> #PF: supervisor read access in kernel mode
> #PF: error_code(0x) - not-present page
> PGD 0 P4D 0 
> Oops:  [#1] PREEMPT SMP KASAN
> CPU: 1 PID: 8682 Comm: syz-executor.3 Not tainted 5.7.0-rc5-syzkaller #0
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS 
> Google 01/01/2011
> RIP: 0010:__rb_erase_augmented include/linux/rbtree_augmented.h:201 [inline]
> RIP: 0010:rb_erase+0x37/0x18d0 lib/rbtree.c:443
> Code: 89 f7 41 56 41 55 49 89 fd 48 83 c7 08 48 89 fa 41 54 48 c1 ea 03 55 53 
> 48 83 ec 18 80 3c 02 00 0f 85 89 10 00 00 49 8d 7d 10 <4d> 8b 75 08 48 b8 00 
> 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 80
> RSP: 0018:c900178ffb58 EFLAGS: 00010246
> RAX: dc00 RBX: 8880354d RCX: c9000fb6d000
> RDX: 110e RSI: 88800011dfe0 RDI: 8878
> RBP: 887fffb0 R08: 888057284280 R09: fbfff185d12e
> R10: 8c2e896f R11: fbfff185d12d R12: 88800011dfe0
> R13: 887fffe8 R14: 0001dfe0 R15: 88800011dfe0
> FS:  7fa002d21700() GS:8880ae70() knlGS:
> CS:  0010 DS:  ES:  CR0: 80050033
> CR2: 8870 CR3: a2164000 CR4: 001426e0
> DR0:  DR1:  DR2: 
> DR3:  DR6: fffe0ff0 DR7: 0400
> Call Trace:
>  nfsd_reply_cache_free_locked+0x198/0x380 fs/nfsd/nfscache.c:127
>  nfsd_reply_cache_shutdown+0x150/0x350 fs/nfsd/nfscache.c:203
>  nfsd_exit_net+0x189/0x4c0 fs/nfsd/nfsctl.c:1504
>  ops_exit_list.isra.0+0xa8/0x150 net/core/net_namespace.c:186
>  setup_net+0x50c/0x860 net/core/net_namespace.c:364
>  copy_net_ns+0x293/0x590 net/core/net_namespace.c:482
>  create_new_namespaces+0x3fb/0xb30 kernel/nsproxy.c:108
>  unshare_nsproxy_namespaces+0xbd/0x1f0 kernel/nsproxy.c:229
>  ksys_unshare+0x43d/0x8e0 kernel/fork.c:2970
>  __do_sys_unshare kernel/fork.c:3038 [inline]
>  __se_sys_unshare kernel/fork.c:3036 [inline]
>  __x64_sys_unshare+0x2d/0x40 kernel/fork.c:3036
>  do_syscall_64+0xf6/0x7d0 arch/x86/entry/common.c:295
>  entry_SYSCALL_64_after_hwframe+0x49/0xb3
> RIP: 0033:0x45ca29
> Code: 0d b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 
> 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 
> 83 db b6 fb ff c3 66 2e 0f 1f 84 00 00 00 00
> RSP: 002b:7fa002d20c78 EFLAGS: 0246 ORIG_RAX: 0110
> RAX: ffda RBX: 0050a1c0 RCX: 0045ca29
> RDX:  RSI:  RDI: 4000
> RBP: 0078bf00 R08:  R09: 
> R10:  R11: 0246 R12: 
> R13: 0c4e R14: 004ce9bd R15: 7fa002d216d4
> Modules linked in:
> CR2: 8870
> ---[ end trace f929dcba0362906a ]---
> RIP: 0010:__rb_erase_augmented include/linux/rbtree_augmented.h:201 [inline]
> RIP: 0010:rb_erase+0x37/0x18d0 lib/rbtree.c:443
> Code: 89 f7 41 56 41 55 49 89 fd 48 83 c7 08 48 89 fa 41 54 48 c1 ea 03 55 53 
> 48 83 ec 18 80 3c 02 00 0f 85 89 10 00 00 49 8d 7d 10 <4d> 8b 75 08 48 b8 00 
> 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 80
> RSP: 0018:c900178ffb58 EFLAGS: 00010246
> RAX: dc00 RBX: 8880354d RCX: c9000fb6d000
> RDX: 110e RSI: 88800011dfe0 RDI: 8878
> RBP: 887fffb0 R08: 888057284280 R09: fbfff185d12e
> R10: 8c2e896f R11: fbfff185d12d R12: 88800011dfe0
> R13: 887fffe8 R14: 0001dfe0 R15: 88800011dfe0
> FS:  7fa002d21700() GS:8880ae70() knlGS:
> CS:  0010 DS:  ES:  CR0: 80050033
> CR2: 8870 CR3: a2164000 CR4: 001426e0
> DR0:  DR1:  DR2: 
> DR3:  DR6: fffe0ff0 DR7: 0400
> 
> 
> ---
> This bug is generated by a bot. It may 

Re: 3ba75830ce ("nfsd4: drc containerization"): [ 51.013875] WARNING: possible circular locking dependency detected

2020-06-02 Thread J. Bruce Fields
On Wed, May 27, 2020 at 01:11:59PM +0800, kernel test robot wrote:
> Greetings,
> 
> 0day kernel testing robot got the below dmesg and the first bad commit is
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master
> 
> commit 3ba75830ce175550ef45c6524ec62faab8f62c1b

Thanks!  I looked at the code quickly and can't figure out where the
deadlock is exactly.

It seems to involve the kmem_cache_create() call in nfsd's net init
method.  Anyone know if there's a reason why creating a new slab cache
during network namespace initialization is a potential deadlock?

It was probably a dumb thing to do in this case--I can apply the below
to make the slab global again.  Kinda curious what exactly the bug is,
though.

--b.

commit 027690c75e8f
Author: J. Bruce Fields 
Date:   Mon Jun 1 17:44:45 2020 -0400

nfsd4: make drc_slab global, not per-net

I made every global per-network-namespace instead.  But perhaps doing
that to this slab was a step too far.

The kmem_cache_create call in our net init method also seems to be
responsible for this lockdep warning:

[   45.163710] Unable to find swap-space signature
[   45.375718] trinity-c1 (855): attempted to duplicate a private mapping 
with mremap.  This is not supported.
[   46.055744] futex_wake_op: trinity-c1 tries to shift op by -209; fix 
this program
[   51.011723]
[   51.013378] ==
[   51.013875] WARNING: possible circular locking dependency detected
[   51.014378] 5.2.0-rc2 #1 Not tainted
[   51.014672] --
[   51.015182] trinity-c2/886 is trying to acquire lock:
[   51.015593] 5405f099 (slab_mutex){+.+.}, at: 
slab_attr_store+0xa2/0x130
[   51.016190]
[   51.016190] but task is already holding lock:
[   51.016652] ac662005 (kn->count#43){}, at: 
kernfs_fop_write+0x286/0x500
[   51.017266]
[   51.017266] which lock already depends on the new lock.
[   51.017266]
[   51.017909]
[   51.017909] the existing dependency chain (in reverse order) is:
[   51.018497]
[   51.018497] -> #1 (kn->count#43){}:
[   51.018956]__lock_acquire+0x7cf/0x1a20
[   51.019317]lock_acquire+0x17d/0x390
[   51.019658]__kernfs_remove+0x892/0xae0
[   51.020020]kernfs_remove_by_name_ns+0x78/0x110
[   51.020435]sysfs_remove_link+0x55/0xb0
[   51.020832]sysfs_slab_add+0xc1/0x3e0
[   51.021332]__kmem_cache_create+0x155/0x200
[   51.021720]create_cache+0xf5/0x320
[   51.022054]kmem_cache_create_usercopy+0x179/0x320
[   51.022486]kmem_cache_create+0x1a/0x30
[   51.022867]nfsd_reply_cache_init+0x278/0x560
[   51.023266]nfsd_init_net+0x20f/0x5e0
[   51.023623]ops_init+0xcb/0x4b0
[   51.023928]setup_net+0x2fe/0x670
[   51.024315]copy_net_ns+0x30a/0x3f0
[   51.024653]create_new_namespaces+0x3c5/0x820
[   51.025257]unshare_nsproxy_namespaces+0xd1/0x240
[   51.025881]ksys_unshare+0x506/0x9c0
[   51.026381]__x64_sys_unshare+0x3a/0x50
[   51.026937]do_syscall_64+0x110/0x10b0
[   51.027509]entry_SYSCALL_64_after_hwframe+0x49/0xbe
[   51.028175]
[   51.028175] -> #0 (slab_mutex){+.+.}:
[   51.028817]validate_chain+0x1c51/0x2cc0
[   51.029422]__lock_acquire+0x7cf/0x1a20
[   51.029947]lock_acquire+0x17d/0x390
[   51.030438]__mutex_lock+0x100/0xfa0
[   51.030995]mutex_lock_nested+0x27/0x30
[   51.031516]slab_attr_store+0xa2/0x130
[   51.032020]sysfs_kf_write+0x11d/0x180
[   51.032529]kernfs_fop_write+0x32a/0x500
[   51.033056]do_loop_readv_writev+0x21d/0x310
[   51.033627]do_iter_write+0x2e5/0x380
[   51.034148]vfs_writev+0x170/0x310
[   51.034616]do_pwritev+0x13e/0x160
[   51.035100]__x64_sys_pwritev+0xa3/0x110
[   51.035633]do_syscall_64+0x110/0x10b0
[   51.036200]entry_SYSCALL_64_after_hwframe+0x49/0xbe
[   51.036924]
[   51.036924] other info that might help us debug this:
[   51.036924]
[   51.037876]  Possible unsafe locking scenario:
[   51.037876]
[   51.038556]CPU0CPU1
[   51.039130]
[   51.039676]   lock(kn->count#43);
[   51.040084]lock(slab_mutex);
[   51.040597]lock(kn->count#43);
[   51.041062]   lock(slab_mutex);
[   51.041320]
[   51.041320]  *** DEADLOCK ***
[   51.041320]
[   51.041793] 3 locks held by trinity-c2/886:
[   51.042128]  #0: 1f55e152 (sb_writer

Re: linux-next: build failure after merge of the nfsd tree

2020-05-11 Thread J. Bruce Fields
On Fri, May 08, 2020 at 10:47:20AM +1000, Stephen Rothwell wrote:
> After merging the nfsd tree, today's linux-next build (x86_64
> allmodconfig) failed like this:
> 
> kernel/trace/trace_hwlat.c:329:12: error: conflicting types for 'kthread_fn'
>   329 | static int kthread_fn(void *data)
>   |^~
> In file included from kernel/trace/trace_hwlat.c:40:
> include/linux/kthread.h:60:7: note: previous declaration of 'kthread_fn' was 
> here
>60 | void *kthread_fn(struct task_struct *k);
>   |   ^~
> 
> Caused by commit
> 
>   7df082e85764 ("kthread: save thread function")
> 
> I have used the nfsd tree from next-20200507 for today.

Whoops, I forgot to say thanks for this report.  I renamed the
kthread_fn in my patches to kthread_func.

--b.


Re: [PATCH 1/2] sunrpc: add missing newline when printing parameter 'pool_mode' by sysfs

2020-05-11 Thread J. Bruce Fields
On Fri, May 08, 2020 at 09:32:59AM +0800, Xiongfeng Wang wrote:
> When I cat parameter '/sys/module/sunrpc/parameters/pool_mode', it
> displays as follows. It is better to add a newline for easy reading.

Applying for 5.8.  I assume Trond's getting the other patch.

--b.

> 
> [root@hulk-202 ~]# cat /sys/module/sunrpc/parameters/pool_mode
> global[root@hulk-202 ~]#
> 
> Signed-off-by: Xiongfeng Wang 
> ---
>  net/sunrpc/svc.c | 10 +-
>  1 file changed, 5 insertions(+), 5 deletions(-)
> 
> diff --git a/net/sunrpc/svc.c b/net/sunrpc/svc.c
> index 187dd4e..d8ef47f 100644
> --- a/net/sunrpc/svc.c
> +++ b/net/sunrpc/svc.c
> @@ -88,15 +88,15 @@ struct svc_pool_map svc_pool_map = {
>   switch (*ip)
>   {
>   case SVC_POOL_AUTO:
> - return strlcpy(buf, "auto", 20);
> + return strlcpy(buf, "auto\n", 20);
>   case SVC_POOL_GLOBAL:
> - return strlcpy(buf, "global", 20);
> + return strlcpy(buf, "global\n", 20);
>   case SVC_POOL_PERCPU:
> - return strlcpy(buf, "percpu", 20);
> + return strlcpy(buf, "percpu\n", 20);
>   case SVC_POOL_PERNODE:
> - return strlcpy(buf, "pernode", 20);
> + return strlcpy(buf, "pernode\n", 20);
>   default:
> - return sprintf(buf, "%d", *ip);
> + return sprintf(buf, "%d\n", *ip);
>   }
>  }
>  
> -- 
> 1.7.12.4


Re: [PATCH] nfsd: Fix old-style function definition

2020-05-11 Thread J. Bruce Fields
On Mon, May 11, 2020 at 08:07:08PM +0800, Ma Feng wrote:
> Fix warning:
> 
> fs/nfsd/nfssvc.c:604:6: warning: old-style function definition 
> [-Wold-style-definition]
>  bool i_am_nfsd()

Applying for 5.8, thanks.--b.

>   ^
> 
> Reported-by: Hulk Robot 
> Signed-off-by: Ma Feng 
> ---
>  fs/nfsd/nfssvc.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c
> index 4f588c0..b603dfc 100644
> --- a/fs/nfsd/nfssvc.c
> +++ b/fs/nfsd/nfssvc.c
> @@ -601,7 +601,7 @@ static const struct svc_serv_ops nfsd_thread_sv_ops = {
>   .svo_module = THIS_MODULE,
>  };
> 
> -bool i_am_nfsd()
> +bool i_am_nfsd(void)
>  {
>   return kthread_func(current) == nfsd;
>  }
> --
> 2.6.2


Re: [PATCH net-next] sunrpc: Remove unused function ip_map_update

2020-05-06 Thread J. Bruce Fields
Thanks, applying for 5.8.--b.

On Tue, May 05, 2020 at 04:45:37PM +0800, YueHaibing wrote:
> commit 49b28684fdba ("nfsd: Remove deprecated nfsctl system call and related 
> code.")
> left behind this, remove it.
> 
> Signed-off-by: YueHaibing 
> ---
>  net/sunrpc/svcauth_unix.c | 9 -
>  1 file changed, 9 deletions(-)
> 
> diff --git a/net/sunrpc/svcauth_unix.c b/net/sunrpc/svcauth_unix.c
> index 6c8f802c4261..97c0bddba7a3 100644
> --- a/net/sunrpc/svcauth_unix.c
> +++ b/net/sunrpc/svcauth_unix.c
> @@ -332,15 +332,6 @@ static int __ip_map_update(struct cache_detail *cd, 
> struct ip_map *ipm,
>   return 0;
>  }
>  
> -static inline int ip_map_update(struct net *net, struct ip_map *ipm,
> - struct unix_domain *udom, time64_t expiry)
> -{
> - struct sunrpc_net *sn;
> -
> - sn = net_generic(net, sunrpc_net_id);
> - return __ip_map_update(sn->ip_map_cache, ipm, udom, expiry);
> -}
> -
>  void svcauth_unix_purge(struct net *net)
>  {
>   struct sunrpc_net *sn;
> -- 
> 2.17.1
> 


Re: [PATCH 0/4] allow multiple kthreadd's

2020-05-06 Thread J. Bruce Fields
On Wed, May 06, 2020 at 11:39:20AM -0400, Tejun Heo wrote:
> Hello, Bruce.
> 
> On Wed, May 06, 2020 at 11:36:58AM -0400, J. Bruce Fields wrote:
> > On Tue, May 05, 2020 at 05:25:27PM -0400, J. Bruce Fields wrote:
> > > On Tue, May 05, 2020 at 05:09:56PM -0400, Tejun Heo wrote:
> > > > It's not the end of the world but a bit hacky. I wonder whether 
> > > > something
> > > > like the following would work better for identifying worker type so 
> > > > that you
> > > > can do sth like
> > > > 
> > > >  if (kthread_fn(current) == nfsd)
> > > > return kthread_data(current);
> > > >  else
> > > > return NULL; 
> > > 
> > > Yes, definitely more generic, looks good to me.
> > 
> > This is what I'm testing with.
> > 
> > If it's OK with you, could I add your Signed-off-by and take it through
> > the nfsd tree? I'll have some other patches that will depend on it.
> 
> Please feel free to use the code however you see fit. Given that it'll be
> originating from you, my signed-off-by might not be the right tag. Something
> like Original-patch-by should be good (nothing is fine too).

OK, I'll do that, thanks!

--b.


Re: [PATCH 0/4] allow multiple kthreadd's

2020-05-06 Thread J. Bruce Fields
On Tue, May 05, 2020 at 05:25:27PM -0400, J. Bruce Fields wrote:
> On Tue, May 05, 2020 at 05:09:56PM -0400, Tejun Heo wrote:
> > It's not the end of the world but a bit hacky. I wonder whether something
> > like the following would work better for identifying worker type so that you
> > can do sth like
> > 
> >  if (kthread_fn(current) == nfsd)
> > return kthread_data(current);
> >  else
> > return NULL; 
> 
> Yes, definitely more generic, looks good to me.

This is what I'm testing with.

If it's OK with you, could I add your Signed-off-by and take it through
the nfsd tree? I'll have some other patches that will depend on it.

--b.


commit 379bfe5257b6
Author: Tejun Heo 
Date:   Tue May 5 21:26:07 2020 -0400

kthread: save thread function

It's handy to keep the kthread_fn just as a unique cookie to identify
classes of kthreads.  E.g. if you can verify that a given task is
running your thread_fn, then you may know what sort of type kthread_data
points to.

We'll use this in nfsd to pass some information into the vfs.  Note it
will need kthread_data() exported too.

Signed-off-by: J. Bruce Fields 

diff --git a/include/linux/kthread.h b/include/linux/kthread.h
index 8bbcaad7ef0f..c00ee443833f 100644
--- a/include/linux/kthread.h
+++ b/include/linux/kthread.h
@@ -57,6 +57,7 @@ bool kthread_should_stop(void);
 bool kthread_should_park(void);
 bool __kthread_should_park(struct task_struct *k);
 bool kthread_freezable_should_stop(bool *was_frozen);
+void *kthread_fn(struct task_struct *k);
 void *kthread_data(struct task_struct *k);
 void *kthread_probe_data(struct task_struct *k);
 int kthread_park(struct task_struct *k);
diff --git a/kernel/kthread.c b/kernel/kthread.c
index bfbfa481be3a..b87c4a9ba91d 100644
--- a/kernel/kthread.c
+++ b/kernel/kthread.c
@@ -46,6 +46,7 @@ struct kthread_create_info
 struct kthread {
unsigned long flags;
unsigned int cpu;
+   int (*threadfn)(void *);
void *data;
struct completion parked;
struct completion exited;
@@ -152,6 +153,20 @@ bool kthread_freezable_should_stop(bool *was_frozen)
 }
 EXPORT_SYMBOL_GPL(kthread_freezable_should_stop);
 
+/**
+ * kthread_fn - return the function specified on kthread creation
+ * @task: kthread task in question
+ *
+ * Returns NULL if the task is not a kthread.
+ */
+void *kthread_fn(struct task_struct *task)
+{
+   if (task->flags & PF_KTHREAD)
+   return to_kthread(task)->threadfn;
+   return NULL;
+}
+EXPORT_SYMBOL_GPL(kthread_fn);
+
 /**
  * kthread_data - return data value specified on kthread creation
  * @task: kthread task in question
@@ -164,6 +179,7 @@ void *kthread_data(struct task_struct *task)
 {
return to_kthread(task)->data;
 }
+EXPORT_SYMBOL_GPL(kthread_data);
 
 /**
  * kthread_probe_data - speculative version of kthread_data()
@@ -244,6 +260,7 @@ static int kthread(void *_create)
do_exit(-ENOMEM);
}
 
+   self->threadfn = threadfn;
self->data = data;
init_completion(>exited);
init_completion(>parked);


Re: [PATCH 0/4] allow multiple kthreadd's

2020-05-05 Thread J. Bruce Fields
On Tue, May 05, 2020 at 05:09:56PM -0400, Tejun Heo wrote:
> Hello,
> 
> On Tue, May 05, 2020 at 05:01:18PM -0400, J. Bruce Fields wrote:
> > On Mon, May 04, 2020 at 10:15:14PM -0400, J. Bruce Fields wrote:
> > > Though now I'm feeling greedy: it would be nice to have both some kind
> > > of global flag, *and* keep kthread->data pointing to svc_rqst (as that
> > > would give me a simpler and quicker way to figure out which client is
> > > conflicting).  Could I take a flag bit in kthread->flags, maybe?
> > 
> > Would something like this be too hacky?:
> 
> It's not the end of the world but a bit hacky. I wonder whether something
> like the following would work better for identifying worker type so that you
> can do sth like
> 
>  if (kthread_fn(current) == nfsd)
> return kthread_data(current);
>  else
> return NULL; 

Yes, definitely more generic, looks good to me.

--b.

> 
> Thanks.
> 
> diff --git a/kernel/kthread.c b/kernel/kthread.c
> index bfbfa481be3a..4f3ab9f2c994 100644
> --- a/kernel/kthread.c
> +++ b/kernel/kthread.c
> @@ -46,6 +46,7 @@ struct kthread_create_info
>  struct kthread {
>   unsigned long flags;
>   unsigned int cpu;
> + int (*threadfn)(void *);
>   void *data;
>   struct completion parked;
>   struct completion exited;
> @@ -152,6 +153,13 @@ bool kthread_freezable_should_stop(bool *was_frozen)
>  }
>  EXPORT_SYMBOL_GPL(kthread_freezable_should_stop);
>  
> +void *kthread_fn(struct task_struct *task)
> +{
> + if (task->flags & PF_KTHREAD)
> + return to_kthread(task)->threadfn;
> + return NULL;
> +}
> +
>  /**
>   * kthread_data - return data value specified on kthread creation
>   * @task: kthread task in question
> @@ -244,6 +252,7 @@ static int kthread(void *_create)
>   do_exit(-ENOMEM);
>   }
>  
> + self->threadfn = threadfn;
>   self->data = data;
>   init_completion(>exited);
>   init_completion(>parked);
> 
> -- 
> tejun


Re: [PATCH 0/4] allow multiple kthreadd's

2020-05-05 Thread J. Bruce Fields
On Mon, May 04, 2020 at 10:15:14PM -0400, J. Bruce Fields wrote:
> Though now I'm feeling greedy: it would be nice to have both some kind
> of global flag, *and* keep kthread->data pointing to svc_rqst (as that
> would give me a simpler and quicker way to figure out which client is
> conflicting).  Could I take a flag bit in kthread->flags, maybe?

Would something like this be too hacky?:

--- a/kernel/kthread.c
+++ b/kernel/kthread.c
@@ -58,6 +58,7 @@ enum KTHREAD_BITS {
KTHREAD_IS_PER_CPU = 0,
KTHREAD_SHOULD_STOP,
KTHREAD_SHOULD_PARK,
+   KTHREAD_IS_NFSD,
 };
 
 static inline void set_kthread_struct(void *kthread)
@@ -164,6 +165,25 @@ void *kthread_data(struct task_struct *task)
return to_kthread(task)->data;
 }
 
+void kthread_set_nfsd()
+{
+   set_bit(KTHREAD_IS_NFSD, _kthread(current)->flags);
+}
+EXPORT_SYMBOL_GPL(kthread_set_nfsd);
+
+void *kthread_nfsd_data()
+{
+   struct kthread *k;
+
+   if (!(current->flags & PF_KTHREAD))
+   return NULL;
+   k = to_kthread(current);
+   if (test_bit(KTHREAD_IS_NFSD, >flags))
+   return k->data;
+   return NULL;
+}
+EXPORT_SYMBOL_GPL(kthread_nfsd_data);
+
 /**
  * kthread_probe_data - speculative version of kthread_data()
  * @task: possible kthread task in question

It feels weird to make such a special case for nfsd, but it makes this
all very easy for me; complete patch below.

--b.

commit 8b0a2e86dafa
Author: J. Bruce Fields 
Date:   Fri Jul 28 16:35:15 2017 -0400

nfsd: clients don't need to break their own delegations

We currently revoke read delegations on any write open or any operation
that modifies file data or metadata (including rename, link, and
unlink).  But if the delegation in question is the only read delegation
and is held by the client performing the operation, that's not really
necessary.

It's not always possible to prevent this in the NFSv4.0 case, because
there's not always a way to determine which client an NFSv4.0 delegation
came from.  (In theory we could try to guess this from the transport
layer, e.g., by assuming all traffic on a given TCP connection comes
from the same client.  But that's not really correct.)

In the NFSv4.1 case the session layer always tells us the client.

This patch should remove such self-conflicts in all cases where we can
reliably determine the client from the compound.

To do that we need to track "who" is performing a given (possibly
lease-breaking) file operation.  We're doing that by storing the
information in the svc_rqst and using kthread_data() to map the current
task back to a svc_rqst.

Signed-off-by: J. Bruce Fields 

diff --git a/Documentation/filesystems/locking.rst 
b/Documentation/filesystems/locking.rst
index 5057e4d9dcd1..9fdcec416614 100644
--- a/Documentation/filesystems/locking.rst
+++ b/Documentation/filesystems/locking.rst
@@ -425,6 +425,7 @@ prototypes::
int (*lm_grant)(struct file_lock *, struct file_lock *, int);
void (*lm_break)(struct file_lock *); /* break_lease callback */
int (*lm_change)(struct file_lock **, int);
+   bool (*lm_breaker_owns_lease)(struct file_lock *);
 
 locking rules:
 
@@ -435,6 +436,7 @@ lm_notify:  yes yes 
no
 lm_grant:  no  no  no
 lm_break:  yes no  no
 lm_change  yes no  no
+lm_breaker_owns_lease: no  no  no
 == =   =   =
 
 buffer_head
diff --git a/fs/locks.c b/fs/locks.c
index b8a31c1c4fff..a3f186846e93 100644
--- a/fs/locks.c
+++ b/fs/locks.c
@@ -1557,6 +1557,9 @@ static bool leases_conflict(struct file_lock *lease, 
struct file_lock *breaker)
 {
bool rc;
 
+   if (lease->fl_lmops->lm_breaker_owns_lease
+   && lease->fl_lmops->lm_breaker_owns_lease(lease))
+   return false;
if ((breaker->fl_flags & FL_LAYOUT) != (lease->fl_flags & FL_LAYOUT)) {
rc = false;
goto trace;
diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
index 0e75f7fb5fec..a6d73aa51ce4 100644
--- a/fs/nfsd/nfs4proc.c
+++ b/fs/nfsd/nfs4proc.c
@@ -2302,6 +2302,8 @@ nfsd4_proc_compound(struct svc_rqst *rqstp)
}
check_if_stalefh_allowed(args);
 
+   rqstp->rq_lease_breaker = (void **)>clp;
+
trace_nfsd_compound(rqstp, args->opcnt);
  

Re: [PATCH 0/4] allow multiple kthreadd's

2020-05-05 Thread J. Bruce Fields
On Tue, May 05, 2020 at 11:54:05AM -0400, Tejun Heo wrote:
> Hello, Bruce.
> 
> On Mon, May 04, 2020 at 10:15:14PM -0400, J. Bruce Fields wrote:
> > We're currently using it to pass the struct svc_rqst that a new nfsd
> > thread needs.  But once the new thread has gotten that, I guess it could
> > set kthread->data to some global value that it uses to say "I'm a knfsd
> > thread"?
> > 
> > I suppose that would work.
> > 
> > Though now I'm feeling greedy: it would be nice to have both some kind
> > of global flag, *and* keep kthread->data pointing to svc_rqst (as that
> > would give me a simpler and quicker way to figure out which client is
> > conflicting).  Could I take a flag bit in kthread->flags, maybe?
> 
> Hmm... that'd be solvable if kthread->data can point to a struct which does
> both things, right?

Isn't this some sort of chicken-and-egg problem?

If you don't know whether a given kthread is an nfsd thread or not, then
it's not safe to assume that kthread->data points to some nfsd-specific
structure that might tell you whether it's an nfsd thread.

> Because it doesn't have free() callback, it's a bit
> awkward but the threadfn itself can unlink and RCU-free it before returning.

It's only ever going to be referenced from the thread itself.  This is
just a way to ask "am I running as an nfsd thread?" when we're deep
inside generic filesystem code somewhere.  So I don't think there's any
complicated lifetime issues here.

--b.


Re: [PATCH 0/4] allow multiple kthreadd's

2020-05-04 Thread J. Bruce Fields
On Fri, May 01, 2020 at 02:21:54PM -0400, Tejun Heo wrote:
> Hello,
> 
> On Fri, May 01, 2020 at 10:59:24AM -0700, Linus Torvalds wrote:
> > Which kind of makes me want to point a finger at Tejun. But it's been
> > mostly PeterZ touching this file lately..
> 
> Looks fine to me too. I don't quite understand the usecase tho. It looks
> like all it's being used for is to tag some kthreads as belonging to the
> same group. Can't that be done with kthread_data()?

Yeah, so I'd forgotten about kthread->data.

We're currently using it to pass the struct svc_rqst that a new nfsd
thread needs.  But once the new thread has gotten that, I guess it could
set kthread->data to some global value that it uses to say "I'm a knfsd
thread"?

I suppose that would work.

Though now I'm feeling greedy: it would be nice to have both some kind
of global flag, *and* keep kthread->data pointing to svc_rqst (as that
would give me a simpler and quicker way to figure out which client is
conflicting).  Could I take a flag bit in kthread->flags, maybe?

--b.



Re: [PATCH 0/4] allow multiple kthreadd's

2020-05-01 Thread J. Bruce Fields
On Fri, May 01, 2020 at 07:05:46PM +, Trond Myklebust wrote:
> On Fri, 2020-05-01 at 14:49 -0400, J. Bruce Fields wrote:
> > On Fri, May 01, 2020 at 02:21:54PM -0400, Tejun Heo wrote:
> > > Hello,
> > > 
> > > On Fri, May 01, 2020 at 10:59:24AM -0700, Linus Torvalds wrote:
> > > > Which kind of makes me want to point a finger at Tejun. But it's
> > > > been
> > > > mostly PeterZ touching this file lately..
> > > 
> > > Looks fine to me too. I don't quite understand the usecase tho. It
> > > looks
> > > like all it's being used for is to tag some kthreads as belonging
> > > to the
> > > same group.
> > 
> > Pretty much.
> 
> Wen running an instance of knfsd from inside a container, you want to
> be able to have the knfsd kthreads be parented to the container init
> process so that they get killed off when the container is killed.
> 
> Right now, we can easily leak those kernel threads simply by killing
> the container.

Oh, got it.

Currently knfsd supports nfs service in containers, but it uses a single
set of threads to serve requests from any container.  It should shut the
server threads down when the last container using them goes away.

--b.



Re: [PATCH 0/4] allow multiple kthreadd's

2020-05-01 Thread J. Bruce Fields
On Fri, May 01, 2020 at 11:30:38AM -0700, Linus Torvalds wrote:
> On Fri, May 1, 2020 at 11:22 AM Tejun Heo  wrote:
> >
> > Looks fine to me too. I don't quite understand the usecase tho. It looks
> > like all it's being used for is to tag some kthreads as belonging to the
> > same group. Can't that be done with kthread_data()?
> 
> I _think_ Bruce wants the signal handling unification too, because
> nfsd wants to react to being shut down with signals.

No, maybe kthread_data() might do the job.

(I don't see how this would help with signal handling.  But, I'm kind of
ignorant of how signalling works.)

--b.



Re: [PATCH 0/4] allow multiple kthreadd's

2020-05-01 Thread J. Bruce Fields
On Fri, May 01, 2020 at 02:21:54PM -0400, Tejun Heo wrote:
> Hello,
> 
> On Fri, May 01, 2020 at 10:59:24AM -0700, Linus Torvalds wrote:
> > Which kind of makes me want to point a finger at Tejun. But it's been
> > mostly PeterZ touching this file lately..
> 
> Looks fine to me too. I don't quite understand the usecase tho. It looks
> like all it's being used for is to tag some kthreads as belonging to the
> same group.

Pretty much.

> Can't that be done with kthread_data()?

Huh, maybe so, thanks.

I need to check this from generic file locking code that could be run by
any task--but I assume there's an easy way I can check if I'm a kthread
before calling  kthread_data(current).

I do expect to expose a delegation interface for userspace servers
eventually too.  But we could do the tgid check for them and still use
kthread_data() for nfsd.  That might work.

--b.



[PATCH 3/4] kthreads: allow multiple kthreadd's

2020-05-01 Thread J. Bruce Fields
From: "J. Bruce Fields" 

Allow subsystems to run their own kthreadd's.

I'm experimenting with this to allow nfsd to put its threads into its
own thread group to make it easy for the vfs to tell when nfsd is
breaking one of its own leases.

Signed-off-by: J. Bruce Fields 
---
 include/linux/kthread.h |  20 ++-
 init/main.c |   4 +-
 kernel/kthread.c| 113 ++--
 3 files changed, 107 insertions(+), 30 deletions(-)

diff --git a/include/linux/kthread.h b/include/linux/kthread.h
index 8bbcaad7ef0f..a7ffdf96a3b2 100644
--- a/include/linux/kthread.h
+++ b/include/linux/kthread.h
@@ -5,6 +5,24 @@
 #include 
 #include 
 
+struct kthread_group {
+   char *name;
+   spinlock_t create_lock;
+   struct list_head create_list;
+   struct task_struct *task;
+};
+
+extern struct kthread_group kthreadd_default;
+
+struct kthread_group *kthread_start_group(char *);
+void kthread_stop_group(struct kthread_group *);
+
+struct task_struct *kthread_group_create_on_node(struct kthread_group *,
+   int (*threadfn)(void *data),
+void *data,
+int node,
+const char namefmt[], ...);
+
 __printf(4, 5)
 struct task_struct *kthread_create_on_node(int (*threadfn)(void *data),
   void *data,
@@ -63,7 +81,7 @@ int kthread_park(struct task_struct *k);
 void kthread_unpark(struct task_struct *k);
 void kthread_parkme(void);
 
-int kthreadd(void *unused);
+int kthreadd(void *);
 extern struct task_struct *kthreadd_task;
 extern int tsk_fork_get_node(struct task_struct *tsk);
 
diff --git a/init/main.c b/init/main.c
index a48617f2e5e5..5256071b0e05 100644
--- a/init/main.c
+++ b/init/main.c
@@ -641,9 +641,9 @@ noinline void __ref rest_init(void)
rcu_read_unlock();
 
numa_default_policy();
-   pid = kernel_thread(kthreadd, NULL, CLONE_FS | CLONE_FILES);
+   pid = kernel_thread(kthreadd, _default, CLONE_FS | 
CLONE_FILES);
rcu_read_lock();
-   kthreadd_task = find_task_by_pid_ns(pid, _pid_ns);
+   kthreadd_default.task = find_task_by_pid_ns(pid, _pid_ns);
rcu_read_unlock();
 
/*
diff --git a/kernel/kthread.c b/kernel/kthread.c
index 483bee57a9c8..5232f6f597b5 100644
--- a/kernel/kthread.c
+++ b/kernel/kthread.c
@@ -25,9 +25,44 @@
 #include 
 #include 
 
-static DEFINE_SPINLOCK(kthread_create_lock);
-static LIST_HEAD(kthread_create_list);
-struct task_struct *kthreadd_task;
+struct kthread_group kthreadd_default = {
+   .name = "kthreadd",
+   .create_lock = __SPIN_LOCK_UNLOCKED(kthreadd_default.create_lock),
+   .create_list = LIST_HEAD_INIT(kthreadd_default.create_list),
+};
+
+void wake_kthreadd(struct kthread_group *kg)
+{
+   wake_up_process(kg->task);
+}
+
+struct kthread_group *kthread_start_group(char *name)
+{
+   struct kthread_group *new;
+   struct task_struct *task;
+
+   new = kmalloc(sizeof(struct kthread_group), GFP_KERNEL);
+   if (!new)
+   return ERR_PTR(-ENOMEM);
+   spin_lock_init(>create_lock);
+   INIT_LIST_HEAD(>create_list);
+   new->name = name;
+   task = kthread_run(kthreadd, new, name);
+   if (IS_ERR(task)) {
+   kfree(new);
+   return ERR_CAST(task);
+   }
+   new->task = task;
+   return new;
+}
+EXPORT_SYMBOL_GPL(kthread_start_group);
+
+void kthread_stop_group(struct kthread_group *kg)
+{
+   kthread_stop(kg->task);
+   kfree(kg);
+}
+EXPORT_SYMBOL_GPL(kthread_stop_group);
 
 struct kthread_create_info
 {
@@ -301,11 +336,13 @@ static void create_kthread(struct kthread_create_info 
*create)
}
 }
 
-static __printf(4, 0)
-struct task_struct *__kthread_create_on_node(int (*threadfn)(void *data),
-   void *data, int node,
-   const char namefmt[],
-   va_list args)
+
+static __printf(5, 0)
+struct task_struct *__kthread_group_create_on_node(struct kthread_group *kg,
+   int (*threadfn)(void *data),
+   void *data, int node,
+   const char namefmt[],
+   va_list args)
 {
DECLARE_COMPLETION_ONSTACK(done);
struct task_struct *task;
@@ -319,11 +356,11 @@ struct task_struct *__kthread_create_on_node(int 
(*threadfn)(void *data),
create->node = node;
create->done = 
 
-   spin_lock(_create_lock);
-   list_add_tail(>list, _create_list);
-   spin_unlock(_create_lock);
+   spin_lock(>create_lock);
+   list_add_tail(>list, >create_list);
+   spin_unloc

[PATCH 4/4] kthreads: allow cloning threads with different flags

2020-05-01 Thread J. Bruce Fields
From: "J. Bruce Fields" 

This is so knfsd can add CLONE_THREAD.

Signed-off-by: J. Bruce Fields 
---
 include/linux/kthread.h |  3 ++-
 kernel/kthread.c| 11 +++
 2 files changed, 9 insertions(+), 5 deletions(-)

diff --git a/include/linux/kthread.h b/include/linux/kthread.h
index a7ffdf96a3b2..7069feb6da65 100644
--- a/include/linux/kthread.h
+++ b/include/linux/kthread.h
@@ -10,11 +10,12 @@ struct kthread_group {
spinlock_t create_lock;
struct list_head create_list;
struct task_struct *task;
+   unsigned long flags;
 };
 
 extern struct kthread_group kthreadd_default;
 
-struct kthread_group *kthread_start_group(char *);
+struct kthread_group *kthread_start_group(unsigned long, char *);
 void kthread_stop_group(struct kthread_group *);
 
 struct task_struct *kthread_group_create_on_node(struct kthread_group *,
diff --git a/kernel/kthread.c b/kernel/kthread.c
index 5232f6f597b5..57f6687ecec7 100644
--- a/kernel/kthread.c
+++ b/kernel/kthread.c
@@ -29,6 +29,7 @@ struct kthread_group kthreadd_default = {
.name = "kthreadd",
.create_lock = __SPIN_LOCK_UNLOCKED(kthreadd_default.create_lock),
.create_list = LIST_HEAD_INIT(kthreadd_default.create_list),
+   .flags = CLONE_FS | CLONE_FILES | SIGCHLD,
 };
 
 void wake_kthreadd(struct kthread_group *kg)
@@ -36,7 +37,7 @@ void wake_kthreadd(struct kthread_group *kg)
wake_up_process(kg->task);
 }
 
-struct kthread_group *kthread_start_group(char *name)
+struct kthread_group *kthread_start_group(unsigned long flags, char *name)
 {
struct kthread_group *new;
struct task_struct *task;
@@ -47,6 +48,7 @@ struct kthread_group *kthread_start_group(char *name)
spin_lock_init(>create_lock);
INIT_LIST_HEAD(>create_list);
new->name = name;
+   new->flags = flags;
task = kthread_run(kthreadd, new, name);
if (IS_ERR(task)) {
kfree(new);
@@ -314,7 +316,8 @@ int tsk_fork_get_node(struct task_struct *tsk)
return NUMA_NO_NODE;
 }
 
-static void create_kthread(struct kthread_create_info *create)
+static void create_kthread(struct kthread_create_info *create,
+  unsigned long flags)
 {
int pid;
 
@@ -322,7 +325,7 @@ static void create_kthread(struct kthread_create_info 
*create)
current->pref_node_fork = create->node;
 #endif
/* We want our own signal handler (we take no signals by default). */
-   pid = kernel_thread(kthread, create, CLONE_FS | CLONE_FILES | SIGCHLD);
+   pid = kernel_thread(kthread, create, flags);
if (pid < 0) {
/* If user was SIGKILLed, I release the structure. */
struct completion *done = xchg(>done, NULL);
@@ -645,7 +648,7 @@ void kthread_do_work(struct kthread_group *kg)
list_del_init(>list);
spin_unlock(>create_lock);
 
-   create_kthread(create);
+   create_kthread(create, kg->flags);
 
spin_lock(>create_lock);
}
-- 
2.26.2



[PATCH 2/4] kthreads: Simplify tsk_fork_get_node

2020-05-01 Thread J. Bruce Fields
From: "J. Bruce Fields" 

This will also simplify a following patch that allows multiple
kthreadd's.

Signed-off-by: J. Bruce Fields 
---
 init/init_task.c | 3 +++
 kernel/fork.c| 4 
 kernel/kthread.c | 3 +--
 3 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/init/init_task.c b/init/init_task.c
index bd403ed3e418..fdd760393760 100644
--- a/init/init_task.c
+++ b/init/init_task.c
@@ -154,6 +154,9 @@ struct task_struct init_task
.vtime.starttime = 0,
.vtime.state= VTIME_SYS,
 #endif
+#ifdef CONFIG_NUMA
+   .pref_node_fork = NUMA_NO_NODE,
+#endif
 #ifdef CONFIG_NUMA_BALANCING
.numa_preferred_nid = NUMA_NO_NODE,
.numa_group = NULL,
diff --git a/kernel/fork.c b/kernel/fork.c
index 8c700f881d92..fa35890534d5 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -942,6 +942,10 @@ static struct task_struct *dup_task_struct(struct 
task_struct *orig, int node)
tsk->fail_nth = 0;
 #endif
 
+#ifdef CONFIG_NUMA
+   tsk->pref_node_fork = NUMA_NO_NODE;
+#endif
+
 #ifdef CONFIG_BLK_CGROUP
tsk->throttle_queue = NULL;
tsk->use_memdelay = 0;
diff --git a/kernel/kthread.c b/kernel/kthread.c
index 4217fded891a..483bee57a9c8 100644
--- a/kernel/kthread.c
+++ b/kernel/kthread.c
@@ -274,8 +274,7 @@ static int kthread(void *_create)
 int tsk_fork_get_node(struct task_struct *tsk)
 {
 #ifdef CONFIG_NUMA
-   if (tsk == kthreadd_task)
-   return tsk->pref_node_fork;
+   return tsk->pref_node_fork;
 #endif
return NUMA_NO_NODE;
 }
-- 
2.26.2



[PATCH 0/4] allow multiple kthreadd's

2020-05-01 Thread J. Bruce Fields
From: "J. Bruce Fields" 

These patches allow a caller to create its own kthreadd.

The motivation is file delegations: currently any write operation from a
client breaks all delegations, even delegations held by the same client.

To fix that, we need to know which client is performing a given
operation.

So, we let nfsd put all the nfsd threads into the same thread group (by
spawning them from its own private kthreadd), then patch the delegation
code to treat delegation breaks from the same thread group as not
conflicting, and then leave it to nfsd to sort out conflicts among its
own clients.  Those patches are in:

git://linux-nfs.org/~bfields/linux.git deleg-fix-self-conflicts

This was an idea from Trond.  Part of his motivation was that it could
work for userspace servers (like Ganesha and Samba) as well.  (We don't
currently let them request delegations, but probably will some day--it
shouldn't be difficult.)

Previously I considered instead adding a new field somewhere in the
struct task.  That might require a new system call to expose to user
space.  Or we might be able to put this in a keyring, if David Howells
thought that would work.

Before that I tried passing the identity of the breaker explicitly, but
that looks like it would require passing the new argument around to huge
swaths of the VFS.

Anyway, does this multiple kthreadd approach look reasonable?

(If so, who should handle the patches?)

--b.

J. Bruce Fields (4):
  kthreads: minor kthreadd refactoring
  kthreads: Simplify tsk_fork_get_node
  kthreads: allow multiple kthreadd's
  kthreads: allow cloning threads with different flags

 include/linux/kthread.h |  21 +-
 init/init_task.c|   3 +
 init/main.c |   4 +-
 kernel/fork.c   |   4 ++
 kernel/kthread.c| 140 +---
 5 files changed, 132 insertions(+), 40 deletions(-)

-- 
2.26.2



[PATCH 1/4] kthreads: minor kthreadd refactoring

2020-05-01 Thread J. Bruce Fields
From: "J. Bruce Fields" 

Trivial refactoring, no change in behavior.

Not really necessary, a separate function for the inner loop just seems
a little nicer to me.

Signed-off-by: J. Bruce Fields 
---
 kernel/kthread.c | 33 +++--
 1 file changed, 19 insertions(+), 14 deletions(-)

diff --git a/kernel/kthread.c b/kernel/kthread.c
index bfbfa481be3a..4217fded891a 100644
--- a/kernel/kthread.c
+++ b/kernel/kthread.c
@@ -578,6 +578,24 @@ int kthread_stop(struct task_struct *k)
 }
 EXPORT_SYMBOL(kthread_stop);
 
+void kthread_do_work(void)
+{
+   spin_lock(_create_lock);
+   while (!list_empty(_create_list)) {
+   struct kthread_create_info *create;
+
+   create = list_entry(kthread_create_list.next,
+   struct kthread_create_info, list);
+   list_del_init(>list);
+   spin_unlock(_create_lock);
+
+   create_kthread(create);
+
+   spin_lock(_create_lock);
+   }
+   spin_unlock(_create_lock);
+}
+
 int kthreadd(void *unused)
 {
struct task_struct *tsk = current;
@@ -597,20 +615,7 @@ int kthreadd(void *unused)
schedule();
__set_current_state(TASK_RUNNING);
 
-   spin_lock(_create_lock);
-   while (!list_empty(_create_list)) {
-   struct kthread_create_info *create;
-
-   create = list_entry(kthread_create_list.next,
-   struct kthread_create_info, list);
-   list_del_init(>list);
-   spin_unlock(_create_lock);
-
-   create_kthread(create);
-
-   spin_lock(_create_lock);
-   }
-   spin_unlock(_create_lock);
+   kthread_do_work();
}
 
return 0;
-- 
2.26.2



Re: [PATCH] scripts: prune-kernel : prune kernels generalized way

2019-10-19 Thread J. Bruce Fields
On Sat, Oct 19, 2019 at 06:37:22PM +0530, Bhaskar Chowdhury wrote:
> This patch will remove old kernel from the system in a selective way.

Please don't comment out code, just delete it, git's there to keep the
old code.

There's some redundant code that should be inside a loop.

A little more detail in the changelog might be useful to those of us who
are lazy about reading bash script

Looks like this just prompts for each individual delete?  Actually it
looks like it requires the user to enter the module path and kernel
version for each one which makes it not much more convenient use than a
bare "ls" and "rm".

I personally use this in unattended scripts.  I mean, I don't really
care what we do with this, as I use my own copy of the script, so
whatever's useful to more people is fine.

But if somebody does actually use it as-is, it'd be nicer to keep the
current behavior and add an option ("-i" or something) for the
interactive behavior.

--b.

> 
> Signed-off-by: Bhaskar Chowdhury 
> ---
>  scripts/prune-kernel | 86 
>  1 file changed, 72 insertions(+), 14 deletions(-)
> 
> diff --git a/scripts/prune-kernel b/scripts/prune-kernel
> index e8aa940bc0a9..9d839a4e4539 100755
> --- a/scripts/prune-kernel
> +++ b/scripts/prune-kernel
> @@ -5,17 +5,75 @@
>  # again, /boot and /lib/modules/ eventually fill up.
>  # Dumb script to purge that stuff:
> 
> -for f in "$@"
> -do
> -if rpm -qf "/lib/modules/$f" >/dev/null; then
> -echo "keeping $f (installed from rpm)"
> -elif [ $(uname -r) = "$f" ]; then
> -echo "keeping $f (running kernel) "
> -else
> -echo "removing $f"
> -rm -f "/boot/initramfs-$f.img" "/boot/System.map-$f"
> -rm -f "/boot/vmlinuz-$f"   "/boot/config-$f"
> -rm -rf "/lib/modules/$f"
> -new-kernel-pkg --remove $f
> -fi
> -done
> +#for f in "$@"
> +#do
> +#   if rpm -qf "/lib/modules/$f" >/dev/null; then
> +#echo "keeping $f (installed from rpm)"
> +#elif [ $(uname -r) = "$f" ]; then
> +#echo "keeping $f (running kernel) "
> +#else
> +#echo "removing $f"
> +#rm -f "/boot/initramfs-$f.img" "/boot/System.map-$f"
> +#rm -f "/boot/vmlinuz-$f"   "/boot/config-$f"
> +#rm -rf "/lib/modules/$f"
> +#new-kernel-pkg --remove $f
> +#   fi
> +#done
> +boot_dir=/boot
> +modules_dir=/lib/modules
> +
> +function remove_old_kernel(){
> + cd $boot_dir
> + rm -If vmlinuz-$kernel_version System.map-$kernel_version 
> config-$kernel_version
> +}
> +
> +function remove_old_modules_dir(){
> + cd $modules_dir
> + rm -rf $modules_version
> +}
> +
> +printf "\n\n Enlist the installed kernels \n\n"
> +
> +
> +find $boot_dir -name "vmlinuz-*" -type f -exec ls -1 {} \;
> +
> +printf "\n\n\n Please give the kernel version to remove: %s"
> +read kernel_version
> +
> +if [[ $kernel_version == "" ]];then
> + exit 1
> +else
> + remove_old_kernel
> +fi
> +
> +printf "\n\n Enlist the installed modules directory \n\n"
> +
> +find $modules_dir -maxdepth 0 -type d -exec ls -1 {} \;
> +
> +printf "\n\n Please give the full modules directory name to remove: %s"
> +read modules_version
> +
> +if [[ $modules_version == "" ]];then
> + printf "You have forgotten to give the modules dir to remove"
> +else
> + remove_old_modules_dir
> +fi
> +
> +printf "\n\n\n Removed kernel version:$kernel_version and associated 
> modules:$modules_version ...Done \n"
> +
> +while :
> + do
> +printf "\n\n Do you want to remove another?[YN]: %s"
> +read response
> +   if [[ $response == "Y" ]];then
> +  printf "Please give another version to remove: %s"
> +  read kernel_version
> +  remove_old_kernel
> +  printf "Please give the full modules directory name to remove: %s"
> +  read modules_version
> +  remove_old_modules_dir
> +  printf "\n\n\n Removed kernel version:$kernel_version and associated 
> modules:$modules_version ..Done \n\n"
> +  elif [[ $response == "N" ]];then
> +  exit 1
> +fi
> + done
> --
> 2.21.0


Re: SUNRPC: Checking a kmemdup() call in xdr_netobj_dup()

2019-10-14 Thread J. Bruce Fields
On Sat, Oct 12, 2019 at 08:20:04PM +0200, Markus Elfring wrote:
> I tried another script for the semantic patch language out.
> This source code analysis approach points out that the implementation
> of the function “xdr_netobj_dup” contains still an unchecked call
> of the function “kmemdup”.
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/include/linux/sunrpc/xdr.h?id=1c0cc5f1ae5ee5a6913704c0d75a6e99604ee30a#n167
> https://elixir.bootlin.com/linux/v5.4-rc2/source/include/linux/sunrpc/xdr.h#L167
> 
> How do you think about to improve it?

On a quick check--I see five xdr_netobj_dup callers, and all of them
check whether dst->data is NULL.

Sounds like a false positive for your tool?

--b.


Re: [PATCH] sunrpc: fix crash when cache_head become valid before update

2019-10-11 Thread J . Bruce Fields
On Wed, Oct 09, 2019 at 09:51:23AM +1100, NeilBrown wrote:
> On Tue, Oct 08 2019,  J . Bruce Fields  wrote:
> 
> > On Tue, Oct 08, 2019 at 10:02:53AM +, Pavel Tikhomirov wrote:
> >> Add Neil to CC, sorry, had lost it somehow...
> >
> > Always happy when we can fix a bug by deleting code, and your
> > explanation makes sense to me, but I'll give Neil a chance to look it
> > over if he wants.
> 
> Yes, it makes sense to me.  But I'm not sure that is worth much.  The
> original fix got a Reviewed-by from me but was wrong.
>  Acked-by: NeilBrown 
> 
> 'Acked' is weaker than 'reviewed' - isn't it? :-)

Hah--"Self-deprecatingly-reviewed-by:"?

Anyway, applied thanks.

--b.


Re: [PATCH] sunrpc: fix crash when cache_head become valid before update

2019-10-08 Thread J . Bruce Fields
On Tue, Oct 08, 2019 at 10:02:53AM +, Pavel Tikhomirov wrote:
> Add Neil to CC, sorry, had lost it somehow...

Always happy when we can fix a bug by deleting code, and your
explanation makes sense to me, but I'll give Neil a chance to look it
over if he wants.

--b.

> 
> On 10/1/19 11:03 AM, Pavel Tikhomirov wrote:
> > I was investigating a crash in our Virtuozzo7 kernel which happened in
> > in svcauth_unix_set_client. I found out that we access m_client field
> > in ip_map structure, which was received from sunrpc_cache_lookup (we
> > have a bit older kernel, now the code is in sunrpc_cache_add_entry), and
> > these field looks uninitialized (m_client == 0x74 don't look like a
> > pointer) but in the cache_head in flags we see 0x1 which is CACHE_VALID.
> > 
> > It looks like the problem appeared from our previous fix to sunrpc (1):
> > commit 4ecd55ea0742 ("sunrpc: fix cache_head leak due to queued
> > request")
> > 
> > And we've also found a patch already fixing our patch (2):
> > commit d58431eacb22 ("sunrpc: don't mark uninitialised items as VALID.")
> > 
> > Though the crash is eliminated, I think the core of the problem is not
> > completely fixed:
> > 
> > Neil in the patch (2) makes cache_head CACHE_NEGATIVE, before
> > cache_fresh_locked which was added in (1) to fix crash. These way
> > cache_is_valid won't say the cache is valid anymore and in
> > svcauth_unix_set_client the function cache_check will return error
> > instead of 0, and we don't count entry as initialized.
> > 
> > But it looks like we need to remove cache_fresh_locked completely in
> > sunrpc_cache_lookup:
> > 
> > In (1) we've only wanted to make cache_fresh_unlocked->cache_dequeue so
> > that cache_requests with no readers also release corresponding
> > cache_head, to fix their leak.  We with Vasily were not sure if
> > cache_fresh_locked and cache_fresh_unlocked should be used in pair or
> > not, so we've guessed to use them in pair.
> > 
> > Now we see that we don't want the CACHE_VALID bit set here by
> > cache_fresh_locked, as "valid" means "initialized" and there is no
> > initialization in sunrpc_cache_add_entry. Both expiry_time and
> > last_refresh are not used in cache_fresh_unlocked code-path and also not
> > required for the initial fix.
> > 
> > So to conclude cache_fresh_locked was called by mistake, and we can just
> > safely remove it instead of crutching it with CACHE_NEGATIVE. It looks
> > ideologically better for me. Hope I don't miss something here.
> > 
> > Here is our crash backtrace:
> > [13108726.326291] BUG: unable to handle kernel NULL pointer dereference at 
> > 0074
> > [13108726.326365] IP: [] 
> > svcauth_unix_set_client+0x2ab/0x520 [sunrpc]
> > [13108726.326448] PGD 0
> > [13108726.326468] Oops: 0002 [#1] SMP
> > [13108726.326497] Modules linked in: nbd isofs xfs loop 
> > kpatch_cumulative_81_0_r1(O) xt_physdev nfnetlink_queue bluetooth rfkill 
> > ip6table_nat nf_nat_ipv6 ip_vs_wrr ip_vs_wlc ip_vs_sh nf_conntrack_netlink 
> > ip_vs_sed ip_vs_pe_sip nf_conntrack_sip ip_vs_nq ip_vs_lc ip_vs_lblcr 
> > ip_vs_lblc ip_vs_ftp ip_vs_dh nf_nat_ftp nf_conntrack_ftp iptable_raw 
> > xt_recent nf_log_ipv6 xt_hl ip6t_rt nf_log_ipv4 nf_log_common xt_LOG 
> > xt_limit xt_TCPMSS xt_tcpmss vxlan ip6_udp_tunnel udp_tunnel xt_statistic 
> > xt_NFLOG nfnetlink_log dummy xt_mark xt_REDIRECT nf_nat_redirect raw_diag 
> > udp_diag tcp_diag inet_diag netlink_diag af_packet_diag unix_diag 
> > rpcsec_gss_krb5 xt_addrtype ip6t_rpfilter ipt_REJECT nf_reject_ipv4 
> > ip6t_REJECT nf_reject_ipv6 ebtable_nat ebtable_broute nf_conntrack_ipv6 
> > nf_defrag_ipv6 ip6table_mangle ip6table_raw nfsv4
> > [13108726.327173]  dns_resolver cls_u32 binfmt_misc arptable_filter 
> > arp_tables ip6table_filter ip6_tables devlink fuse_kio_pcs ipt_MASQUERADE 
> > nf_nat_masquerade_ipv4 xt_nat iptable_nat nf_nat_ipv4 xt_comment 
> > nf_conntrack_ipv4 nf_defrag_ipv4 xt_wdog_tmo xt_multiport bonding xt_set 
> > xt_conntrack iptable_filter iptable_mangle kpatch(O) ebtable_filter 
> > ebt_among ebtables ip_set_hash_ip ip_set nfnetlink vfat fat skx_edac 
> > intel_powerclamp coretemp intel_rapl iosf_mbi kvm_intel kvm irqbypass fuse 
> > pcspkr ses enclosure joydev sg mei_me hpwdt hpilo lpc_ich mei ipmi_si 
> > shpchp ipmi_devintf ipmi_msghandler xt_ipvs acpi_power_meter ip_vs_rr nfsv3 
> > nfsd auth_rpcgss nfs_acl nfs lockd grace fscache nf_nat cls_fw sch_htb 
> > sch_cbq sch_sfq ip_vs em_u32 nf_conntrack tun br_netfilter veth overlay 
> > ip6_vzprivnet ip6_vznetstat ip_vznetstat
> > [13108726.327817]  ip_vzprivnet vziolimit vzevent vzlist vzstat vznetstat 
> > vznetdev vzmon vzdev bridge pio_kaio pio_nfs pio_direct pfmt_raw 
> > pfmt_ploop1 ploop ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif 
> > crct10dif_generic mgag200 i2c_algo_bit drm_kms_helper scsi_transport_iscsi 
> > 8021q syscopyarea sysfillrect garp sysimgblt fb_sys_fops mrp stp ttm llc 
> > bnx2x crct10dif_pclmul crct10dif_common crc32_pclmul crc32c_intel drm 
> > 

Re: Lease semantic proposal

2019-10-03 Thread J. Bruce Fields
On Wed, Oct 02, 2019 at 04:35:55PM -0400, Jeff Layton wrote:
> On Wed, 2019-10-02 at 15:27 -0400, J. Bruce Fields wrote:
> > On Wed, Oct 02, 2019 at 08:28:40AM -0400, Jeff Layton wrote:
> > > For the byte ranges, the catch there is that extending the userland
> > > interface for that later will be difficult.
> > 
> > Why would it be difficult?
> 
> Legacy userland code that wanted to use byte range enabled layouts would
> have to be rebuilt to take advantage of them. If we require a range from
> the get-go, then they will get the benefit of them once they're
> available.

I can't see writing byte-range code for a kernel that doesn't support
that yet.  How would I test it?

> > > What I'd probably suggest
> > > (and what would jive with the way pNFS works) would be to go ahead and
> > > add an offset and length to the arguments and result (maybe also
> > > whence?).
> > 
> > Why not add new commands with range arguments later if it turns out to
> > be necessary?
> 
> We could do that. It'd be a little ugly, IMO, simply because then we'd
> end up with two interfaces that do almost the exact same thing.
> 
> Should byte-range layouts at that point conflict with non-byte range
> layouts, or should they be in different "spaces" (a'la POSIX and flock
> locks)? When it's all one interface, those sorts of questions sort of
> answer themselves. When they aren't we'll have to document them clearly
> and I think the result will be more confusing for userland programmers.

I was hoping they'd be in the same space, with the old interface just
defined to deal in locks with range [0,∞).

I'm just worried about getting the interface wrong if it's specified
without being implemented.  Maybe this is straightforward enough that
there's not a risk, I don't know.

--b.



Re: Lease semantic proposal

2019-10-02 Thread J. Bruce Fields
On Wed, Oct 02, 2019 at 08:28:40AM -0400, Jeff Layton wrote:
> On Tue, 2019-10-01 at 11:17 -0700, Ira Weiny wrote:
> > On Mon, Sep 23, 2019 at 04:17:59PM -0400, Jeff Layton wrote:
> > > On Mon, 2019-09-23 at 12:08 -0700, Ira Weiny wrote:
> > > > Since the last RFC patch set[1] much of the discussion of supporting 
> > > > RDMA with
> > > > FS DAX has been around the semantics of the lease mechanism.[2]  Within 
> > > > that
> > > > thread it was suggested I try and write some documentation and/or tests 
> > > > for the
> > > > new mechanism being proposed.  I have created a foundation to test lease
> > > > functionality within xfstests.[3] This should be close to being 
> > > > accepted.
> > > > Before writing additional lease tests, or changing lots of kernel code, 
> > > > this
> > > > email presents documentation for the new proposed "layout lease" 
> > > > semantic.
> > > > 
> > > > At Linux Plumbers[4] just over a week ago, I presented the current 
> > > > state of the
> > > > patch set and the outstanding issues.  Based on the discussion there, 
> > > > well as
> > > > follow up emails, I propose the following addition to the fcntl() man 
> > > > page.
> > > > 
> > > > Thank you,
> > > > Ira
> > > > 
> > > > [1] https://lkml.org/lkml/2019/8/9/1043
> > > > [2] https://lkml.org/lkml/2019/8/9/1062
> > > > [3] https://www.spinics.net/lists/fstests/msg12620.html
> > > > [4] https://linuxplumbersconf.org/event/4/contributions/368/
> > > > 
> > > > 
> > > 
> > > Thank you so much for doing this, Ira. This allows us to debate the
> > > user-visible behavior semantics without getting bogged down in the
> > > implementation details. More comments below:
> > 
> > Thanks.  Sorry for the delay in response.  Turns out this email was in my
> > spam...  :-/  I'll need to work out why.
> > 
> > > > 
> > > > Layout Leases
> > > > -
> > > > 
> > > > Layout (F_LAYOUT) leases are special leases which can be used to 
> > > > control and/or
> > > > be informed about the manipulation of the underlying layout of a file.
> > > > 
> > > > A layout is defined as the logical file block -> physical file block 
> > > > mapping
> > > > including the file size and sharing of physical blocks among files.  
> > > > Note that
> > > > the unwritten state of a block is not considered part of file layout.
> > > > 
> > > > **Read layout lease F_RDLCK | F_LAYOUT**
> > > > 
> > > > Read layout leases can be used to be informed of layout changes by the
> > > > system or other users.  This lease is similar to the standard read 
> > > > (F_RDLCK)
> > > > lease in that any attempt to change the _layout_ of the file will be 
> > > > reported to
> > > > the process through the lease break process.  But this lease is 
> > > > different
> > > > because the file can be opened for write and data can be read and/or 
> > > > written to
> > > > the file as long as the underlying layout of the file does not change.
> > > > Therefore, the lease is not broken if the file is simply open for 
> > > > write, but
> > > > _may_ be broken if an operation such as, truncate(), fallocate() or 
> > > > write()
> > > > results in changing the underlying layout.
> > > > 
> > > > **Write layout lease (F_WRLCK | F_LAYOUT)**
> > > > 
> > > > Write Layout leases can be used to break read layout leases to indicate 
> > > > that
> > > > the process intends to change the underlying layout lease of the file.
> > > > 
> > > > A process which has taken a write layout lease has exclusive ownership 
> > > > of the
> > > > file layout and can modify that layout as long as the lease is held.
> > > > Operations which change the layout are allowed by that process.  But 
> > > > operations
> > > > from other file descriptors which attempt to change the layout will 
> > > > break the
> > > > lease through the standard lease break process.  The F_LAYOUT flag is 
> > > > used to
> > > > indicate a difference between a regular F_WRLCK and F_WRLCK with 
> > > > F_LAYOUT.  In
> > > > the F_LAYOUT case opens for write do not break the lease.  But some 
> > > > operations,
> > > > if they change the underlying layout, may.
> > > > 
> > > > The distinction between read layout leases and write layout leases is 
> > > > that
> > > > write layout leases can change the layout without breaking the lease 
> > > > within the
> > > > owning process.  This is useful to guarantee a layout prior to 
> > > > specifying the
> > > > unbreakable flag described below.
> > > > 
> > > > 
> > > 
> > > The above sounds totally reasonable. You're essentially exposing the
> > > behavior of nfsd's layout leases to userland. To be clear, will F_LAYOUT
> > > leases work the same way as "normal" leases, wrt signals and timeouts?
> > 
> > That was my intention, yes.
> >
> > > I do wonder if we're better off not trying to "or" in flags for this,
> > > and instead have a separate set of commands (maybe F_RDLAYOUT,
> > > F_WRLAYOUT, F_UNLAYOUT). Maybe I'm just bikeshedding though -- I don't
> > > feel terribly 

Re: [PATCH -next] nfsd: remove set but not used variable 'len'

2019-09-30 Thread J. Bruce Fields
Thanks, applying for 5.5.--b.

On Sat, Sep 28, 2019 at 12:21:56PM +0800, YueHaibing wrote:
> Fixes gcc '-Wunused-but-set-variable' warning:
> 
> fs/nfsd/nfs4xdr.c: In function nfsd4_encode_splice_read:
> fs/nfsd/nfs4xdr.c:3464:7: warning: variable len set but not used 
> [-Wunused-but-set-variable]
> 
> It is not used since commit 83a63072c815 ("nfsd: fix nfs read eof detection")
> 
> Reported-by: Hulk Robot 
> Signed-off-by: YueHaibing 
> ---
>  fs/nfsd/nfs4xdr.c | 2 --
>  1 file changed, 2 deletions(-)
> 
> diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
> index 533d0fc..1883370 100644
> --- a/fs/nfsd/nfs4xdr.c
> +++ b/fs/nfsd/nfs4xdr.c
> @@ -3461,7 +3461,6 @@ static __be32 nfsd4_encode_splice_read(
>   struct xdr_stream *xdr = >xdr;
>   struct xdr_buf *buf = xdr->buf;
>   u32 eof;
> - long len;
>   int space_left;
>   __be32 nfserr;
>   __be32 *p = xdr->p - 2;
> @@ -3470,7 +3469,6 @@ static __be32 nfsd4_encode_splice_read(
>   if (xdr->end - xdr->p < 1)
>   return nfserr_resource;
>  
> - len = maxcount;
>   nfserr = nfsd_splice_read(read->rd_rqstp, read->rd_fhp,
> file, read->rd_offset, , );
>   read->rd_length = maxcount;
> -- 
> 2.7.4
> 


Re: [GIT PULL] nfsd changes for 5.4

2019-09-27 Thread J. Bruce Fields
On Fri, Sep 27, 2019 at 03:44:02PM -0700, Linus Torvalds wrote:
> But then the actual code is just one single small commit:
> 
> > Dave Wysochanski (1):
> >   SUNRPC: Track writers of the 'channel' file to improve 
> > cache_listeners_exist

And that's also all that was in the diffstat in my pull message, but I
somehow didn't notice.  I must be tired

> which doesn't actually match any of the things your description says
> should be there.
> 
> So I undid my pull - either the description is completely wrong, or
> you tagged the wrong commit.

Yes, the latter.  I've redone the tag; the pull request should have
been for:

  git://linux-nfs.org/~bfields/linux.git tags/nfsd-5.4


Highlights:

- add a new knfsd file cache, so that we don't have to open and
  close on each (NFSv2/v3) READ or WRITE.  This can speed up
  read and write in some cases.  It also replaces our readahead
  cache.
- Prevent silent data loss on write errors, by treating write
  errors like server reboots for the purposes of write caching,
  thus forcing clients to resend their writes.
- Tweak the code that allocates sessions to be more forgiving,
  so that NFSv4.1 mounts are less likely to hang when a server
  already has a lot of clients.
- Eliminate an arbitrary limit on NFSv4 ACL sizes; they should
  now be limited only by the backend filesystem and the
  maximum RPC size.
- Allow the server to enforce use of the correct kerberos
  credentials when a client reclaims state after a reboot.

And some miscellaneous smaller bugfixes and cleanup.


Chuck Lever (2):
  svcrdma: Remove svc_rdma_wq
  svcrdma: Use llist for managing cache of recv_ctxts

Colin Ian King (1):
  sunrpc: clean up indentation issue

Dave Wysochanski (1):
  SUNRPC: Track writers of the 'channel' file to improve 
cache_listeners_exist

J. Bruce Fields (4):
  Merge nfsd bugfixes
  nfsd: Remove unnecessary NULL checks
  Deprecate nfsd fault injection
  nfsd: eliminate an unnecessary acl size limit

Jeff Layton (12):
  sunrpc: add a new cache_detail operation for when a cache is flushed
  locks: create a new notifier chain for lease attempts
  nfsd: add a new struct file caching facility to nfsd
  nfsd: hook up nfsd_write to the new nfsd_file cache
  nfsd: hook up nfsd_read to the nfsd_file cache
  nfsd: hook nfsd_commit up to the nfsd_file cache
  nfsd: convert nfs4_file->fi_fds array to use nfsd_files
  nfsd: convert fi_deleg_file and ls_file fields to nfsd_file
  nfsd: hook up nfs4_preprocess_stateid_op to the nfsd_file cache
  nfsd: have nfsd_test_lock use the nfsd_file cache
  nfsd: rip out the raparms cache
  nfsd: close cached files prior to a REMOVE or RENAME that would replace 
target

NeilBrown (2):
  nfsd: handle drc over-allocation gracefully.
  nfsd: degraded slot-count more gracefully as allocation nears exhaustion.

Scott Mayhew (2):
  nfsd: add a "GetVersion" upcall for nfsdcld
  nfsd: add support for upcall version 2

Trond Myklebust (9):
  notify: export symbols for use by the knfsd file cache
  vfs: Export flush_delayed_fput for use by knfsd.
  nfsd: Fix up some unused variable warnings
  nfsd: Fix the documentation for svcxdr_tmpalloc()
  nfsd: nfsd_file cache entries should be per net namespace
  nfsd: Support the server resetting the boot verifier
  nfsd: Don't garbage collect files that might contain write errors
  nfsd: Reset the boot verifier on all write I/O errors
  nfsd: fix nfs read eof detection

YueHaibing (2):
  nfsd: remove duplicated include from filecache.c
  nfsd: Make nfsd_reset_boot_verifier_locked static

 fs/file_table.c  |   1 +
 fs/locks.c   |  62 ++
 fs/nfsd/Kconfig  |   3 +-
 fs/nfsd/Makefile |   3 +-
 fs/nfsd/acl.h|   8 -
 fs/nfsd/blocklayout.c|   3 +-
 fs/nfsd/export.c |  13 +
 fs/nfsd/filecache.c  | 934 +++
 fs/nfsd/filecache.h  |  61 ++
 fs/nfsd/netns.h  |   4 +
 fs/nfsd/nfs3proc.c   |   9 +-
 fs/nfsd/nfs3xdr.c|  13 +-
 fs/nfsd/nfs4callback.c   |  35 +-
 fs/nfsd/nfs4layouts.c|  12 +-
 fs/nfsd/nfs4proc.c   |  97 ++--
 fs/nfsd/nfs4recover.c| 388 ++---
 fs/nfsd/nfs4state.c  | 239 
 fs/nfsd/nfs4xdr.c|  56 +-
 fs/nfsd/nfsctl.c |  

[GIT PULL] nfsd changes for 5.4

2019-09-27 Thread J. Bruce Fields
Please pull nfsd changes from:

  git://linux-nfs.org/~bfields/linux.git tags/nfsd-5.4

--b.


Highlights:

- add a new knfsd file cache, so that we don't have to open and
  close on each (NFSv2/v3) READ or WRITE.  This can speed up
  read and write in some cases.  It also replaces our readahead
  cache.
- Prevent silent data loss on write errors, by treating write
  errors like server reboots for the purposes of write caching,
  thus forcing clients to resend their writes.
- Tweak the code that allocates sessions to be more forgiving,
  so that NFSv4.1 mounts are less likely to hang when a server
  already has a lot of clients.
- Eliminate an arbitrary limit on NFSv4 ACL sizes; they should
  now be limited only by the backend filesystem and the
  maximum RPC size.
- Allow the server to enforce use of the correct kerberos
  credentials when a client reclaims state after a reboot.

And some miscellaneous smaller bugfixes and cleanup.


Dave Wysochanski (1):
  SUNRPC: Track writers of the 'channel' file to improve 
cache_listeners_exist

 include/linux/sunrpc/cache.h |  6 +++---
 net/sunrpc/cache.c   | 12 
 2 files changed, 11 insertions(+), 7 deletions(-)


Re: [PATCH] sunrpc: clean up indentation issue

2019-09-25 Thread J . Bruce Fields
Applied, thanks.--b.

On Wed, Sep 25, 2019 at 02:09:30PM +0100, Colin King wrote:
> From: Colin Ian King 
> 
> There are statements that are indented incorrectly, remove the
> extraneous spacing.
> 
> Signed-off-by: Colin Ian King 
> ---
>  net/sunrpc/svc.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/net/sunrpc/svc.c b/net/sunrpc/svc.c
> index 220b79988000..d11b70552c33 100644
> --- a/net/sunrpc/svc.c
> +++ b/net/sunrpc/svc.c
> @@ -1233,8 +1233,8 @@ svc_generic_init_request(struct svc_rqst *rqstp,
>  
>   if (rqstp->rq_vers >= progp->pg_nvers )
>   goto err_bad_vers;
> -   versp = progp->pg_vers[rqstp->rq_vers];
> -   if (!versp)
> + versp = progp->pg_vers[rqstp->rq_vers];
> + if (!versp)
>   goto err_bad_vers;
>  
>   /*
> -- 
> 2.20.1


Re: [PATCH -next] nfsd: Make nfsd_reset_boot_verifier_locked static

2019-09-23 Thread J. Bruce Fields
On Mon, Sep 23, 2019 at 01:58:59PM +0800, YueHaibing wrote:
> Fix sparse warning:
> 
> fs/nfsd/nfssvc.c:364:6: warning:
>  symbol 'nfsd_reset_boot_verifier_locked' was not declared. Should it be 
> static?
> 
> Reported-by: Hulk Robot 
> Signed-off-by: YueHaibing 

OK, applied for 5.4.--b.

> ---
>  fs/nfsd/nfssvc.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c
> index 3caaf56..fdf7ed4 100644
> --- a/fs/nfsd/nfssvc.c
> +++ b/fs/nfsd/nfssvc.c
> @@ -361,7 +361,7 @@ void nfsd_copy_boot_verifier(__be32 verf[2], struct 
> nfsd_net *nn)
>   done_seqretry(>boot_lock, seq);
>  }
>  
> -void nfsd_reset_boot_verifier_locked(struct nfsd_net *nn)
> +static void nfsd_reset_boot_verifier_locked(struct nfsd_net *nn)
>  {
>   ktime_get_real_ts64(>nfssvc_boot);
>  }
> -- 
> 2.7.4
> 


Re: [PATCH] overlayfs: ignore empty NFSv4 ACLs in ext4 upperdir

2019-09-18 Thread J. Bruce Fields
On Wed, Sep 18, 2019 at 11:07:31AM +0200, Miklos Szeredi wrote:
> On Fri, May 10, 2019 at 04:09:41PM -0400, J. Bruce Fields wrote:
> > On Tue, May 07, 2019 at 10:24:58AM +1000, NeilBrown wrote:
> > > Interesting perspective  though doesn't NFSv4 explicitly allow
> > > client-side ACL enforcement in the case of delegations?
> > 
> > Not really.  What you're probably thinking of is the single ACE that the
> > server can return on granting a delegation, that tells the client it can
> > skip the ACCESS check for users matching that ACE.  It's unclear how
> > useful that is.  It's currently unused by the Linux client and server.
> > 
> > > Not sure how relevant that is
> > > 
> > > It seems to me we have two options:
> > >  1/ declare the NFSv4 doesn't work as a lower layer for overlayfs and
> > > recommend people use NFSv3, or
> > >  2/ Modify overlayfs to work with NFSv4 by ignoring nfsv4 ACLs either
> > >  2a/ always - and ignore all other acls and probably all system. xattrs,
> > >  or
> > >  2b/ based on a mount option that might be
> > >   2bi/ general "noacl" or might be
> > >   2bii/ explicit "noxattr=system.nfs4acl"
> > >  
> > > I think that continuing to discuss the miniature of the options isn't
> > > going to help.  No solution is perfect - we just need to clearly
> > > document the implications of whatever we come up with.
> > > 
> > > I lean towards 2a, but I be happy with with any '2' and '1' won't kill
> > > me.
> > 
> > I guess I'd also lean towards 2a.
> > 
> > I don't think it applies to posix acls, as overlayfs is capable of
> > copying those up and evaluating them on its own.
> 
> POSIX acls are evaluated and copied up.
> 
> I guess same goes for "security.*" attributes, that are evaluated on MAC 
> checks.
> 
> I think it would be safe to ignore failure to copy up anything else.  That 
> seems
> a bit saner than just blacklisting nfs4_acl...
> 
> Something like the following untested patch.

It seems at least simple to implement and explain.

>  fs/overlayfs/copy_up.c |   16 ++--
>  1 file changed, 14 insertions(+), 2 deletions(-)
> 
> --- a/fs/overlayfs/copy_up.c
> +++ b/fs/overlayfs/copy_up.c
> @@ -36,6 +36,13 @@ static int ovl_ccup_get(char *buf, const
>  module_param_call(check_copy_up, ovl_ccup_set, ovl_ccup_get, NULL, 0644);
>  MODULE_PARM_DESC(check_copy_up, "Obsolete; does nothing");
>  
> +static bool ovl_must_copy_xattr(const char *name)
> +{
> + return !strcmp(name, XATTR_POSIX_ACL_ACCESS) ||
> +!strcmp(name, XATTR_POSIX_ACL_DEFAULT) ||
> +!strncmp(name, XATTR_SECURITY_PREFIX, XATTR_SECURITY_PREFIX_LEN);
> +}
> +
>  int ovl_copy_xattr(struct dentry *old, struct dentry *new)
>  {
>   ssize_t list_size, size, value_size = 0;
> @@ -107,8 +114,13 @@ int ovl_copy_xattr(struct dentry *old, s
>   continue; /* Discard */
>   }
>   error = vfs_setxattr(new, name, value, size, 0);
> - if (error)
> - break;
> + if (error) {

Can we check for EOPNOTSUPP instead of any error?

Maybe we're copying up a user xattr to a filesystem that's perfectly
capable of supporting those.  And maybe there's a disk error or we run
out of disk space or something.  Then I'd rather get EIO or ENOSPC than
silently fail to copy some xattrs.

--b.

> + if (ovl_must_copy_xattr(name))
> + break;
> +
> + /* Ignore failure to copy unknown xattrs */
> + error = 0;
> + }
>   }
>   kfree(value);
>  out:


Re: Regression in 5.1.20: Reading long directory fails

2019-09-12 Thread J. Bruce Fields
On Thu, Sep 12, 2019 at 09:08:51AM -0400, Benjamin Coddington wrote:
> 
> On 12 Sep 2019, at 8:53, Trond Myklebust wrote:
> > Let's please just scrap this function and rewrite it as a generic
> > function for reading the MIC. It clearly is not a generic function for
> > reading arbitrary netobjs, and modifications like the above just make
> > the misnomer painfully obvious.
> >
> > Let's rewrite it as xdr_buf_read_mic() so that we can simplify it where
> > possible.
> 
> Ok.  I want to assume the mic will not land in the head, but I am not sure..
> Is there a scenario where the mic might land in the head, or is that bit of
> the current function left over from other uses?

Any reply that doesn't have page data?

A reply that ends up shorter than expected due to failure of an early op
in the compound?

(Unless I'm missing something.  I haven't looked at this code in a
while.  Though it was problem me that wrote it originally--apologies for
that)

--b.


Re: Regression in 5.1.20: Reading long directory fails

2019-09-06 Thread J. Bruce Fields
On Tue, Sep 03, 2019 at 08:50:39PM -0500, Jason L Tibbitts III wrote:
> I asked the XFS folks who mentioned that the issues with 64 bit inodes
> are old, constrained to larger filesystems than what I'm using, not an
> issue with nfsv4, and not present on anything but 32bit clients with old
> userspace.
> 
> In any case, I have been experimenting a bit and somehow the issue seems
> to be related to exporting with sec=krb5i:krb5p or sec=krb5i.  If I
> export with just sec=krb5p, things magically begin to work.

That's interesting!

We've occasionally had bugs that are rare corner cases in the xdr
code--e.g. if the encoded directory data hits some limit at the same
time that we reach the end of a page, and the end of the page falls at
some offset with respect to the entry we're encoding.

Something like switching between krb5i and krb5p could affect the
offsets in a way that affected the likelihood of hitting such a case.
That's one guess, anyway.

> Anyway, I hope this helps to pinpoint the problem.  I now have a really
> easy way to reproduce this without having to kick people off of the
> server, and if the successes aren't just some kind of false positives
> then I guess I also have a workaround.  I'm still at a loss as to why a
> revert of the readdir changes makes any difference at all here.

Those readdir changes were client-side, right?  Based on that I'd been
assuming a client bug, but maybe it'd be worth getting a full packet
capture of the readdir reply to make sure it's legit.  Looking at it in
wireshark should tell us quickly whether it's corrupted somehow.

--b.


[PATCH 1/9] rtl8192*: display ESSIDs using %pE

2019-09-05 Thread J. Bruce Fields
From: "J. Bruce Fields" 

Everywhere else in the kernel ESSIDs are printed using %pE, and I can't
see why there should be an exception here.

Signed-off-by: J. Bruce Fields 
---
 drivers/staging/rtl8192e/rtllib.h  | 2 +-
 drivers/staging/rtl8192u/ieee80211/ieee80211.h | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/staging/rtl8192e/rtllib.h 
b/drivers/staging/rtl8192e/rtllib.h
index 2dd57e88276e..096254e422b3 100644
--- a/drivers/staging/rtl8192e/rtllib.h
+++ b/drivers/staging/rtl8192e/rtllib.h
@@ -2132,7 +2132,7 @@ static inline const char *escape_essid(const char *essid, 
u8 essid_len)
return escaped;
}
 
-   snprintf(escaped, sizeof(escaped), "%*pEn", essid_len, essid);
+   snprintf(escaped, sizeof(escaped), "%*pE", essid_len, essid);
return escaped;
 }
 
diff --git a/drivers/staging/rtl8192u/ieee80211/ieee80211.h 
b/drivers/staging/rtl8192u/ieee80211/ieee80211.h
index d36963469015..3963a08b9eb2 100644
--- a/drivers/staging/rtl8192u/ieee80211/ieee80211.h
+++ b/drivers/staging/rtl8192u/ieee80211/ieee80211.h
@@ -2426,7 +2426,7 @@ static inline const char *escape_essid(const char *essid, 
u8 essid_len)
return escaped;
}
 
-   snprintf(escaped, sizeof(escaped), "%*pEn", essid_len, essid);
+   snprintf(escaped, sizeof(escaped), "%*pE", essid_len, essid);
return escaped;
 }
 
-- 
2.21.0



[PATCH 2/9] thunderbolt: show key using %*s not %*pE

2019-09-05 Thread J. Bruce Fields
From: "J. Bruce Fields" 

%*pEp (without "h" or "o") is a no-op.  This string could contain
arbitrary (non-NULL) characters, so we do want escaping.  Use %*pE like
every other caller.

Signed-off-by: J. Bruce Fields 
---
 drivers/thunderbolt/xdomain.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/thunderbolt/xdomain.c b/drivers/thunderbolt/xdomain.c
index 5118d46702d5..4e17a7c7bf0a 100644
--- a/drivers/thunderbolt/xdomain.c
+++ b/drivers/thunderbolt/xdomain.c
@@ -636,7 +636,7 @@ static ssize_t key_show(struct device *dev, struct 
device_attribute *attr,
 * It should be null terminated but anything else is pretty much
 * allowed.
 */
-   return sprintf(buf, "%*pEp\n", (int)strlen(svc->key), svc->key);
+   return sprintf(buf, "%*pE\n", (int)strlen(svc->key), svc->key);
 }
 static DEVICE_ATTR_RO(key);
 
-- 
2.21.0



[PATCH 3/9] staging: wlan-ng: use "%*pE" for serial number

2019-09-05 Thread J. Bruce Fields
From: "J. Bruce Fields" 

Almost every user of "%*pE" in the kernel uses just bare "%*pE".  This
is the only user of "%pEhp".  I can't see why it's needed.

Signed-off-by: J. Bruce Fields 
---
 drivers/staging/wlan-ng/prism2sta.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/staging/wlan-ng/prism2sta.c 
b/drivers/staging/wlan-ng/prism2sta.c
index fb5441399131..8f25496188aa 100644
--- a/drivers/staging/wlan-ng/prism2sta.c
+++ b/drivers/staging/wlan-ng/prism2sta.c
@@ -846,7 +846,7 @@ static int prism2sta_getcardinfo(struct wlandevice *wlandev)
result = hfa384x_drvr_getconfig(hw, HFA384x_RID_NICSERIALNUMBER,
snum, HFA384x_RID_NICSERIALNUMBER_LEN);
if (!result) {
-   netdev_info(wlandev->netdev, "Prism2 card SN: %*pEhp\n",
+   netdev_info(wlandev->netdev, "Prism2 card SN: %*pE\n",
HFA384x_RID_NICSERIALNUMBER_LEN, snum);
} else {
netdev_err(wlandev->netdev, "Failed to retrieve Prism2 Card 
SN\n");
-- 
2.21.0



[PATCH 5/9] Remove unused %*pE[achnops] formats

2019-09-05 Thread J. Bruce Fields
From: "J. Bruce Fields" 

The [achnops] are confusing, and in practice the only one anyone seems
to need is the bare %*pE.

I think some set of modifiers here might actually be useful, but the
ones we have are confusing and unused, so let's just toss these out and
then rethink what we might want to add back later.

Signed-off-by: J. Bruce Fields 
---
 Documentation/core-api/printk-formats.rst | 23 ++-
 lib/vsprintf.c| 50 ++-
 2 files changed, 7 insertions(+), 66 deletions(-)

diff --git a/Documentation/core-api/printk-formats.rst 
b/Documentation/core-api/printk-formats.rst
index c6224d039bcb..4f9d20dfb342 100644
--- a/Documentation/core-api/printk-formats.rst
+++ b/Documentation/core-api/printk-formats.rst
@@ -180,35 +180,20 @@ Raw buffer as an escaped string
 
 ::
 
-   %*pE[achnops]
+   %*pE
 
 For printing raw buffer as an escaped string. For the following buffer::
 
1b 62 20 5c 43 07 22 90 0d 5d
 
-A few examples show how the conversion would be done (excluding surrounding
+An example shows how the conversion would be done (excluding surrounding
 quotes)::
 
%*pE"\eb \C\a"\220\r]"
-   %*pEhp  "\x1bb \C\x07"\x90\x0d]"
-   %*pEa   "\e\142\040\\\103\a\042\220\r\135"
 
-The conversion rules are applied according to an optional combination
-of flags (see :c:func:`string_escape_mem` kernel documentation for the
-details):
+See :c:func:`string_escape_mem` kernel documentation for the details.
 
-   - a - ESCAPE_ANY
-   - c - ESCAPE_SPECIAL
-   - h - ESCAPE_HEX
-   - n - ESCAPE_NULL
-   - o - ESCAPE_OCTAL
-   - p - ESCAPE_NP
-   - s - ESCAPE_SPACE
-
-By default ESCAPE_ANY_NP is used.
-
-ESCAPE_ANY_NP is the sane choice for many cases, in particularly for
-printing SSIDs.
+This is used, for example, for printing SSIDs.
 
 If field width is omitted then 1 byte only will be escaped.
 
diff --git a/lib/vsprintf.c b/lib/vsprintf.c
index b0967cf17137..5522d2a052e1 100644
--- a/lib/vsprintf.c
+++ b/lib/vsprintf.c
@@ -1537,9 +1537,6 @@ static noinline_for_stack
 char *escaped_string(char *buf, char *end, u8 *addr, struct printf_spec spec,
 const char *fmt)
 {
-   bool found = true;
-   int count = 1;
-   unsigned int flags = 0;
int len;
 
if (spec.field_width == 0)
@@ -1548,38 +1545,6 @@ char *escaped_string(char *buf, char *end, u8 *addr, 
struct printf_spec spec,
if (check_pointer(, end, addr, spec))
return buf;
 
-   do {
-   switch (fmt[count++]) {
-   case 'a':
-   flags |= ESCAPE_ANY;
-   break;
-   case 'c':
-   flags |= ESCAPE_SPECIAL;
-   break;
-   case 'h':
-   flags |= ESCAPE_HEX;
-   break;
-   case 'n':
-   flags |= ESCAPE_NULL;
-   break;
-   case 'o':
-   flags |= ESCAPE_OCTAL;
-   break;
-   case 'p':
-   flags |= ESCAPE_NP;
-   break;
-   case 's':
-   flags |= ESCAPE_SPACE;
-   break;
-   default:
-   found = false;
-   break;
-   }
-   } while (found);
-
-   if (!flags)
-   flags = ESCAPE_ANY_NP;
-
len = spec.field_width < 0 ? 1 : spec.field_width;
 
/*
@@ -1587,7 +1552,8 @@ char *escaped_string(char *buf, char *end, u8 *addr, 
struct printf_spec spec,
 * the given buffer, and returns the total size of the output
 * had the buffer been big enough.
 */
-   buf += string_escape_mem(addr, len, buf, buf < end ? end - buf : 0, 
flags, NULL);
+   buf += string_escape_mem(addr, len, buf, buf < end ? end - buf : 0,
+ESCAPE_ANY_NP, NULL);
 
return buf;
 }
@@ -2038,17 +2004,7 @@ static char *kobject_string(char *buf, char *end, void 
*ptr,
  * - '[Ii][4S][hnbl]' IPv4 addresses in host, network, big or little endian 
order
  * - 'I[6S]c' for IPv6 addresses printed as specified by
  *   http://tools.ietf.org/html/rfc5952
- * - 'E[achnops]' For an escaped buffer, where rules are defined by combination
- *of the following flags (see string_escape_mem() for the
- *details):
- *  a - ESCAPE_ANY
- *  c - ESCAPE_SPECIAL
- *  h - ESCAPE_HEX
- *  n - ESCAPE_NULL
- *  o - ESCAPE_OCTAL
- *  p - ESCAPE_NP
- *  s - ESCAPE_SPACE
- *By default ESCAPE_ANY_NP is used.
+ * - 'E[achnops]' For an escaped buffer (see strin

[PATCH 7/9] Simplify string_escape_mem

2019-09-05 Thread J. Bruce Fields
From: "J. Bruce Fields" 

string_escape_mem is harder to use than necessary:

- ESCAPE_NP sounds like it means "escape nonprinting
  characters", but actually means "do not escape printing
  characters"
- the use of the "only" string to limit the list of escaped
  characters rather than supplement them is confusing and
  unehlpful.

So:

- use the "only" string to add to, rather than replace, the list
  of characters to escape
- separate flags into those that select which characters to
  escape, and those that choose the format of the escaping ("\ "
  vs "\x20" vs "\040".)  Make flags that select characters just
  select the union when they're OR'd together.

Untested.  The tests themselves, especially, I'm unsure about.

Signed-off-by: J. Bruce Fields 
---
 fs/proc/array.c|  2 +-
 fs/seq_file.c  |  2 +-
 include/linux/string_helpers.h | 14 +++
 lib/string_helpers.c   | 76 +++---
 lib/test-string_helpers.c  | 41 --
 lib/vsprintf.c |  2 +-
 net/sunrpc/cache.c |  2 +-
 7 files changed, 62 insertions(+), 77 deletions(-)

diff --git a/fs/proc/array.c b/fs/proc/array.c
index 982c95d09176..bdeeb3318fa2 100644
--- a/fs/proc/array.c
+++ b/fs/proc/array.c
@@ -111,7 +111,7 @@ void proc_task_name(struct seq_file *m, struct task_struct 
*p, bool escape)
size = seq_get_buf(m, );
if (escape) {
ret = string_escape_str(tcomm, buf, size,
-   ESCAPE_SPECIAL, "\n\\");
+   ESCAPE_STYLE_SLASH, "\n\\");
if (ret >= size)
ret = -1;
} else {
diff --git a/fs/seq_file.c b/fs/seq_file.c
index 1600034a929b..63e5a7c4dbf7 100644
--- a/fs/seq_file.c
+++ b/fs/seq_file.c
@@ -379,7 +379,7 @@ void seq_escape(struct seq_file *m, const char *s, const 
char *esc)
size_t size = seq_get_buf(m, );
int ret;
 
-   ret = string_escape_str(s, buf, size, ESCAPE_OCTAL, esc);
+   ret = string_escape_str(s, buf, size, ESCAPE_STYLE_OCTAL, esc);
seq_commit(m, ret < size ? ret : -1);
 }
 EXPORT_SYMBOL(seq_escape);
diff --git a/include/linux/string_helpers.h b/include/linux/string_helpers.h
index 7bf0d137373d..5d350f7f6874 100644
--- a/include/linux/string_helpers.h
+++ b/include/linux/string_helpers.h
@@ -41,13 +41,13 @@ static inline int string_unescape_any_inplace(char *buf)
return string_unescape_any(buf, buf, 0);
 }
 
-#define ESCAPE_SPECIAL 0x02
-#define ESCAPE_OCTAL   0x08
-#define ESCAPE_ANY (ESCAPE_OCTAL | ESCAPE_SPECIAL)
-#define ESCAPE_NP  0x10
-#define ESCAPE_ANY_NP  (ESCAPE_ANY | ESCAPE_NP)
-#define ESCAPE_HEX 0x20
-#define ESCAPE_FLAG_MASK   0x3a
+#define ESCAPE_SPECIAL 0x01
+#define ESCAPE_NP  0x02
+#define ESCAPE_ANY_NP  (ESCAPE_SPECIAL | ESCAPE_NP)
+#define ESCAPE_STYLE_SLASH 0x20
+#define ESCAPE_STYLE_OCTAL 0x40
+#define ESCAPE_STYLE_HEX   0x80
+#define ESCAPE_FLAG_MASK   0xe3
 
 int string_escape_mem(const char *src, size_t isz, char *dst, size_t osz,
unsigned int flags, const char *only);
diff --git a/lib/string_helpers.c b/lib/string_helpers.c
index ac72159d3980..47f40406f9d4 100644
--- a/lib/string_helpers.c
+++ b/lib/string_helpers.c
@@ -400,6 +400,11 @@ static bool escape_hex(unsigned char c, char **dst, char 
*end)
return true;
 }
 
+static bool is_special(char c)
+{
+   return c == '\0' || strchr("\f\n\r\t\v\\\a\e", c);
+}
+
 /**
  * string_escape_mem - quote characters in the given memory buffer
  * @src:   source buffer (unescaped)
@@ -407,23 +412,18 @@ static bool escape_hex(unsigned char c, char **dst, char 
*end)
  * @dst:   destination buffer (escaped)
  * @osz:   destination buffer size
  * @flags: combination of the flags
- * @only:  NULL-terminated string containing characters used to limit
- * the selected escape class. If characters are included in @only
- * that would not normally be escaped by the classes selected
- * in @flags, they will be copied to @dst unescaped.
+ * @esc:   NULL-terminated string containing characters which
+ * should also be escaped.
  *
- * Description:
- * The process of escaping byte buffer includes several parts. They are applied
- * in the following sequence.
  *
- * 1. The character is matched to the printable class, if asked, and in
- *case of match it passes through to the output.
- * 2. The character is not matched to the one from @only string and thus
- *must go as-is to the output.
- * 3. The character is checked if it falls into the clas

[PATCH 6/9] Eliminate unused ESCAPE_NULL, ESCAPE_SPACE flags

2019-09-05 Thread J. Bruce Fields
From: "J. Bruce Fields" 

I can see how some finer-grained flags could be useful, but for now I'm
trying to simplify, so let's just remove these unused ones.

Note the trickiest part is actually the tests, and I still need to check
them.

Signed-off-by: J. Bruce Fields 
---
 fs/proc/array.c|  2 +-
 include/linux/string_helpers.h |  6 ++--
 lib/string_helpers.c   | 54 +++
 lib/test-string_helpers.c  | 58 --
 4 files changed, 14 insertions(+), 106 deletions(-)

diff --git a/fs/proc/array.c b/fs/proc/array.c
index 46dcb6f0eccf..982c95d09176 100644
--- a/fs/proc/array.c
+++ b/fs/proc/array.c
@@ -111,7 +111,7 @@ void proc_task_name(struct seq_file *m, struct task_struct 
*p, bool escape)
size = seq_get_buf(m, );
if (escape) {
ret = string_escape_str(tcomm, buf, size,
-   ESCAPE_SPACE | ESCAPE_SPECIAL, "\n\\");
+   ESCAPE_SPECIAL, "\n\\");
if (ret >= size)
ret = -1;
} else {
diff --git a/include/linux/string_helpers.h b/include/linux/string_helpers.h
index 8a299a29b767..7bf0d137373d 100644
--- a/include/linux/string_helpers.h
+++ b/include/linux/string_helpers.h
@@ -41,15 +41,13 @@ static inline int string_unescape_any_inplace(char *buf)
return string_unescape_any(buf, buf, 0);
 }
 
-#define ESCAPE_SPACE   0x01
 #define ESCAPE_SPECIAL 0x02
-#define ESCAPE_NULL0x04
 #define ESCAPE_OCTAL   0x08
-#define ESCAPE_ANY \
-   (ESCAPE_SPACE | ESCAPE_OCTAL | ESCAPE_SPECIAL | ESCAPE_NULL)
+#define ESCAPE_ANY (ESCAPE_OCTAL | ESCAPE_SPECIAL)
 #define ESCAPE_NP  0x10
 #define ESCAPE_ANY_NP  (ESCAPE_ANY | ESCAPE_NP)
 #define ESCAPE_HEX 0x20
+#define ESCAPE_FLAG_MASK   0x3a
 
 int string_escape_mem(const char *src, size_t isz, char *dst, size_t osz,
unsigned int flags, const char *only);
diff --git a/lib/string_helpers.c b/lib/string_helpers.c
index 963050c0283e..ac72159d3980 100644
--- a/lib/string_helpers.c
+++ b/lib/string_helpers.c
@@ -310,7 +310,7 @@ static bool escape_passthrough(unsigned char c, char **dst, 
char *end)
return true;
 }
 
-static bool escape_space(unsigned char c, char **dst, char *end)
+static bool escape_special(unsigned char c, char **dst, char *end)
 {
char *out = *dst;
unsigned char to;
@@ -331,27 +331,6 @@ static bool escape_space(unsigned char c, char **dst, char 
*end)
case '\f':
to = 'f';
break;
-   default:
-   return false;
-   }
-
-   if (out < end)
-   *out = '\\';
-   ++out;
-   if (out < end)
-   *out = to;
-   ++out;
-
-   *dst = out;
-   return true;
-}
-
-static bool escape_special(unsigned char c, char **dst, char *end)
-{
-   char *out = *dst;
-   unsigned char to;
-
-   switch (c) {
case '\\':
to = '\\';
break;
@@ -361,6 +340,9 @@ static bool escape_special(unsigned char c, char **dst, 
char *end)
case '\e':
to = 'e';
break;
+   case '\0':
+   to = '0';
+   break;
default:
return false;
}
@@ -376,24 +358,6 @@ static bool escape_special(unsigned char c, char **dst, 
char *end)
return true;
 }
 
-static bool escape_null(unsigned char c, char **dst, char *end)
-{
-   char *out = *dst;
-
-   if (c)
-   return false;
-
-   if (out < end)
-   *out = '\\';
-   ++out;
-   if (out < end)
-   *out = '0';
-   ++out;
-
-   *dst = out;
-   return true;
-}
-
 static bool escape_octal(unsigned char c, char **dst, char *end)
 {
char *out = *dst;
@@ -465,17 +429,15 @@ static bool escape_hex(unsigned char c, char **dst, char 
*end)
  * destination buffer will not be NULL-terminated, thus caller have to append
  * it if needs.   The supported flags are::
  *
- * %ESCAPE_SPACE: (special white space, not space itself)
+ * %ESCAPE_SPECIAL:
  * '\f' - form feed
  * '\n' - new line
  * '\r' - carriage return
  * '\t' - horizontal tab
  * '\v' - vertical tab
- * %ESCAPE_SPECIAL:
  * '\\' - backslash
  * '\a' - alert (BEL)
  * '\e' - escape
- * %ESCAPE_NULL:
  * '\0' - null
  * %ESCAPE_OCTAL:
  * '\NNN' - byte with octal value NNN (3 digits)
@@ -519,15 +481,9 @@ int string_escape_mem(const char *src, size_t isz, char 
*dst, size_t osz,
(is_dict && !strchr(only, c))) {
/* do nothing */
} else {
-   

[PATCH 8/9] minor kstrdup_quotable simplification

2019-09-05 Thread J. Bruce Fields
From: "J. Bruce Fields" 

---
 lib/string_helpers.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/lib/string_helpers.c b/lib/string_helpers.c
index 47f40406f9d4..6f553f893fda 100644
--- a/lib/string_helpers.c
+++ b/lib/string_helpers.c
@@ -518,8 +518,8 @@ char *kstrdup_quotable(const char *src, gfp_t gfp)
 {
size_t slen, dlen;
char *dst;
-   const int flags = ESCAPE_STYLE_HEX;
-   const char esc[] = "\f\n\r\t\v\a\e\\\"";
+   const int flags = ESCAPE_SPECIAL|ESCAPE_STYLE_HEX;
+   const char esc[] = "\"";
 
if (!src)
return NULL;
-- 
2.21.0



[PATCH 9/9] Remove string_escape_mem_ascii

2019-09-05 Thread J. Bruce Fields
From: "J. Bruce Fields" 

It's easier to do this in string_escape_mem now.

Might also consider non-ascii and quote-mark sprintf modifiers and then
we might make do with seq_printk.
---
 fs/seq_file.c  |  3 ++-
 include/linux/string_helpers.h |  3 +--
 lib/string_helpers.c   | 24 
 3 files changed, 7 insertions(+), 23 deletions(-)

diff --git a/fs/seq_file.c b/fs/seq_file.c
index 63e5a7c4dbf7..0e45a25523ad 100644
--- a/fs/seq_file.c
+++ b/fs/seq_file.c
@@ -390,7 +390,8 @@ void seq_escape_mem_ascii(struct seq_file *m, const char 
*src, size_t isz)
size_t size = seq_get_buf(m, );
int ret;
 
-   ret = string_escape_mem_ascii(src, isz, buf, size);
+   ret = string_escape_mem(src, isz, buf, size, ESCAPE_NP|ESCAPE_NONASCII|
+   ESCAPE_STYLE_SLASH|ESCAPE_STYLE_HEX, "\"\\");
seq_commit(m, ret < size ? ret : -1);
 }
 EXPORT_SYMBOL(seq_escape_mem_ascii);
diff --git a/include/linux/string_helpers.h b/include/linux/string_helpers.h
index 5d350f7f6874..f3388591d83f 100644
--- a/include/linux/string_helpers.h
+++ b/include/linux/string_helpers.h
@@ -43,6 +43,7 @@ static inline int string_unescape_any_inplace(char *buf)
 
 #define ESCAPE_SPECIAL 0x01
 #define ESCAPE_NP  0x02
+#define ESCAPE_NONASCII0x04
 #define ESCAPE_ANY_NP  (ESCAPE_SPECIAL | ESCAPE_NP)
 #define ESCAPE_STYLE_SLASH 0x20
 #define ESCAPE_STYLE_OCTAL 0x40
@@ -52,8 +53,6 @@ static inline int string_unescape_any_inplace(char *buf)
 int string_escape_mem(const char *src, size_t isz, char *dst, size_t osz,
unsigned int flags, const char *only);
 
-int string_escape_mem_ascii(const char *src, size_t isz, char *dst,
-   size_t osz);
 static inline int string_escape_str(const char *src, char *dst, size_t sz,
unsigned int flags, const char *only)
 {
diff --git a/lib/string_helpers.c b/lib/string_helpers.c
index 6f553f893fda..1dacf76eada0 100644
--- a/lib/string_helpers.c
+++ b/lib/string_helpers.c
@@ -439,6 +439,8 @@ static bool is_special(char c)
  * '\a' - alert (BEL)
  * '\e' - escape
  * '\0' - null
+ * %ESCAPE_NONASCII:
+ * escape characters with the high bit set
  * %ESCAPE_NP:
  * escape only non-printable characters (checked by isprint)
  * %ESCAPE_ANY_NP:
@@ -468,7 +470,8 @@ int string_escape_mem(const char *src, size_t isz, char 
*dst, size_t osz,
 
if ((is_dict && strchr(esc, c)) ||
(flags & ESCAPE_NP && !isprint(c)) ||
-   (flags & ESCAPE_SPECIAL && is_special(c))) {
+   (flags & ESCAPE_SPECIAL && is_special(c)) ||
+   (flags & ESCAPE_NONASCII && !isascii(c))) {
 
if (flags & ESCAPE_STYLE_SLASH &&
escape_special(c, , end))
@@ -491,25 +494,6 @@ int string_escape_mem(const char *src, size_t isz, char 
*dst, size_t osz,
 }
 EXPORT_SYMBOL(string_escape_mem);
 
-int string_escape_mem_ascii(const char *src, size_t isz, char *dst,
-   size_t osz)
-{
-   char *p = dst;
-   char *end = p + osz;
-
-   while (isz--) {
-   unsigned char c = *src++;
-
-   if (!isprint(c) || !isascii(c) || c == '"' || c == '\\')
-   escape_hex(c, , end);
-   else
-   escape_passthrough(c, , end);
-   }
-
-   return p - dst;
-}
-EXPORT_SYMBOL(string_escape_mem_ascii);
-
 /*
  * Return an allocated string that has been escaped of special characters
  * and double quotes, making it safe to log in quotes.
-- 
2.21.0



[PATCH 4/9] Remove unused string_escape_*_any_np

2019-09-05 Thread J. Bruce Fields
From: "J. Bruce Fields" 

These aren't called anywhere.

Signed-off-by: J. Bruce Fields 
---
 include/linux/string_helpers.h | 13 -
 1 file changed, 13 deletions(-)

diff --git a/include/linux/string_helpers.h b/include/linux/string_helpers.h
index c28955132234..8a299a29b767 100644
--- a/include/linux/string_helpers.h
+++ b/include/linux/string_helpers.h
@@ -56,25 +56,12 @@ int string_escape_mem(const char *src, size_t isz, char 
*dst, size_t osz,
 
 int string_escape_mem_ascii(const char *src, size_t isz, char *dst,
size_t osz);
-
-static inline int string_escape_mem_any_np(const char *src, size_t isz,
-   char *dst, size_t osz, const char *only)
-{
-   return string_escape_mem(src, isz, dst, osz, ESCAPE_ANY_NP, only);
-}
-
 static inline int string_escape_str(const char *src, char *dst, size_t sz,
unsigned int flags, const char *only)
 {
return string_escape_mem(src, strlen(src), dst, sz, flags, only);
 }
 
-static inline int string_escape_str_any_np(const char *src, char *dst,
-   size_t sz, const char *only)
-{
-   return string_escape_str(src, dst, sz, ESCAPE_ANY_NP, only);
-}
-
 char *kstrdup_quotable(const char *src, gfp_t gfp);
 char *kstrdup_quotable_cmdline(struct task_struct *task, gfp_t gfp);
 char *kstrdup_quotable_file(struct file *file, gfp_t gfp);
-- 
2.21.0



Re: Regression in 5.1.20: Reading long directory fails

2019-08-28 Thread J. Bruce Fields
On Wed, Aug 28, 2019 at 01:29:00PM -0500, Jason L Tibbitts III wrote:
> If I had any idea how to do that, I happily would.  I'm certainly
> willing to learn.  At least I can run strace to see where ls bombs:

Somewhat of a caveman, I might start at the code for getdents, sprinkle
printk's around until I get down to the relevant code in the NFS
layer  There's probably a better way to do it with tracing.

> getdents64(5, 0x7fc13afaf040, 262144)   = -1 EIO (Input/output error)
> 
> bcodding on IRC mentioned that is a rather large count.  Does make me
> wonder if the server is weirding out and sending the client bogus data.
> Certainly a server reboot, or maybe even just unmounting and remounting
> the filesystem or copying the data to another filesystem would tell me
> that.  In any case, as soon as I am able to mess with that server, I'll
> know more.

Might also be worth capturing the network traffic and checking whether
wireshark thinks the server replies are valid xdr.

--b.


Re: Regression in 5.1.20: Reading long directory fails

2019-08-28 Thread J. Bruce Fields
On Thu, Aug 22, 2019 at 02:39:26PM -0500, Jason L Tibbitts III wrote:
> I now have another user reporting the same failure of readdir on a long
> directory which showed up in 5.1.20 and was traced to
> 3536b79ba75ba44b9ac1a9f1634f2e833bbb735c.  I'm not sure what to do to
> get more traction besides reposting and adding some addresses to the CC
> list.  If there is any information I can provide which might help to get
> to the bottom of this, please let me know.
> 
> To recap:
> 
> 5.1.20 introduced a regression reading some large directories.  In this
> case, the directory should have 7800 files or so in it:
> 
> [root@ld00 ~]# ls -l ~dblecher|wc -l
> ls: reading directory '/home/dblecher': Input/output error
> 1844
> [root@ld00 ~]# cat /proc/version Linux version 5.1.20-300.fc30.x86_64 
> (mockbu...@bkernel04.phx2.fedoraproject.org) (gcc version 9.1.1 20190503 (Red 
> Hat 9.1.1-1) (GCC)) #1 SMP Fri Jul 26 15:03:11 UTC 2019
> 
> (The server is a Centos 7 machine running kernel 3.10.0-957.12.2.el7.x86_64.)
> 
> Building a kernel which reverts commit 
> 3536b79ba75ba44b9ac1a9f1634f2e833bbb735c:
>   Revert "NFS: readdirplus optimization by cache mechanism" (memleak)

Looks like that's db531db951f950b8 upstream.  (Do you know if it's
reproduceable upstream as well?)

> fixes the issue, but of course that revert was fixing a real issue so
> I'm not sure what to do.
> 
> I can trivially reproduce this by simply trying to list the problematic
> directories but I'm not sure how to construct such a directory; simply
> creating 1 files doesn't cause the problem for me.

Maybe it depends on having names of the right length to place some bit
of xdr on a boundary.  I wonder if it'd be possible to reproduce just by
varying the name lengths randomly till you hit it.

The fact that the problematic patch fixed a memory leak also makes me
wonder if it might have gone to far and freed something out from under
the readdir code.

> I am willing to
> test patches and can build my own kernels, and I'm happy to provide any
> debugging information you might require.  Unfortunately I don't know
> enough to dig in and figure out for myself what's going wrong.
> 
> I did file https://bugzilla.redhat.com/show_bug.cgi?id=1740954 just to
> have this in a bug tracker somewhere.  I'm happy to file one somewhere
> else if that would help.

No clever debugging ideas off the top of my head, I'm afraid.  I might
start by patching the kernel or doing some tracing to figure out exactly
where that EIO is being generated?

--b.


[GIT PULL] nfsd bugfixes for 5.3

2019-08-21 Thread J. Bruce Fields
Please pull nfsd bugfixes for 5.3 from:

  git://linux-nfs.org/~bfields/linux.git tags/nfsd-5.3-1

Fix nfsd bugs, three in the new nfsd/clients/ code, one in the reply
cache containerization.

--b.

He Zhe (1):
  nfsd4: Fix kernel crash when reading proc file reply_cache_stats

J. Bruce Fields (2):
  nfsd: use i_wrlock instead of rcu for nfsdfs i_private
  nfsd: initialize i_private before d_add

Tetsuo Handa (1):
  nfsd: fix dentry leak upon mkdir failure.

 fs/nfsd/nfscache.c |  2 +-
 fs/nfsd/nfsctl.c   | 19 +--
 2 files changed, 10 insertions(+), 11 deletions(-)


Re: kernel panic in 5.3-rc5, nfsd_reply_cache_stats_show+0x11

2019-08-21 Thread J. Bruce Fields
Probably just needs the following.

I've been slow to get some bugfixes upstream, sorry--I'll go send a pull
request now

--b.

commit 78e70e780b28
Author: He Zhe 
Date:   Tue Aug 6 17:41:04 2019 +0800

nfsd4: Fix kernel crash when reading proc file reply_cache_stats

reply_cache_stats uses wrong parameter as seq file private structure and
thus causes the following kernel crash when users read
/proc/fs/nfsd/reply_cache_stats

BUG: kernel NULL pointer dereference, address: 01f9
PGD 0 P4D 0
Oops:  [#3] SMP PTI
CPU: 6 PID: 1502 Comm: cat Tainted: G  D   5.3.0-rc3+ #1
Hardware name: Intel Corporation Broadwell Client platform/Basking Ridge, 
BIOS BDW-E2R1.86C.0118.R01.1503110618 03/11/2015
RIP: 0010:nfsd_reply_cache_stats_show+0x3b/0x2d0
Code: 41 54 49 89 f4 48 89 fe 48 c7 c7 b3 10 33 88 53 bb e8 03 00 00 e8 88 
82 d1 ff bf 58 89 41 00 e8 eb c5 85 00 48 83 eb 01 75 f0 <41> 8b 94 24 f8 01 00 
00 48 c7 c6 be 10 33 88 4c 89 ef bb e8 03 00
RSP: 0018:aa520106fe08 EFLAGS: 00010246
RAX: 00cfe1a77123 RBX:  RCX: 00291b46
RDX: 00cf RSI: 0006 RDI: 00291b28
RBP: aa520106fe20 R08: 0006 R09: 00cfe17e55dd
R10: a424e47c R11: 030b R12: 0001
R13: a424e5697000 R14: 0001 R15: a424e5697000
FS:  7f805735f580() GS:a424f8f8() knlGS:
CS:  0010 DS:  ES:  CR0: 80050033
CR2: 01f9 CR3: 655ce005 CR4: 003606e0
Call Trace:
 seq_read+0x194/0x3e0
 __vfs_read+0x1b/0x40
 vfs_read+0x95/0x140
 ksys_read+0x61/0xe0
 __x64_sys_read+0x1a/0x20
 do_syscall_64+0x4d/0x120
 entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x7f805728b861
Code: fe ff ff 50 48 8d 3d 86 b4 09 00 e8 79 e0 01 00 66 0f 1f 84 00 00 00 
00 00 48 8d 05 d9 19 0d 00 8b 00 85 c0 75 13 31 c0 0f 05 <48> 3d 00 f0 ff ff 77 
57 c3 66 0f 1f 44 00 00 48 83 ec 28 48 89 54
RSP: 002b:7ffea1ce3c38 EFLAGS: 0246 ORIG_RAX: 
RAX: ffda RBX: 0002 RCX: 7f805728b861
RDX: 0002 RSI: 7f8057183000 RDI: 0003
RBP: 7f8057183000 R08: 7f8057182010 R09: 
R10: 0022 R11: 0246 R12: 559a60e8ff10
R13: 0003 R14: 0002 R15: 0002
Modules linked in:
CR2: 01f9
---[ end trace 01613595153f0cba ]---
RIP: 0010:nfsd_reply_cache_stats_show+0x3b/0x2d0
Code: 41 54 49 89 f4 48 89 fe 48 c7 c7 b3 10 33 88 53 bb e8 03 00 00 e8 88 
82 d1 ff bf 58 89 41 00 e8 eb c5 85 00 48 83 eb 01 75 f0 <41> 8b 94 24 f8 01 00 
00 48 c7 c6 be 10 33 88 4c 89 ef bb e8 03 00
RSP: 0018:aa52004b3e08 EFLAGS: 00010246
RAX: 002bab45a7c6 RBX:  RCX: 00291b4c
RDX: 002b RSI: 0004 RDI: 00291b28
RBP: aa52004b3e20 R08: 0004 R09: 002bab1c8c7a
R10: a424e550 R11: 02a9 R12: 0001
R13: a424e4475000 R14: 0001 R15: a424e4475000
FS:  7f805735f580() GS:a424f8f8() knlGS:
CS:  0010 DS:  ES:  CR0: 80050033
CR2: 01f9 CR3: 655ce005 CR4: 003606e0
Killed

Fixes: 3ba75830ce17 ("nfsd4: drc containerization")
Signed-off-by: He Zhe 
Signed-off-by: J. Bruce Fields 

diff --git a/fs/nfsd/nfscache.c b/fs/nfsd/nfscache.c
index 26ad75ae2be0..96352ab7bd81 100644
--- a/fs/nfsd/nfscache.c
+++ b/fs/nfsd/nfscache.c
@@ -571,7 +571,7 @@ nfsd_cache_append(struct svc_rqst *rqstp, struct kvec *data)
  */
 static int nfsd_reply_cache_stats_show(struct seq_file *m, void *v)
 {
-   struct nfsd_net *nn = v;
+   struct nfsd_net *nn = m->private;
 
seq_printf(m, "max entries:   %u\n", nn->max_drc_entries);
seq_printf(m, "num entries:   %u\n",



Re: [PATCH v2 1/4] fs/posix_acl: apply umask if superblock disables ACL support

2019-08-20 Thread J. Bruce Fields
What happened to these patches?  All four make sense to me, for what
it's worth; feel free to add a

Reviewed-by: J. Bruce Fields 

--b.

On Sat, Jul 13, 2019 at 06:11:57AM +0200, Max Kellermann wrote:
> The function posix_acl_create() applies the umask only if the inode
> has no ACL (= NULL) or if ACLs are not supported by the filesystem
> driver (= -EOPNOTSUPP).
> 
> However, this happens only after after the IS_POSIXACL() check
> succeeeded.  If the superblock doesn't enable ACL support, umask will
> never be applied.  A filesystem which has no ACL support will of
> course not enable SB_POSIXACL, rendering the umask-applying code path
> unreachable.
> 
> This fixes a bug which causes the umask to be ignored with O_TMPFILE
> on tmpfs:
> 
>  https://github.com/MusicPlayerDaemon/MPD/issues/558
>  https://bugs.gentoo.org/show_bug.cgi?id=686142#c3
>  https://bugzilla.kernel.org/show_bug.cgi?id=203625
> 
> Signed-off-by: Max Kellermann 
> Cc: sta...@vger.kernel.org
> ---
>  fs/posix_acl.c | 7 ++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/posix_acl.c b/fs/posix_acl.c
> index 84ad1c90d535..4071c66f234a 100644
> --- a/fs/posix_acl.c
> +++ b/fs/posix_acl.c
> @@ -589,9 +589,14 @@ posix_acl_create(struct inode *dir, umode_t *mode,
>   *acl = NULL;
>   *default_acl = NULL;
>  
> - if (S_ISLNK(*mode) || !IS_POSIXACL(dir))
> + if (S_ISLNK(*mode))
>   return 0;
>  
> + if (!IS_POSIXACL(dir)) {
> + *mode &= ~current_umask();
> + return 0;
> + }
> +
>   p = get_acl(dir, ACL_TYPE_DEFAULT);
>   if (!p || p == ERR_PTR(-EOPNOTSUPP)) {
>   *mode &= ~current_umask();
> -- 
> 2.20.1


  1   2   3   4   5   6   7   8   9   10   >