Frank de Bot wrote:
> Rick Macklem wrote:
> > Frank de Bot wrote:
> >> Rick Macklem wrote:
> >>> Frank de Bot wrote:
> >>>> Hi,
> >>>>
> >>>> On a 10.1-RELEASE-p9 server I have several NFS mounts used for a jail.
> >>>> Because it's a test-only server, the load is low, but the [nfscl]
> >>>> process starts hogging a CPU after a while. This happens pretty fast,
> >>>> within 1 or 2 days. I notice the high CPU usage when I come back to
> >>>> run some tests after those 1 or 2 days.
> >>>>
> >>>> My jail.conf looks like:
> >>>>
> >>>> exec.start = "/bin/sh /etc/rc";
> >>>> exec.stop = "/bin/sh /etc/rc.shutdown";
> >>>> exec.clean;
> >>>> mount.devfs;
> >>>> exec.consolelog = "/var/log/jail.$name.log";
> >>>> #mount.fstab = "/usr/local/etc/jail.fstab.$name";
> >>>>
> >>>> test01 {
> >>>>         host.hostname = "test01_hosting";
> >>>>         ip4.addr = somepublicaddress;
> >>>>         ip4.addr += someprivateaddress;
> >>>>
> >>>>         mount = "10.13.37.2:/tank/hostingbase /opt/jails/test01 nfs nfsv4,minorversion=1,pnfs,ro,noatime 0 0";
> >>>>         mount += "10.13.37.2:/tank/hosting/test /opt/jails/test01/opt nfs nfsv4,minorversion=1,pnfs,noatime 0 0";
> >>>>
> >>>>         path = "/opt/jails/test01";
> >>>> }
> >>>>
> >>>> The last test was with NFS 4.1; I also tried NFS 4.0 with the same
> >>>> result. The read-only NFS share contains symbolic links pointing to the
> >>>> read-write share for logging, storing .run files, etc. When I monitor
> >>>> my network interface with tcpdump, there is little NFS traffic; there
> >>>> is activity only when I access the shares.
> >>>>
> >>>> What is causing nfscl to run around in circles, hogging the CPU (it
> >>>> makes the system slow to respond too), or how can I find out what the
> >>>> cause is?
> >>>>
> >>> Well, the nfscl does server->client RPCs referred to as callbacks. I
> >>> have no idea what the implications of running it in a jail are, but I'd
> >>> guess that these server->client RPCs get blocked somehow, etc...
> >>> (The NFSv4.0 mechanism requires a separate IP address that the server
> >>>  can connect to on the client. For NFSv4.1, it should use the same
> >>>  TCP connection as is used for the client->server RPCs. The latter
> >>>  seems like it should work, but there is probably some glitch.)
> >>>
> >>> ** Just run without the nfscl daemon (it is only needed for delegations
> >>> or pNFS).
> >>
> >> How can I disable the nfscl daemon?
> >>
> > Well, the daemon for the callbacks is called nfscbd.
> > You should check via "ps ax", to see if you have it running.
> > (For NFSv4.0 you probably don't want it running, but for NFSv4.1 you
> >  do need it. pNFS won't work at all without it, but unless you have a
> >  server that supports pNFS, it won't work anyhow. Unless your server is
> >  a clustered Netapp Filer, you should probably not have the "pnfs" option.)
> > 
> > To run the "nfscbd" daemon, set:
> > nfscbd_enable="TRUE"
> > in your /etc/rc.conf, which will start it on boot.
> > Alternatively, just type "nfscbd" as root.
> > 
> > The "nfscl" thread is always started when an NFSv4 mount is done. It does
> > an assortment of housekeeping things, including a Renew op to make sure the
> > lease doesn't expire. If for some reason the jail blocks these Renew RPCs,
> > it will try to do them over and over and ... because having the lease
> > expire is bad news for NFSv4. How could you tell?
> > Well, capturing packets between the client and server, then looking at them
> > in wireshark is probably the only way. (Or maybe a large count for Renew
> > in the output from "nfsstat -e".)
> > 
> > "nfscbd" is optional for NFSv4.0. Without it, you simply don't do
> > callbacks/delegations.
> > For NFSv4.1 it is pretty much required, but doesn't need a separate
> > server->client TCP connection.
> > --> I'd enable it for NFSv4.1, but disable it for NFSv4.0 at least as a
> > starting point.
> > 
> > And as I said before, none of this is tested within jails, so I have no
> > idea what effect the jails have. Someone who understands jails might have
> > some insight w.r.t. this?
> > 
> > rick
> > 
> 
> Since last time I haven't tried to use pnfs and have just stuck with
> nfsv4.0. nfscbd is not running. The server is now running 10.2. The
> number of Renews is not very high (56k; Getattr, for example, is at 283M).
> Viewed in wireshark, the Renew calls look good and the NFS status is OK.
> 
> Is there a way to find out what [nfscl] is busy with?
> 
> I do understand that NFS + jails could have issues, but I'd like to
> understand them.
> 
It is conceivable that this high load is caused by the problem identified in
PR#205193, where jails can't talk to the nfsuserd because 127.0.0.1 gets
translated to another ip address for the machine.

The attached patches are the same ones as in the PR, which change the nfsuserd
to use an AF_LOCAL socket instead.

If it's convenient, it would be nice to try these patches (kernel + nfsuserd).

rick
ps: They are against head, so I'm not sure how easily they will apply to
FreeBSD 10.
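
For anyone reading the patch below: the gist of the change is that nfsuserd
is addressed by a filesystem path instead of 127.0.0.1 plus a port, so there
should be no IP address left for the jail to translate. Here is a minimal
user-space sketch of building such an AF_LOCAL address. The helper name
make_local_addr() is hypothetical and is not part of the patch. Unlike the
patch, which does a strcpy() of a fixed path, this checks the path length,
and for portability it returns the address length rather than setting the
BSD-only sun_len field.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>

/*
 * Hypothetical helper (not in the patch): fill *sun for a filesystem path.
 * Returns the address length to pass to bind()/connect(), or -1 with errno
 * set to ENAMETOOLONG when the path does not fit in sun_path.
 */
static int
make_local_addr(struct sockaddr_un *sun, const char *path)
{
	size_t len = strlen(path);

	if (len >= sizeof(sun->sun_path)) {
		errno = ENAMETOOLONG;
		return (-1);
	}
	memset(sun, 0, sizeof(*sun));
	sun->sun_family = AF_LOCAL;
	/* Copy the path including its terminating NUL. */
	memcpy(sun->sun_path, path, len + 1);
	/* Same value SUN_LEN() computes: header offset plus path length. */
	return ((int)(offsetof(struct sockaddr_un, sun_path) + len));
}
```

The returned length can be handed directly to bind() or connect() together
with a (struct sockaddr *) cast of the filled-in structure.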

> 
> Frank
> 
> _______________________________________________
> freebsd-stable@freebsd.org mailing list
> https://lists.freebsd.org/mailman/listinfo/freebsd-stable
> To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
> 
--- fs/nfs/nfs_commonkrpc.c.sav	2015-12-10 20:01:35.488317000 -0500
+++ fs/nfs/nfs_commonkrpc.c	2015-12-10 20:04:40.903854000 -0500
@@ -198,6 +198,8 @@ newnfs_connect(struct nfsmount *nmp, str
 			nconf = getnetconfigent("udp");
 		else
 			nconf = getnetconfigent("tcp");
+	else if (saddr->sa_family == AF_LOCAL)
+		nconf = getnetconfigent("local");
 	else
 		if (nrp->nr_sotype == SOCK_DGRAM)
 			nconf = getnetconfigent("udp6");
--- fs/nfs/nfs_commonsubs.c.sav	2015-12-10 20:09:25.218282000 -0500
+++ fs/nfs/nfs_commonsubs.c	2015-12-11 17:16:10.912717000 -0500
@@ -3048,7 +3048,7 @@ nfsrv_cmpmixedcase(u_char *cp, u_char *c
  * Set the port for the nfsuserd.
  */
 APPLESTATIC int
-nfsrv_nfsuserdport(u_short port, NFSPROC_T *p)
+nfsrv_nfsuserdport(struct sockaddr *sad, u_short port, NFSPROC_T *p)
 {
 	struct nfssockreq *rp;
 	struct sockaddr_in *ad;
@@ -3067,16 +3067,24 @@ nfsrv_nfsuserdport(u_short port, NFSPROC
 	 */
 	rp = &nfsrv_nfsuserdsock;
 	rp->nr_client = NULL;
-	rp->nr_sotype = SOCK_DGRAM;
-	rp->nr_soproto = IPPROTO_UDP;
-	rp->nr_lock = (NFSR_RESERVEDPORT | NFSR_LOCALHOST);
 	rp->nr_cred = NULL;
-	NFSSOCKADDRALLOC(rp->nr_nam);
-	NFSSOCKADDRSIZE(rp->nr_nam, sizeof (struct sockaddr_in));
-	ad = NFSSOCKADDR(rp->nr_nam, struct sockaddr_in *);
-	ad->sin_family = AF_INET;
-	ad->sin_addr.s_addr = htonl((u_int32_t)0x7f000001);	/* 127.0.0.1 */
-	ad->sin_port = port;
+	rp->nr_lock = (NFSR_RESERVEDPORT | NFSR_LOCALHOST);
+	if (sad != NULL) {
+		/* Use the AF_LOCAL socket address passed in. */
+		rp->nr_sotype = SOCK_STREAM;
+		rp->nr_soproto = 0;
+		rp->nr_nam = sad;
+	} else {
+		/* Use the port# for a UDP socket (old nfsuserd). */
+		rp->nr_sotype = SOCK_DGRAM;
+		rp->nr_soproto = IPPROTO_UDP;
+		NFSSOCKADDRALLOC(rp->nr_nam);
+		NFSSOCKADDRSIZE(rp->nr_nam, sizeof (struct sockaddr_in));
+		ad = NFSSOCKADDR(rp->nr_nam, struct sockaddr_in *);
+		ad->sin_family = AF_INET;
+		ad->sin_addr.s_addr = htonl((u_int32_t)0x7f000001);
+		ad->sin_port = port;
+	}
 	rp->nr_prog = RPCPROG_NFSUSERD;
 	rp->nr_vers = RPCNFSUSERD_VERS;
 	error = newnfs_connect(NULL, rp, NFSPROCCRED(p), p, 0);
--- fs/nfs/nfs_var.h.sav	2015-12-10 20:36:01.369536000 -0500
+++ fs/nfs/nfs_var.h	2015-12-10 20:36:22.446237000 -0500
@@ -128,7 +128,7 @@ int nfsrv_checksetattr(vnode_t, struct n
     NFSPROC_T *);
 int nfsrv_checkgetattr(struct nfsrv_descript *, vnode_t,
     struct nfsvattr *, nfsattrbit_t *, struct ucred *, NFSPROC_T *);
-int nfsrv_nfsuserdport(u_short, NFSPROC_T *);
+int nfsrv_nfsuserdport(struct sockaddr *, u_short, NFSPROC_T *);
 void nfsrv_nfsuserddelport(void);
 void nfsrv_throwawayallstate(NFSPROC_T *);
 int nfsrv_checksequence(struct nfsrv_descript *, uint32_t, uint32_t *,
--- fs/nfs/nfs_commonport.c.sav	2015-12-10 20:36:56.465656000 -0500
+++ fs/nfs/nfs_commonport.c	2015-12-11 17:15:30.025307000 -0500
@@ -41,6 +41,7 @@ __FBSDID("$FreeBSD: head/sys/fs/nfs/nfs_
  */
 #include <fs/nfs/nfsport.h>
 #include <sys/sysctl.h>
+#include <rpc/rpc_com.h>
 #include <vm/vm.h>
 #include <vm/vm_object.h>
 #include <vm/vm_page.h>
@@ -534,11 +535,30 @@ nfssvc_call(struct thread *p, struct nfs
 		goto out;
 	} else if (uap->flag & NFSSVC_NFSUSERDPORT) {
 		u_short sockport;
+		struct sockaddr *sad;
+		struct sockaddr_un *sun;
 
-		error = copyin(uap->argp, (caddr_t)&sockport,
-		    sizeof (u_short));
-		if (!error)
-			error = nfsrv_nfsuserdport(sockport, p);
+		if ((uap->flag & NFSSVC_NEWSTRUCT) != 0) {
+			/* New nfsuserd using an AF_LOCAL socket. */
+			sun = malloc(sizeof(struct sockaddr_un), M_SONAME,
+			    M_WAITOK | M_ZERO);
+			error = copyinstr(uap->argp, sun->sun_path,
+			    sizeof(sun->sun_path), NULL);
+			if (error != 0) {
+				free(sun, M_SONAME);
+				return (error);
+			}
+			sun->sun_family = AF_LOCAL;
+			sun->sun_len = SUN_LEN(sun);
+			sockport = 0;
+			sad = (struct sockaddr *)sun;
+		} else {
+			error = copyin(uap->argp, (caddr_t)&sockport,
+			    sizeof (u_short));
+			sad = NULL;
+		}
+		if (error == 0)
+			error = nfsrv_nfsuserdport(sad, sockport, p);
 	} else if (uap->flag & NFSSVC_NFSUSERDDELPORT) {
 		nfsrv_nfsuserddelport();
 		error = 0;
--- usr.sbin/nfsuserd/nfsuserd.c.sav	2015-12-09 18:46:29.284972000 -0500
+++ usr.sbin/nfsuserd/nfsuserd.c	2015-12-10 21:35:17.505343000 -0500
@@ -35,6 +35,7 @@ __FBSDID("$FreeBSD: head/usr.sbin/nfsuse
 #include <sys/mount.h>
 #include <sys/socket.h>
 #include <sys/socketvar.h>
+#include <sys/stat.h>
 #include <sys/time.h>
 #include <sys/ucred.h>
 #include <sys/vnode.h>
@@ -43,6 +44,7 @@ __FBSDID("$FreeBSD: head/usr.sbin/nfsuse
 #include <nfs/nfssvc.h>
 
 #include <rpc/rpc.h>
+#include <rpc/rpc_com.h>
 
 #include <fs/nfs/rpcv2.h>
 #include <fs/nfs/nfsproto.h>
@@ -73,6 +75,9 @@ static bool_t	xdr_getid(XDR *, caddr_t);
 static bool_t	xdr_getname(XDR *, caddr_t);
 static bool_t	xdr_retval(XDR *, caddr_t);
 
+#ifndef _PATH_NFSUSERDSOCK
+#define _PATH_NFSUSERDSOCK	"/var/run/nfsuserd.sock"
+#endif
 #define	MAXNAME		1024
 #define	MAXNFSUSERD	20
 #define	DEFNFSUSERD	4
@@ -103,15 +108,15 @@ main(int argc, char *argv[])
 	struct nfsd_idargs nid;
 	struct passwd *pwd;
 	struct group *grp;
-	int sock, one = 1;
-	SVCXPRT *udptransp;
-	u_short portnum;
+	int oldmask, sock;
+	SVCXPRT *xprt;
 	sigset_t signew;
 	char hostname[MAXHOSTNAMELEN + 1], *cp;
 	struct addrinfo *aip, hints;
 	static uid_t check_dups[MAXUSERMAX];
 	gid_t grps[NGROUPS];
 	int ngroup;
+	struct sockaddr_un sun;
 
 	if (modfind("nfscommon") < 0) {
 		/* Not present in kernel, try loading it */
@@ -245,46 +250,42 @@ main(int argc, char *argv[])
 	for (i = 0; i < nfsuserdcnt; i++)
 		slaves[i] = (pid_t)-1;
 
-	/*
-	 * Set up the service port to accept requests via UDP from
-	 * localhost (127.0.0.1).
-	 */
-	if ((sock = socket(AF_INET, SOCK_DGRAM, IPPROTO_UDP)) < 0)
-		err(1, "cannot create udp socket");
-
-	/*
-	 * Not sure what this does, so I'll leave it here for now.
-	 */
-	setsockopt(sock, SOL_SOCKET, SO_REUSEADDR, &one, sizeof(one));
-	
-	if ((udptransp = svcudp_create(sock)) == NULL)
-		err(1, "Can't set up socket");
-
-	/*
-	 * By not specifying a protocol, it is linked into the
-	 * dispatch queue, but not registered with portmapper,
-	 * which is just what I want.
-	 */
-	if (!svc_register(udptransp, RPCPROG_NFSUSERD, RPCNFSUSERD_VERS,
-	    nfsuserdsrv, 0))
-		err(1, "Can't register nfsuserd");
+	memset(&sun, 0, sizeof sun);
+	sun.sun_family = AF_LOCAL;
+	unlink(_PATH_NFSUSERDSOCK);
+	strcpy(sun.sun_path, _PATH_NFSUSERDSOCK);
+	sun.sun_len = SUN_LEN(&sun);
+	sock = socket(AF_LOCAL, SOCK_STREAM, 0);
+	if (sock < 0)
+		err(1, "Can't create local nfsuserd socket");
+	oldmask = umask(S_IXUSR | S_IRWXG | S_IRWXO);
+	if (bind(sock, (struct sockaddr *)&sun, sun.sun_len) < 0)
+		err(1, "Can't bind local nfsuserd socket");
+	umask(oldmask);
+	if (listen(sock, SOMAXCONN) < 0)
+		err(1, "Can't listen on local nfsuserd socket");
+	xprt = svc_vc_create(sock, RPC_MAXDATASIZE, RPC_MAXDATASIZE);
+	if (xprt == NULL)
+		err(1, "Can't create transport for local nfsuserd socket");
+	if (!svc_reg(xprt, RPCPROG_NFSUSERD, RPCNFSUSERD_VERS, nfsuserdsrv,
+	    NULL))
+		err(1, "Can't register service for local nfsuserd socket");
 
 	/*
-	 * Tell the kernel what my port# is.
+	 * Tell the kernel what the socket's path is.
 	 */
-	portnum = htons(udptransp->xp_port);
 #ifdef DEBUG
-	printf("portnum=0x%x\n", portnum);
+	printf("sockpath=%s\n", _PATH_NFSUSERDSOCK);
 #else
-	if (nfssvc(NFSSVC_NFSUSERDPORT, (caddr_t)&portnum) < 0) {
+	if (nfssvc(NFSSVC_NFSUSERDPORT | NFSSVC_NEWSTRUCT, _PATH_NFSUSERDSOCK)
+	    < 0) {
 		if (errno == EPERM) {
 			fprintf(stderr,
 			    "Can't start nfsuserd when already running");
 			fprintf(stderr,
 			    " If not running, use the -force option.\n");
-		} else {
-			fprintf(stderr, "Can't do nfssvc() to add port\n");
-		}
+		} else
+			fprintf(stderr, "Can't do nfssvc() to add socket\n");
 		exit(1);
 	}
 #endif
@@ -455,28 +456,11 @@ nfsuserdsrv(struct svc_req *rqstp, SVCXP
 	struct passwd *pwd;
 	struct group *grp;
 	int error;
-	u_short sport;
 	struct info info;
 	struct nfsd_idargs nid;
-	u_int32_t saddr;
 	gid_t grps[NGROUPS];
 	int ngroup;
 
-	/*
-	 * Only handle requests from 127.0.0.1 on a reserved port number.
-	 * (Since a reserved port # at localhost implies a client with
-	 *  local root, there won't be a security breach. This is about
-	 *  the only case I can think of where a reserved port # means
-	 *  something.)
-	 */
-	sport = ntohs(transp->xp_raddr.sin_port);
-	saddr = ntohl(transp->xp_raddr.sin_addr.s_addr);
-	if ((rqstp->rq_proc != NULLPROC && sport >= IPPORT_RESERVED) ||
-	    saddr != 0x7f000001) {
-		syslog(LOG_ERR, "req from ip=0x%x port=%d\n", saddr, sport);
-		svcerr_weakauth(transp);
-		return;
-	}
 	switch (rqstp->rq_proc) {
 	case NULLPROC:
 		if (!svc_sendreply(transp, (xdrproc_t)xdr_void, NULL))
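
A rough way to try out the socket setup from the nfsuserd.c hunk in
isolation is the user-space sketch below. It is only a sketch under stated
assumptions: local_listen(), local_connect() and fill_addr() are hypothetical
helpers, the path is whatever scratch path the caller passes in, and the RPC
layer (svc_vc_create()/svc_reg()) is left out entirely. It mirrors the
socket()/umask()/bind()/listen() sequence of the patch so that the socket
file ends up without group or other permissions.

```c
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/stat.h>
#include <sys/un.h>
#include <unistd.h>

/* Hypothetical helper: build an AF_LOCAL address, rejecting long paths. */
static int
fill_addr(struct sockaddr_un *sun, const char *path, socklen_t *lenp)
{
	size_t n = strlen(path);

	if (n >= sizeof(sun->sun_path))
		return (-1);
	memset(sun, 0, sizeof(*sun));
	sun->sun_family = AF_LOCAL;
	memcpy(sun->sun_path, path, n + 1);
	*lenp = (socklen_t)(offsetof(struct sockaddr_un, sun_path) + n);
	return (0);
}

/* Create a listening AF_LOCAL stream socket at path; returns fd or -1. */
static int
local_listen(const char *path)
{
	struct sockaddr_un sun;
	socklen_t len;
	mode_t oldmask;
	int sock, rv;

	if (fill_addr(&sun, path, &len) < 0)
		return (-1);
	sock = socket(AF_LOCAL, SOCK_STREAM, 0);
	if (sock < 0)
		return (-1);
	(void)unlink(path);		/* remove a stale socket file */
	/* Same mask as the patch: leave only owner read/write. */
	oldmask = umask(S_IXUSR | S_IRWXG | S_IRWXO);
	rv = bind(sock, (struct sockaddr *)&sun, len);
	(void)umask(oldmask);
	if (rv < 0 || listen(sock, SOMAXCONN) < 0) {
		close(sock);
		return (-1);
	}
	return (sock);
}

/* Connect to the AF_LOCAL stream socket at path; returns fd or -1. */
static int
local_connect(const char *path)
{
	struct sockaddr_un sun;
	socklen_t len;
	int sock;

	if (fill_addr(&sun, path, &len) < 0)
		return (-1);
	sock = socket(AF_LOCAL, SOCK_STREAM, 0);
	if (sock < 0)
		return (-1);
	if (connect(sock, (struct sockaddr *)&sun, len) < 0) {
		close(sock);
		return (-1);
	}
	return (sock);
}
```

Calling local_listen() and then local_connect() on the same path, followed
by accept() on the listening descriptor, yields a connected pair without any
IP address being involved.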