Hi, comments inline.

--
Matt Benjamin
Red Hat, Inc.
315 West Huron Street, Suite 140A
Ann Arbor, Michigan 48103

http://www.redhat.com/en/technologies/storage

tel.  734-707-0660
fax.  734-769-8938
cel.  734-216-5309

----- Original Message -----
> From: "Jeremy Bongio" <jbon...@linux.vnet.ibm.com>
> To: nfs-ganesha-devel@lists.sourceforge.net
> Sent: Tuesday, November 17, 2015 11:46:06 AM
> Subject: [Nfs-ganesha-devel] memory leak related to same client using v3 and
> v4 mounts and fixes
>
> There is a memory leak that is caused by V3 and V4 duplicate request
> caches being shared.
>
> We don't keep track of whether a DRC was used for NFSv4 or NFSv3 in the
> hashkey, only the address ... but each cache _does_ have a protocol
> type. This type is used later to decide which request/replies should be
> cached and which shouldn't. This results in large operations like READs
> being cached when there is a mismatch between the request protocol-type
> and the DRC protocol-type. This can quickly (in a few minutes) eat up
> all memory (and trigger the OOM killer) in targeted testing.
>
> 1. Either we can include the DRC type when creating the hashkey for the DRC.

You could include it in the sort, but not in the hash key, because the hash
key is pre-computed by the XDR layer (and we want to continue doing that).

>
> 2. Or we could stop relying on the type of the DRC and rely instead on
> the type of the current request. This would involve in
> nfs_dupreq_start() using get_drc_type(req); instead of drc->type.

That would be fine.

Matt

>
>
> What do you think is the best fix?
>
> Here is one quick fix I tested that worked. However, is it safe to
> simply add to the hashkey? I think so, but maybe I'm not thinking of all
> scenarios.
> @@ -574,6 +574,12 @@ nfs_dupreq_get_drc(struct svc_req *req)
>                                       "get drc for addr: %s", str);
>                  }
>
> +                /* Now include the nfsv3 or nfsv4 type in hashkey.
> +                 * Otherwise we confuse V4 and V3 caches which will
> +                 * later mess up process for deciding if a request is
> +                 * cacheable or not. */
> +                drc_k.d_u.tcp.hk += dtype;
> +
>                  t = rbtx_partition_of_scalar(&drc_st->tcp_drc_recycle_t,
>                                               drc_k.d_u.tcp.hk);
>                  DRC_ST_LOCK();
>
>
> Here is a simple script I use to reproduce the defect:
> #!/usr/bin/perl
>
> my $server_ip = "10.10.0.11";
>
> while(1) {
>      my $output = `mount -t nfs -o vers=4 $server_ip:/ibma /mnt/cthon; cat /mnt/cthon/a; umount /mnt/cthon; `;
>      $output = `mount -t nfs -o vers=3 $server_ip:/ibm/gpfs0/a /mnt/cthon; cat /mnt/cthon/a; umount /mnt/cthon; `;
> }
>
> --
> Jeremy Bongio
>
> jbon...@us.ibm.com
> IBM Linux Technology Center - Linux Filesystems Team
> Linux Development Engineer
>
> _______________________________________________
> Nfs-ganesha-devel mailing list
> Nfs-ganesha-devel@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/nfs-ganesha-devel
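
For reference, the "include it in the sort, but not in the hash key" suggestion above could look roughly like the sketch below. This is only an illustration under stated assumptions: struct drc_key, enum drc_type, drc_partition_of() and drc_key_cmp() are hypothetical stand-ins, not the actual nfs-ganesha definitions, and the address is simplified to a single 32-bit value. The point is that the pre-computed hash key still selects the partition unchanged, while the tree comparator uses the protocol type as a final tie-breaker, so a v3 and a v4 cache for the same client never compare equal.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for illustration; not the real nfs-ganesha types. */
enum drc_type { DRC_TCP_V3, DRC_TCP_V4 };

struct drc_key {
	uint64_t hk;		/* hash key, pre-computed and left untouched */
	uint32_t addr;		/* simplified client address (assumption) */
	enum drc_type type;	/* protocol type of the cache */
};

/* Choose a partition from the unmodified hash key only. */
static unsigned int drc_partition_of(uint64_t hk, unsigned int nparts)
{
	assert(nparts > 0);
	return (unsigned int)(hk % nparts);
}

/*
 * Three-way comparator for the sorted tree inside a partition:
 * hash key first, then address, then protocol type as the tie-breaker.
 * Returns -1, 0 or 1.
 */
static int drc_key_cmp(const struct drc_key *a, const struct drc_key *b)
{
	if (a->hk != b->hk)
		return a->hk < b->hk ? -1 : 1;
	if (a->addr != b->addr)
		return a->addr < b->addr ? -1 : 1;
	if (a->type != b->type)
		return a->type < b->type ? -1 : 1;
	return 0;
}
```

With a comparator along these lines, a v3 and a v4 mount from the same address land in the same partition but resolve to two distinct tree entries, so each request finds a cache whose type matches its own protocol.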