Re: [gentoo-user] NFS kernel bug

2013-12-09 Thread Tanstaafl

On 2013-12-08 6:20 PM, Tom Wijsman tom...@gentoo.org wrote:

We can find all relevant commit IDs by searching for the commit message
in the last release of each branch; then we list all tags containing
each of those commits, which gives us the versions where the fix is present.


that's easy for you to say... er, do...

Thanks for the comprehensive analysis/answer... :)



Re: [gentoo-user] NFS kernel bug

2013-12-08 Thread Tanstaafl

On 2013-10-26 6:19 PM, Daniel Frey djqf...@gmail.com wrote:

Just a note to other NFS server users -

There's a kernel bug that can cause unmounting an NFS share to segfault
(and not actually unmount anything.)

I had it in the kernel 3.10 version, perhaps even before that as I don't
update the kernel on my mythtv backend server that often.

It hangs the shutdown process with an oops and it will require physical
manual intervention to shut the machine down.

If you upgrade to 3.11.5 or greater the problem goes away.

I've been banging my head against the wall with this for over a week and
*finally* found a resolution after going through a lot of NFS searches
via Google.


So... is this fixed in the stable 3.10 series (i.e., 3.10.7 or 3.10.17)?



Re: [gentoo-user] NFS kernel bug

2013-12-08 Thread Tom Wijsman
On Sun, 08 Dec 2013 16:13:26 -0500
Tanstaafl tansta...@libertytrek.org wrote:

 On 2013-10-26 6:19 PM, Daniel Frey djqf...@gmail.com wrote:
  Just a note to other NFS server users -
 
  There's a kernel bug that can cause unmounting an NFS share to
  segfault (and not actually unmount anything.)
 
  I had it in the kernel 3.10 version, perhaps even before that as I
  don't update the kernel on my mythtv backend server that often.
 
  It hangs the shutdown process with an oops and it will require
  physical manual intervention to shut the machine down.
 
  If you upgrade to 3.11.5 or greater the problem goes away.
 
  I've been banging my head against the wall with this for over a
  week and *finally* found a resolution after going through a lot of
  NFS searches via Google.
 
 So... is this fixed in the stable 3.10 series (i.e., 3.10.7 or 3.10.17)?

TL;DR: One of the two NFS fixes from 3.11.5 has been in the 3.10 series
since v3.10.16; the other is only available since v3.11.5. Backporting
the other one does not appear easy, because the 3.10 code differs (or an
alternative fix was used there).

== Which commits? ==

http://thread.gmane.org/gmane.linux.kernel/1577607 mentions:

1. NFSv4.1: nfs4_fl_prepare_ds - fix bugs when the connect attempt fails
2. nfsd4: fix leak of inode reference on delegation failure

== Are they present in v3.10.17? ==

For (1) we see that
http://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/log/?id=v3.10.17&qt=grep&q=nfs4_fl_prepare_ds
does show the commit.

For (2) we see that
http://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/log/?id=v3.10.17&qt=grep&q=fix%20leak%20of%20inode
does not show the commit.

So v3.10.17 contains one of the two fixes, but not the other.
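The same check can be run against a local clone instead of the cgit web
interface. This is only a minimal sketch, assuming a clone of linux-stable
with all tags fetched; `has_commit_msg` is a hypothetical helper name, not
an existing tool. It matches on the commit message rather than the commit
ID, because a stable backport gets a different ID from the upstream commit.

```sh
# has_commit_msg TAG MESSAGE  (hypothetical helper)
# Prints the commits reachable from TAG whose message contains MESSAGE.
# Returns 0 if at least one matches, 1 if none does, and 2 if git log
# fails (for example because TAG does not exist).
has_commit_msg() {
    tag=$1
    msg=$2
    # --fixed-strings makes --grep match MESSAGE literally, not as a regex.
    out=$(git log --oneline --fixed-strings --grep="$msg" "$tag") || return 2
    [ -n "$out" ] || return 1
    printf '%s\n' "$out"
}

# Example, run inside a linux-stable clone:
#   has_commit_msg v3.10.17 "nfs4_fl_prepare_ds"
#   has_commit_msg v3.10.17 "fix leak of inode reference"
```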

== Which versions contain these commits? ==

We can find all relevant commit IDs by searching for the commit message
in the last release of each branch; then we list all tags containing
each of those commits, which gives us the versions where the fix is present.
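A minimal sketch of that procedure as a shell function, assuming a
linux-stable clone with all tags fetched. `tags_with_fix` is a hypothetical
name: it takes the commit message and the newest tag of each branch,
collects every matching commit ID, and asks `git tag` for all tags that
contain any of them, sorted as version numbers.

```sh
# tags_with_fix MESSAGE RELEASE...  (hypothetical helper)
# For each RELEASE (e.g. the newest tag of each branch), collect the commits
# reachable from it whose message contains MESSAGE, then list every tag that
# contains at least one of those commits, sorted as version numbers.
# Returns 1 if no commit matches.
tags_with_fix() {
    msg=$1
    shift
    shas=$(for release in "$@"; do
        git log --format=%H --fixed-strings --grep="$msg" "$release"
    done | sort -u)
    [ -n "$shas" ] || return 1
    # Replace the positional parameters with one "--contains SHA" pair per
    # commit; git tag then lists tags that contain any of the given commits.
    set --
    for sha in $shas; do
        set -- "$@" --contains "$sha"
    done
    # versionsort.suffix=-rc sorts e.g. v3.12-rc1 before v3.12.
    git -c versionsort.suffix=-rc tag --sort=version:refname "$@"
}

# Example, run inside a linux-stable clone:
#   tags_with_fix "nfsd4: fix leak of inode reference on delegation failure" \
#       v3.10.23 v3.11.10 v3.13-rc3
```

Sorting with `version:refname` puts v3.10.2 before v3.10.10, unlike the
plain alphabetical order that `git tag` uses by default.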

Checking the upstream commits yields:

 $ git tag --contains 52b26a3e1bb3e065c32b3febdac1e1f117d88e15 # (1)
v3.12
v3.12-rc4
v3.12-rc5
v3.12-rc6
v3.12-rc7
v3.12.1
v3.12.2
v3.12.3
v3.12.4
v3.13-rc1
v3.13-rc2
v3.13-rc3
 $ git tag --contains bf7bd3e98be5c74813bee6ad496139fb0a011b3b # (2)
v3.12
v3.12-rc1
v3.12-rc2
v3.12-rc3
v3.12-rc4
v3.12-rc5
v3.12-rc6
v3.12-rc7
v3.12.1
v3.12.2
v3.12.3
v3.12.4
v3.13-rc1
v3.13-rc2
v3.13-rc3

Checking the commits backported to 3.11 yields:

 $ git tag --contains 3b12032f89e27f139828bad8120149b1584bc898 # (1)
v3.11.5
v3.11.6
v3.11.7
v3.11.8
v3.11.9
v3.11.10
 $ git tag --contains ba3460519e393d0f212403ae3535305f423d84ed # (2)
v3.11.5
v3.11.6
v3.11.7
v3.11.8
v3.11.9
v3.11.10

Checking the commit backported to 3.10 yields:

 $ git tag --contains 28f7ae257183e8064119db486190d2229caae369 # (1)
v3.10.16
v3.10.17
v3.10.18
v3.10.19
v3.10.20
v3.10.21
v3.10.22
v3.10.23

This summarizes all versions where these two commits are available.

== Only one of the two is available in v3.10.17; can I apply the other? ==

It appears that (2) cannot be applied to v3.10 without backporting it by
hand; alternatively, an equivalent fix may already have been applied
there in a quite different form.
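Whether a commit applies cleanly can be checked mechanically with a trial
cherry-pick. This is a sketch under assumptions: a linux-stable clone, git
2.17 or later (for `git worktree remove`), and a hypothetical helper name,
`try_backport`. A clean cherry-pick only means the patch applies textually;
it says nothing about whether the change is correct for the 3.10 code.

```sh
# try_backport BASE_TAG COMMIT  (hypothetical helper)
# Returns 0 if COMMIT cherry-picks cleanly onto BASE_TAG, 1 if it conflicts,
# and 2 on setup errors. It works in a throwaway worktree, so the current
# checkout is left untouched.
try_backport() {
    base=$1
    commit=$2
    tmpdir=$(mktemp -d) || return 2
    wt=$tmpdir/wt
    if ! git worktree add --detach "$wt" "$base" >/dev/null 2>&1; then
        rm -rf "$tmpdir"
        return 2
    fi
    # --no-commit applies the change to the index and working tree only.
    if git -C "$wt" cherry-pick --no-commit "$commit" >/dev/null 2>&1; then
        status=0   # applies cleanly; it still has to be built and tested
    else
        status=1   # conflicts: a manual backport would be needed
    fi
    # Discard the trial result, including any conflict state, then clean up.
    git -C "$wt" reset --hard -q
    git worktree remove --force "$wt"
    rm -rf "$tmpdir"
    return $status
}

# Example, run inside a linux-stable clone:
#   try_backport v3.10.17 bf7bd3e98be5c74813bee6ad496139fb0a011b3b
```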

== Which versions contain the bad commit(s)? Which ones are affected? ==

All versions that contain the commit that introduced bug (2) are:

 $ git tag --contains 68a3396178e6688ad7367202cdf0af8ed03c8727 | tr '\n' ' '

v3.10 v3.10-rc1 v3.10-rc2 v3.10-rc3 v3.10-rc4 v3.10-rc5 v3.10-rc6
v3.10-rc7 v3.10.1 v3.10.10 v3.10.11 v3.10.12 v3.10.13 v3.10.14 v3.10.15
v3.10.16 v3.10.17 v3.10.18 v3.10.19 v3.10.2 v3.10.20 v3.10.21 v3.10.22
v3.10.23 v3.10.3 v3.10.4 v3.10.5 v3.10.6 v3.10.7 v3.10.8 v3.10.9 v3.11
v3.11-rc1 v3.11-rc2 v3.11-rc3 v3.11-rc4 v3.11-rc5 v3.11-rc6 v3.11-rc7
v3.11.1 v3.11.10 v3.11.2 v3.11.3 v3.11.4 v3.11.5 v3.11.6 v3.11.7
v3.11.8 v3.11.9 v3.12 v3.12-rc1 v3.12-rc2 v3.12-rc3 v3.12-rc4 v3.12-rc5
v3.12-rc6 v3.12-rc7 v3.12.1 v3.12.2 v3.12.3 v3.12.4 v3.13-rc1 v3.13-rc2
v3.13-rc3

Excluding the versions that contain a fix, v3.10-rc1 through v3.10.23
and v3.11-rc1 through v3.11.4 are affected.
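That set difference can also be computed directly by `git tag`, which can
exclude tags with `--no-contains` (git 2.13 or later). A minimal sketch;
`affected_tags` is a hypothetical helper name. Since each backport has its
own commit ID, every copy of the fix (upstream and each stable backport)
has to be passed.

```sh
# affected_tags BAD_COMMIT FIX_COMMIT...  (hypothetical helper)
# Lists the tags that contain BAD_COMMIT but none of the FIX_COMMITs,
# sorted as version numbers. Needs git 2.13 or later for --no-contains.
affected_tags() {
    bad=$1
    shift
    # Turn "FIX1 FIX2 ..." into "--no-contains FIX1 --no-contains FIX2 ...":
    # each pass appends a pair and shifts off one original argument.
    for fix in "$@"; do
        set -- "$@" --no-contains "$fix"
        shift
    done
    git -c versionsort.suffix=-rc tag --sort=version:refname \
        --contains "$bad" "$@"
}

# Example, run inside a linux-stable clone (bad commit, upstream fix,
# 3.11 backport of the fix):
#   affected_tags 68a3396178e6688ad7367202cdf0af8ed03c8727 \
#       bf7bd3e98be5c74813bee6ad496139fb0a011b3b \
#       ba3460519e393d0f212403ae3535305f423d84ed
```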

As the commit for (1) doesn't mention where the bug was introduced, I
cannot check which versions contain the bad commit for (1); on the 3.10
branch, however, it is definitely limited to versions before v3.10.16,
as shown earlier.

-- 
With kind regards,

Tom Wijsman (TomWij)
Gentoo Developer

E-mail address  : tom...@gentoo.org
GPG Public Key  : 6D34E57D
GPG Fingerprint : C165 AF18 AB4C 400B C3D2  ABF0 95B2 1FCD 6D34 E57D




[gentoo-user] NFS kernel bug

2013-10-26 Thread Daniel Frey
Just a note to other NFS server users -

There's a kernel bug that can cause unmounting an NFS share to segfault
(and not actually unmount anything.)

I had it in the kernel 3.10 version, perhaps even before that as I don't
update the kernel on my mythtv backend server that often.

It hangs the shutdown process with an oops and it will require physical
manual intervention to shut the machine down.

If you upgrade to 3.11.5 or greater the problem goes away.

I've been banging my head against the wall with this for over a week and
*finally* found a resolution after going through a lot of NFS searches
via Google.

Dan