Bug#695492: CIFS mount fails if I ctrl-c a long-running find process (Linux mounting Windows share)
On Sun, Jan 13, 2013 at 05:46:35AM -0500, Jeff Layton wrote:
On Sat, 12 Jan 2013 10:28:01 -0800 John Darrah xyl...@gmail.com wrote:

Is there a command or kernel magic that can force a dump to see where the contention is that is causing the hang? Also, I just tried starting the VM and mounting the CIFS drives and then just letting it sit there without running anything to touch the drives, and they still hang. So this means the CTRL-C thing has nothing to do with it.

Ok, so it sounds like the original bug is now fixed with the patch I proposed. This other thing sounds like it warrants a new bug. When you say it hangs, does the whole box hang or is it just processes that touch the cifs mount? If you know the pid of the hung process, you can look at /proc/pid/stack to see what it's doing. There are also things like sysrq-t. You can also set up kdump and force a crash on a machine to get a coredump, and then try to analyze it to figure out why it's hung.

I have looked at this several times, but all I can come up with is the contents of /proc/pid/stack. There is an 'ls' command that is waiting for something. I can see some CIFS stuff but I have no idea what I'm looking at. This was taken after about 30 minutes in the hung state.

[c11fee0d] kernel_setsockopt+0x34/0x46
[f86925c7] smb_send_rqst+0x107/0x170 [cifs]
[c1035b66] prepare_to_wait+0x12/0x37
[f86922d8] wait_for_response.isra.8+0x6d/0xc2 [cifs]
[c1035af9] autoremove_wake_function+0x0/0x29
[f8692c7a] SendReceive+0x141/0x1f1 [cifs]
[f867b793] CIFSSMBNegotiate+0x17c/0x6bf [cifs]
[f8697fc3] cifs_negotiate+0xb/0x31 [cifs]
[f8686557] cifs_negotiate_protocol+0x3b/0x62 [cifs]
[f867b471] cifs_reconnect_tcon+0x16f/0x235 [cifs]
[c10870ff] prep_new_page+0xac/0xe0
[f867b550] smb_init+0x19/0x58 [cifs]
[f867f815] CIFSSMBQPathInfo+0x4c/0x1e2 [cifs]
[f8697eb4] cifs_query_path_info+0x26/0x5a [cifs]
[f868e327] cifs_get_inode_info+0x10d/0x4a1 [cifs]
[c10a9e6a] __kmalloc+0x8d/0x99
[f86875e9] build_path_from_dentry+0xab/0x182 [cifs]
[f868761b] build_path_from_dentry+0xdd/0x182 [cifs]
[f868f8e2] cifs_revalidate_dentry_attr+0xd7/0x131 [cifs]
[f868f965] cifs_revalidate_dentry+0x9/0x1d [cifs]
[f8687497] cifs_d_revalidate+0x13/0x6e [cifs]
[c10b5c84] d_revalidate+0x5/0x6
[c10b6922] lookup_fast+0x169/0x1ed
[c10b6c13] walk_component+0x2e/0x144
[c10b7288] link_path_walk+0x32c/0x3ca
[c10b764f] path_lookupat+0x4d/0x251
[c10b7872] filename_lookup+0x1f/0x6c
[c10b93bf] user_path_at_empty+0x59/0x81
[c10b6201] vfs_readlink+0x2d/0x3c
[c10b6256] generic_readlink+0x46/0x6a
[c10b93f2] user_path_at+0xb/0xe
[c10b2d24] vfs_fstatat+0x33/0x61
[c10b2d77] vfs_stat+0x10/0x12
[c10b31e5] sys_stat64+0xe/0x21
[c10bd157] dput+0x16/0x96
[c10b31af] sys_readlinkat+0x82/0x93
[c10b31d3] sys_readlink+0x13/0x17
[c12a2a7f] syscall_call+0x7/0xb
[] 0x

Sorry I can't be more help.

-- john

-- To UNSUBSCRIBE, email to debian-bugs-dist-requ...@lists.debian.org with a subject of unsubscribe. Trouble? Contact listmas...@lists.debian.org
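For anyone else staring at a trace like the one above, a small filter can make the cifs part easier to read. This is only a sketch: the regular expression assumes the `[addr] symbol+0xoff/0xsize [module]` layout seen in this message, the function names `parse_frames` and `module_frames` are made up for illustration, and `SAMPLE` is an excerpt of the posted trace rather than the whole thing.

```python
import re

# One frame per line: "[addr] symbol+0xoff/0xsize", optionally followed by "[module]".
# Frames from the core kernel carry no module suffix.
FRAME_RE = re.compile(
    r"^\[(?P<addr>[0-9a-f]*)\]\s+"
    r"(?P<symbol>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?:\s+\[(?P<module>\w+)\])?$"
)


def parse_frames(trace):
    """Return (symbol, module) pairs in trace order; module is None for core kernel frames."""
    frames = []
    for line in trace.splitlines():
        match = FRAME_RE.match(line.strip())
        if match:
            frames.append((match.group("symbol"), match.group("module")))
    return frames


def module_frames(trace, module):
    """Return only the symbols that belong to the given module, innermost first."""
    return [symbol for symbol, mod in parse_frames(trace) if mod == module]


# Excerpt of the trace posted in this message.
SAMPLE = """\
[c11fee0d] kernel_setsockopt+0x34/0x46
[f86925c7] smb_send_rqst+0x107/0x170 [cifs]
[c1035b66] prepare_to_wait+0x12/0x37
[f86922d8] wait_for_response.isra.8+0x6d/0xc2 [cifs]
[f8692c7a] SendReceive+0x141/0x1f1 [cifs]
[f867b793] CIFSSMBNegotiate+0x17c/0x6bf [cifs]
[f867b471] cifs_reconnect_tcon+0x16f/0x235 [cifs]
[c10b2d77] vfs_stat+0x10/0x12
"""

if __name__ == "__main__":
    for symbol in module_frames(SAMPLE, "cifs"):
        print(symbol)
```

Read from the bottom up, the cifs symbol names suggest that the stat call went into a reconnect (cifs_reconnect_tcon), which sent a negotiate request (CIFSSMBNegotiate, SendReceive) and is now sitting in wait_for_response. Some of the other entries (for example __kmalloc or sys_readlink) may be leftovers on the stack rather than part of the live call chain, so the listing is a hint and not proof.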
Bug#695492: CIFS mount fails if I ctrl-c a long-running find process (Linux mounting Windows share)
On Sat, 12 Jan 2013 10:28:01 -0800 John Darrah xyl...@gmail.com wrote:
On Fri, Jan 11, 2013 at 08:27:16AM -0500, Jeff Layton wrote:
On Thu, 10 Jan 2013 20:29:43 -0800 John Darrah xyl...@gmail.com wrote:
On Fri, Jan 04, 2013 at 07:09:33AM -0500, Jeff Layton wrote:
On Thu, 3 Jan 2013 21:29:22 -0800 John Darrah xyl...@gmail.com wrote:
On Sat, Dec 29, 2012 at 12:26:07PM +0100, Ben Hutchings wrote:
On Fri, 2012-12-28 at 22:01 -0500, Jeff Layton wrote:
On Sat, 29 Dec 2012 01:24:36 +0100 Ben Hutchings b...@decadent.org.uk wrote:
On Mon, 2012-12-24 at 09:14 -0500, Jeff Layton wrote:
On Sun, 23 Dec 2012 09:10:34 -0500 Jeff Layton jlay...@redhat.com wrote:

[...]

I had a look at the code today and suspect that I know what the problem is. When the kernel goes to send a request, it first signs it and then bumps the sequence numbers that it tracks. If the request doesn't actually make it out onto the wire, like when the task catches a signal, those sequence numbers remain high even though the request didn't go out. Here's an untested patch that might help tell whether this is the case. You may want to try it and see if it does. Note that this fix is a bit of a kludge and is not suitable for merging! A better fix would involve changing when the sequence number gets bumped in the first place. If this patch seems to help things, then I'll look at coding that up.

[...]

I was able to reproduce this, and I don't think the above patch will fix it (at least not completely). The problem seems to be that the NT cancel command is screwing up the sequence numbers. We'll have to do some research to figure out why that's occurring.

Jeff, we got a bug report in Debian which seems to be the same problem: http://bugs.debian.org/695492. Please cc John Darrah and the bug address as above. Ben.

You may want to try this patch. It seems to fix the problem for me, but I think there is probably some more work to do in this area.

http://www.spinics.net/lists/linux-cifs/msg07576.html

John, you can test this patch by following instructions at http://kernel-handbook.alioth.debian.org/ch-common-tasks.html#s-common-official. Please reply-to-all to Jeff's message when you have a result. Ben.

OK... I built a 3.2.35 kernel with the patch to transport.c and also a 3.7.1 with the patch to smb1ops.c and loaded them into my wheezy VM. I tested both by starting commands to frob the CIFS mounts and then typing a CTRL-C to kill the command, and they were stable (at least 50 attempts using each kernel with the CTRL-C fired at random times into the running command). But... now another issue affects both kernels. It seems that after 10 to 15 minutes of non-use, the mount hangs and the command accessing the mount can only be killed with a SIGKILL... but only sometimes. Sometimes only a reboot would unwedge things. It seems when the mount would hang, I would get the error

CIFS VFS: Server amifile01 has not responded in 300 seconds. Reconnecting...

except that the 3.7 kernel reported 120 seconds instead of the 300 seconds noted above.

Interesting, I haven't noticed that issue, but I'll try to reproduce it when I get a chance.

Is there a command or kernel magic that can force a dump to see where the contention is that is causing the hang? Also, I just tried starting the VM and mounting the CIFS drives and then just letting it sit there without running anything to touch the drives, and they still hang. So this means the CTRL-C thing has nothing to do with it.

Ok, so it sounds like the original bug is now fixed with the patch I proposed. This other thing sounds like it warrants a new bug. When you say it hangs, does the whole box hang or is it just processes that touch the cifs mount? If you know the pid of the hung process, you can look at /proc/pid/stack to see what it's doing. There are also things like sysrq-t. You can also set up kdump and force a crash on a machine to get a coredump, and then try to analyze it to figure out why it's hung.

-- Jeff Layton jlay...@redhat.com
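The first two suggestions above can be sketched in a few lines. This is a rough illustration, not a tested diagnostic tool: both files normally require root, the kernel has to be built with the relevant options, and the `proc_root` and `trigger_path` parameters are hypothetical additions that only exist so the helpers can be pointed at ordinary files for a dry run.

```python
import os


def read_kernel_stack(pid, proc_root="/proc"):
    """Return the kernel-side stack of `pid` as text, as exposed in /proc/<pid>/stack."""
    with open(os.path.join(proc_root, str(pid), "stack")) as stack_file:
        return stack_file.read()


def trigger_sysrq(key, trigger_path="/proc/sysrq-trigger"):
    """Write a single sysrq command character to the trigger file.

    Writing 't' asks the kernel to dump the state of every task into the
    kernel log, which is the file-based equivalent of pressing sysrq-t.
    """
    if len(key) != 1:
        raise ValueError("a sysrq command is a single character")
    with open(trigger_path, "w") as trigger_file:
        trigger_file.write(key)
```

Typical use would be `print(read_kernel_stack(1234))` for a hung pid (1234 is a placeholder) followed by `trigger_sysrq("t")` and a look at the kernel log. kdump is deliberately left out because setting it up depends on the distribution.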
Bug#695492: CIFS mount fails if I ctrl-c a long-running find process (Linux mounting Windows share)
On 1/13/2013 2:46 AM, Jeff Layton wrote:
On Sat, 12 Jan 2013 10:28:01 -0800 John Darrah xyl...@gmail.com wrote:
On Fri, Jan 11, 2013 at 08:27:16AM -0500, Jeff Layton wrote:
On Thu, 10 Jan 2013 20:29:43 -0800 John Darrah xyl...@gmail.com wrote:
On Fri, Jan 04, 2013 at 07:09:33AM -0500, Jeff Layton wrote:
On Thu, 3 Jan 2013 21:29:22 -0800 John Darrah xyl...@gmail.com wrote:
On Sat, Dec 29, 2012 at 12:26:07PM +0100, Ben Hutchings wrote:
On Fri, 2012-12-28 at 22:01 -0500, Jeff Layton wrote:
On Sat, 29 Dec 2012 01:24:36 +0100 Ben Hutchings b...@decadent.org.uk wrote:
On Mon, 2012-12-24 at 09:14 -0500, Jeff Layton wrote:
On Sun, 23 Dec 2012 09:10:34 -0500 Jeff Layton jlay...@redhat.com wrote:

[...]

I had a look at the code today and suspect that I know what the problem is. When the kernel goes to send a request, it first signs it and then bumps the sequence numbers that it tracks. If the request doesn't actually make it out onto the wire, like when the task catches a signal, those sequence numbers remain high even though the request didn't go out. Here's an untested patch that might help tell whether this is the case. You may want to try it and see if it does. Note that this fix is a bit of a kludge and is not suitable for merging! A better fix would involve changing when the sequence number gets bumped in the first place. If this patch seems to help things, then I'll look at coding that up.

[...]

I was able to reproduce this, and I don't think the above patch will fix it (at least not completely). The problem seems to be that the NT cancel command is screwing up the sequence numbers. We'll have to do some research to figure out why that's occurring.

Jeff, we got a bug report in Debian which seems to be the same problem: http://bugs.debian.org/695492. Please cc John Darrah and the bug address as above. Ben.

You may want to try this patch. It seems to fix the problem for me, but I think there is probably some more work to do in this area.

http://www.spinics.net/lists/linux-cifs/msg07576.html

John, you can test this patch by following instructions at http://kernel-handbook.alioth.debian.org/ch-common-tasks.html#s-common-official. Please reply-to-all to Jeff's message when you have a result. Ben.

OK... I built a 3.2.35 kernel with the patch to transport.c and also a 3.7.1 with the patch to smb1ops.c and loaded them into my wheezy VM. I tested both by starting commands to frob the CIFS mounts and then typing a CTRL-C to kill the command, and they were stable (at least 50 attempts using each kernel with the CTRL-C fired at random times into the running command). But... now another issue affects both kernels. It seems that after 10 to 15 minutes of non-use, the mount hangs and the command accessing the mount can only be killed with a SIGKILL... but only sometimes. Sometimes only a reboot would unwedge things. It seems when the mount would hang, I would get the error

CIFS VFS: Server amifile01 has not responded in 300 seconds. Reconnecting...

except that the 3.7 kernel reported 120 seconds instead of the 300 seconds noted above.

Interesting, I haven't noticed that issue, but I'll try to reproduce it when I get a chance.

Is there a command or kernel magic that can force a dump to see where the contention is that is causing the hang? Also, I just tried starting the VM and mounting the CIFS drives and then just letting it sit there without running anything to touch the drives, and they still hang. So this means the CTRL-C thing has nothing to do with it.

Ok, so it sounds like the original bug is now fixed with the patch I proposed. This other thing sounds like it warrants a new bug. When you say it hangs, does the whole box hang or is it just processes that touch the cifs mount?

Yes, only the processes that touch the mount hang. If I make several attempts at using SIGKILL, I can sometimes make the hung processes die. Then I can unmount and remount the drives and they seem OK until they hang again.

If you know the pid of the hung process, you can look at /proc/pid/stack to see what it's doing. There are also things like sysrq-t. You can also set up kdump and force a crash on a machine to get a coredump, and then try to analyze it to figure out why it's hung.

I will attempt to get some useful info from one of the above suggestions.

-- john
Bug#695492: CIFS mount fails if I ctrl-c a long-running find process (Linux mounting Windows share)
On Fri, Jan 11, 2013 at 08:27:16AM -0500, Jeff Layton wrote:
On Thu, 10 Jan 2013 20:29:43 -0800 John Darrah xyl...@gmail.com wrote:
On Fri, Jan 04, 2013 at 07:09:33AM -0500, Jeff Layton wrote:
On Thu, 3 Jan 2013 21:29:22 -0800 John Darrah xyl...@gmail.com wrote:
On Sat, Dec 29, 2012 at 12:26:07PM +0100, Ben Hutchings wrote:
On Fri, 2012-12-28 at 22:01 -0500, Jeff Layton wrote:
On Sat, 29 Dec 2012 01:24:36 +0100 Ben Hutchings b...@decadent.org.uk wrote:
On Mon, 2012-12-24 at 09:14 -0500, Jeff Layton wrote:
On Sun, 23 Dec 2012 09:10:34 -0500 Jeff Layton jlay...@redhat.com wrote:

[...]

I had a look at the code today and suspect that I know what the problem is. When the kernel goes to send a request, it first signs it and then bumps the sequence numbers that it tracks. If the request doesn't actually make it out onto the wire, like when the task catches a signal, those sequence numbers remain high even though the request didn't go out. Here's an untested patch that might help tell whether this is the case. You may want to try it and see if it does. Note that this fix is a bit of a kludge and is not suitable for merging! A better fix would involve changing when the sequence number gets bumped in the first place. If this patch seems to help things, then I'll look at coding that up.

[...]

I was able to reproduce this, and I don't think the above patch will fix it (at least not completely). The problem seems to be that the NT cancel command is screwing up the sequence numbers. We'll have to do some research to figure out why that's occurring.

Jeff, we got a bug report in Debian which seems to be the same problem: http://bugs.debian.org/695492. Please cc John Darrah and the bug address as above. Ben.

You may want to try this patch. It seems to fix the problem for me, but I think there is probably some more work to do in this area.

http://www.spinics.net/lists/linux-cifs/msg07576.html

John, you can test this patch by following instructions at http://kernel-handbook.alioth.debian.org/ch-common-tasks.html#s-common-official. Please reply-to-all to Jeff's message when you have a result. Ben.

OK... I built a 3.2.35 kernel with the patch to transport.c and also a 3.7.1 with the patch to smb1ops.c and loaded them into my wheezy VM. I tested both by starting commands to frob the CIFS mounts and then typing a CTRL-C to kill the command, and they were stable (at least 50 attempts using each kernel with the CTRL-C fired at random times into the running command). But... now another issue affects both kernels. It seems that after 10 to 15 minutes of non-use, the mount hangs and the command accessing the mount can only be killed with a SIGKILL... but only sometimes. Sometimes only a reboot would unwedge things. It seems when the mount would hang, I would get the error

CIFS VFS: Server amifile01 has not responded in 300 seconds. Reconnecting...

except that the 3.7 kernel reported 120 seconds instead of the 300 seconds noted above.

Interesting, I haven't noticed that issue, but I'll try to reproduce it when I get a chance.

Is there a command or kernel magic that can force a dump to see where the contention is that is causing the hang? Also, I just tried starting the VM and mounting the CIFS drives and then just letting it sit there without running anything to touch the drives, and they still hang. So this means the CTRL-C thing has nothing to do with it.

-- john
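The manual test described above (start a command against the mount, hit CTRL-C at a random moment, repeat about 50 times per kernel) could be automated roughly as follows. This is a sketch under assumptions, not the procedure John actually used: the function name and its parameters are invented here, and `send_signal` delivers SIGINT to the one child process only, whereas CTRL-C in a terminal signals the whole foreground process group.

```python
import random
import signal
import subprocess


def interrupt_randomly(cmd, attempts=50, max_delay=5.0, grace=10.0, seed=None):
    """Run `cmd` repeatedly and send it SIGINT after a random delay.

    Returns one entry per attempt: the exit status of the command, or None
    if it had not exited `grace` seconds after the signal (it looks hung).
    """
    rng = random.Random(seed)
    results = []
    for _ in range(attempts):
        proc = subprocess.Popen(
            cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
        )
        try:
            # If the command finishes on its own before the delay, no signal is sent.
            proc.wait(timeout=rng.uniform(0.0, max_delay))
        except subprocess.TimeoutExpired:
            proc.send_signal(signal.SIGINT)
        try:
            results.append(proc.wait(timeout=grace))
        except subprocess.TimeoutExpired:
            results.append(None)
            # A task stuck in uninterruptible sleep may not react even to this.
            proc.kill()
    return results
```

A call such as `interrupt_randomly(["find", "/mnt/share"])` (the mount point is a placeholder) returns a list in which any `None` marks an attempt where the command did not die after the signal, which is the symptom the original report was about.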
Bug#695492: CIFS mount fails if I ctrl-c a long-running find process (Linux mounting Windows share)
On Thu, 10 Jan 2013 20:29:43 -0800 John Darrah xyl...@gmail.com wrote:
On Fri, Jan 04, 2013 at 07:09:33AM -0500, Jeff Layton wrote:
On Thu, 3 Jan 2013 21:29:22 -0800 John Darrah xyl...@gmail.com wrote:
On Sat, Dec 29, 2012 at 12:26:07PM +0100, Ben Hutchings wrote:
On Fri, 2012-12-28 at 22:01 -0500, Jeff Layton wrote:
On Sat, 29 Dec 2012 01:24:36 +0100 Ben Hutchings b...@decadent.org.uk wrote:
On Mon, 2012-12-24 at 09:14 -0500, Jeff Layton wrote:
On Sun, 23 Dec 2012 09:10:34 -0500 Jeff Layton jlay...@redhat.com wrote:

[...]

I had a look at the code today and suspect that I know what the problem is. When the kernel goes to send a request, it first signs it and then bumps the sequence numbers that it tracks. If the request doesn't actually make it out onto the wire, like when the task catches a signal, those sequence numbers remain high even though the request didn't go out. Here's an untested patch that might help tell whether this is the case. You may want to try it and see if it does. Note that this fix is a bit of a kludge and is not suitable for merging! A better fix would involve changing when the sequence number gets bumped in the first place. If this patch seems to help things, then I'll look at coding that up.

[...]

I was able to reproduce this, and I don't think the above patch will fix it (at least not completely). The problem seems to be that the NT cancel command is screwing up the sequence numbers. We'll have to do some research to figure out why that's occurring.

Jeff, we got a bug report in Debian which seems to be the same problem: http://bugs.debian.org/695492. Please cc John Darrah and the bug address as above. Ben.

You may want to try this patch. It seems to fix the problem for me, but I think there is probably some more work to do in this area.

http://www.spinics.net/lists/linux-cifs/msg07576.html

John, you can test this patch by following instructions at http://kernel-handbook.alioth.debian.org/ch-common-tasks.html#s-common-official. Please reply-to-all to Jeff's message when you have a result. Ben.

OK... I built a 3.2.35 kernel with the patch to transport.c and also a 3.7.1 with the patch to smb1ops.c and loaded them into my wheezy VM. I tested both by starting commands to frob the CIFS mounts and then typing a CTRL-C to kill the command, and they were stable (at least 50 attempts using each kernel with the CTRL-C fired at random times into the running command). But... now another issue affects both kernels. It seems that after 10 to 15 minutes of non-use, the mount hangs and the command accessing the mount can only be killed with a SIGKILL... but only sometimes. Sometimes only a reboot would unwedge things. It seems when the mount would hang, I would get the error

CIFS VFS: Server amifile01 has not responded in 300 seconds. Reconnecting...

except that the 3.7 kernel reported 120 seconds instead of the 300 seconds noted above.

Interesting, I haven't noticed that issue, but I'll try to reproduce it when I get a chance.

Below is one of the kernel logs after I SIGKILL'd things... it looks like I triggered a fault of some kind. Maybe it has some meaning (this log only happened once).

Hmmm... Looks like a problem in the virtualbox code. Certainly doesn't appear to be cifs-related. It seems like we saw something similar when all of the lockless dcache stuff went upstream, so it may be that the vbox stuff needs to be forward-ported to handle that correctly.

-- john

Jan 7 07:06:34 jax kernel: imklog 5.8.11, log source = /proc/kmsg started.
Jan 7 07:06:34 jax kernel: [0.00] Initializing cgroup subsys cpuset
Jan 7 07:06:34 jax kernel: [0.00] Initializing cgroup subsys cpu
Jan 7 07:06:34 jax kernel: [0.00] Linux version 3.2.0-4-486 (debian-ker...@lists.debian.org) (gcc version 4.6.3 (Debian 4.6.3-14) ) #1 Debian 3.2.35-2
-a bunch removed-
Jan 7 08:30:31 jax kernel: [ 17.072068] eth0: no IPv6 routers present
Jan 7 08:31:17 jax kernel: [ 63.273900] FS-Cache: Netfs 'cifs' registered for caching
Jan 7 08:31:17 jax kernel: [ 63.304164] CIFS VFS: default security mechanism requested. The default security mechanism will be upgraded from ntlm to ntlmv2 in kernel release 3.3
Jan 7 08:51:20 jax kernel: [ 1266.602096] CIFS VFS: Server amifile01 has not responded in 300 seconds. Reconnecting...
Jan 7 08:51:20 jax kernel: [ 1266.602347] CIFS VFS: Server amifile02 has not responded in 300 seconds. Reconnecting...
Jan 7 09:06:57 jax kernel: [ 2203.298637]
Bug#695492: CIFS mount fails if I ctrl-c a long-running find process (Linux mounting Windows share)
On Fri, Jan 04, 2013 at 07:09:33AM -0500, Jeff Layton wrote:
On Thu, 3 Jan 2013 21:29:22 -0800 John Darrah xyl...@gmail.com wrote:
On Sat, Dec 29, 2012 at 12:26:07PM +0100, Ben Hutchings wrote:
On Fri, 2012-12-28 at 22:01 -0500, Jeff Layton wrote:
On Sat, 29 Dec 2012 01:24:36 +0100 Ben Hutchings b...@decadent.org.uk wrote:
On Mon, 2012-12-24 at 09:14 -0500, Jeff Layton wrote:
On Sun, 23 Dec 2012 09:10:34 -0500 Jeff Layton jlay...@redhat.com wrote:

[...]

I had a look at the code today and suspect that I know what the problem is. When the kernel goes to send a request, it first signs it and then bumps the sequence numbers that it tracks. If the request doesn't actually make it out onto the wire, like when the task catches a signal, those sequence numbers remain high even though the request didn't go out. Here's an untested patch that might help tell whether this is the case. You may want to try it and see if it does. Note that this fix is a bit of a kludge and is not suitable for merging! A better fix would involve changing when the sequence number gets bumped in the first place. If this patch seems to help things, then I'll look at coding that up.

[...]

I was able to reproduce this, and I don't think the above patch will fix it (at least not completely). The problem seems to be that the NT cancel command is screwing up the sequence numbers. We'll have to do some research to figure out why that's occurring.

Jeff, we got a bug report in Debian which seems to be the same problem: http://bugs.debian.org/695492. Please cc John Darrah and the bug address as above. Ben.

You may want to try this patch. It seems to fix the problem for me, but I think there is probably some more work to do in this area.

http://www.spinics.net/lists/linux-cifs/msg07576.html

John, you can test this patch by following instructions at http://kernel-handbook.alioth.debian.org/ch-common-tasks.html#s-common-official. Please reply-to-all to Jeff's message when you have a result. Ben.

OK... I built a 3.2.35 kernel with the patch to transport.c and also a 3.7.1 with the patch to smb1ops.c and loaded them into my wheezy VM. I tested both by starting commands to frob the CIFS mounts and then typing a CTRL-C to kill the command, and they were stable (at least 50 attempts using each kernel with the CTRL-C fired at random times into the running command). But... now another issue affects both kernels. It seems that after 10 to 15 minutes of non-use, the mount hangs and the command accessing the mount can only be killed with a SIGKILL... but only sometimes. Sometimes only a reboot would unwedge things. It seems when the mount would hang, I would get the error

CIFS VFS: Server amifile01 has not responded in 300 seconds. Reconnecting...

except that the 3.7 kernel reported 120 seconds instead of the 300 seconds noted above.

Below is one of the kernel logs after I SIGKILL'd things... it looks like I triggered a fault of some kind. Maybe it has some meaning (this log only happened once).

-- john

Jan 7 07:06:34 jax kernel: imklog 5.8.11, log source = /proc/kmsg started.
Jan 7 07:06:34 jax kernel: [0.00] Initializing cgroup subsys cpuset
Jan 7 07:06:34 jax kernel: [0.00] Initializing cgroup subsys cpu
Jan 7 07:06:34 jax kernel: [0.00] Linux version 3.2.0-4-486 (debian-ker...@lists.debian.org) (gcc version 4.6.3 (Debian 4.6.3-14) ) #1 Debian 3.2.35-2
-a bunch removed-
Jan 7 08:30:31 jax kernel: [ 17.072068] eth0: no IPv6 routers present
Jan 7 08:31:17 jax kernel: [ 63.273900] FS-Cache: Netfs 'cifs' registered for caching
Jan 7 08:31:17 jax kernel: [ 63.304164] CIFS VFS: default security mechanism requested. The default security mechanism will be upgraded from ntlm to ntlmv2 in kernel release 3.3
Jan 7 08:51:20 jax kernel: [ 1266.602096] CIFS VFS: Server amifile01 has not responded in 300 seconds. Reconnecting...
Jan 7 08:51:20 jax kernel: [ 1266.602347] CIFS VFS: Server amifile02 has not responded in 300 seconds. Reconnecting...
Jan 7 09:06:57 jax kernel: [ 2203.298637] [ cut here ]
Jan 7 09:06:57 jax kernel: [ 2203.298645] WARNING: at /root/linux-3.2.35/fs/dcache.c:1291 d_set_d_op+0x24/0x85()
Jan 7 09:06:57 jax kernel: [ 2203.298648] Hardware name: VirtualBox
Jan 7 09:06:57 jax kernel: [ 2203.298651] Modules linked in: des_generic ecb md4 hmac nls_utf8 cifs vboxsf(O) nfsd nfs nfs_acl auth_rpcgss fscache lockd sunrpc loop snd_intel8x0 snd_ac97_codec snd_pcsp snd_pcm snd_page_alloc snd_timer psmouse joydev parport_pc parport usbhid snd hid vboxguest(O) evdev serio_raw battery ac ac97_bus soundcore button ext4 crc16 jbd2 mbcache sg sr_mod sd_mod
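To pull the reconnect events out of a longer log like the one above, something along these lines might do. It is a sketch only: the pattern is written against the exact message wording and timestamp layout shown in this log, and `find_reconnects` is a name made up for the example.

```python
import re

# Matches the kernel uptime stamp plus the CIFS "has not responded" message
# in the form it takes in the log above.
NOT_RESPONDING_RE = re.compile(
    r"\[\s*(?P<uptime>\d+\.\d+)\]\s+CIFS VFS: Server (?P<server>\S+) "
    r"has not responded in (?P<seconds>\d+) seconds"
)


def find_reconnects(log_text):
    """Return (uptime_seconds, server, reported_timeout) for each reconnect message."""
    events = []
    for line in log_text.splitlines():
        match = NOT_RESPONDING_RE.search(line)
        if match:
            events.append(
                (
                    float(match.group("uptime")),
                    match.group("server"),
                    int(match.group("seconds")),
                )
            )
    return events
```

Run over the log in this message, it should report amifile01 and amifile02 at nearly the same uptime (about 1266.6 seconds), both with a reported timeout of 300 seconds. Comparing that list between the 3.2.35 and 3.7.1 kernels would show the 300 versus 120 second difference mentioned above.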
Bug#695492: CIFS mount fails if I ctrl-c a long-running find process (Linux mounting Windows share)
On Mon, 2012-12-24 at 09:14 -0500, Jeff Layton wrote:
On Sun, 23 Dec 2012 09:10:34 -0500 Jeff Layton jlay...@redhat.com wrote:

[...]

I had a look at the code today and suspect that I know what the problem is. When the kernel goes to send a request, it first signs it and then bumps the sequence numbers that it tracks. If the request doesn't actually make it out onto the wire, like when the task catches a signal, those sequence numbers remain high even though the request didn't go out. Here's an untested patch that might help tell whether this is the case. You may want to try it and see if it does. Note that this fix is a bit of a kludge and is not suitable for merging! A better fix would involve changing when the sequence number gets bumped in the first place. If this patch seems to help things, then I'll look at coding that up.

[...]

I was able to reproduce this, and I don't think the above patch will fix it (at least not completely). The problem seems to be that the NT cancel command is screwing up the sequence numbers. We'll have to do some research to figure out why that's occurring.

Jeff, we got a bug report in Debian which seems to be the same problem: http://bugs.debian.org/695492. Please cc John Darrah and the bug address as above.

Ben.

-- 
Ben Hutchings
It is easier to change the specification to fit the program than vice versa.
Bug#695492: CIFS mount fails if I ctrl-c a long-running find process (Linux mounting Windows share)
On Sat, 29 Dec 2012 01:24:36 +0100 Ben Hutchings b...@decadent.org.uk wrote:
On Mon, 2012-12-24 at 09:14 -0500, Jeff Layton wrote:
On Sun, 23 Dec 2012 09:10:34 -0500 Jeff Layton jlay...@redhat.com wrote:

[...]

I had a look at the code today and suspect that I know what the problem is. When the kernel goes to send a request, it first signs it and then bumps the sequence numbers that it tracks. If the request doesn't actually make it out onto the wire, like when the task catches a signal, those sequence numbers remain high even though the request didn't go out. Here's an untested patch that might help tell whether this is the case. You may want to try it and see if it does. Note that this fix is a bit of a kludge and is not suitable for merging! A better fix would involve changing when the sequence number gets bumped in the first place. If this patch seems to help things, then I'll look at coding that up.

[...]

I was able to reproduce this, and I don't think the above patch will fix it (at least not completely). The problem seems to be that the NT cancel command is screwing up the sequence numbers. We'll have to do some research to figure out why that's occurring.

Jeff, we got a bug report in Debian which seems to be the same problem: http://bugs.debian.org/695492. Please cc John Darrah and the bug address as above. Ben.

You may want to try this patch. It seems to fix the problem for me, but I think there is probably some more work to do in this area.

http://www.spinics.net/lists/linux-cifs/msg07576.html

-- Jeff Layton jlay...@redhat.com
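The first hypothesis quoted above (the client signs a request and bumps its sequence number, the send is then aborted by a signal, and the counter stays ahead of the server) can be illustrated with a toy model. None of this is the kernel code or the linked patch. The class names are invented, the server is reduced to comparing bare numbers instead of verifying a signature, and the step of two per request (one number for the request, one for its response) is an assumption of the model. It also ignores the NT cancel problem that the follow-up message identifies as the more likely culprit.

```python
class ToySigningClient:
    """Client-side counter that is bumped at signing time, before the send."""

    def __init__(self):
        self.next_seq = 0

    def sign(self):
        seq = self.next_seq
        # Assumed step: one number for the request, one for the expected response.
        self.next_seq += 2
        return seq

    def unsign(self):
        # The kind of rollback a stopgap fix could apply when a send is aborted.
        self.next_seq -= 2


class ToySigningServer:
    """Server that accepts a request only if it carries the number it expects."""

    def __init__(self):
        self.expected_seq = 0

    def receive(self, seq):
        accepted = seq == self.expected_seq
        if accepted:
            self.expected_seq += 2
        return accepted


def next_request_verifies(rollback_on_abort):
    """Send one request, abort a second after signing, then try a third."""
    client, server = ToySigningClient(), ToySigningServer()
    assert server.receive(client.sign())  # a normal request goes through
    client.sign()  # signed, but a signal stops it from reaching the wire
    if rollback_on_abort:
        client.unsign()
    return server.receive(client.sign())


if __name__ == "__main__":
    print("without rollback:", next_request_verifies(False))
    print("with rollback:", next_request_verifies(True))
```

In this model the request after an aborted send fails verification unless the counter is rolled back, which matches the symptom of the mount breaking after CTRL-C. Bumping the counter only once the request is actually on the wire, as the quoted message suggests for a proper fix, would avoid the need for a rollback at all.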