Re: Strange system hangs
On Sat, 29 Sep 2007, Nick Piggin wrote:

> On Friday 28 September 2007 18:42, Krzysztof Oledzki wrote:
>> Hello,
>>
>> I am experiencing weird system hangs. About once every 2-5 weeks the system freezes and stops accepting remote connections, so it is no longer possible to connect to the most important services: smtp (postfix), www (squid) or even ssh. Such a connection is accepted but then it hangs.
>>
>> What is strange is that a previously established ssh session is still usable. It is possible to work on such a system until you do something stupid like "less /var/log/all.log". Using strace I found that the process blocks on:
>
> Is this a regression? If so, what's the most recent kernel that didn't show the problem?
>
> The symptoms could be consistent with some place doing a balance_dirty_pages while holding a lock that is required for IO, but I can't see a smoking gun (you've got contention on i_mutex, but that should be OK).
>
> Can you see if there is any memory under writeback that isn't being completed (sysrq+M)? Also, a list of the locks held after the hang might be helpful (compile in lockdep and sysrq+D).
>
> Is anything currently running? (sysrq+P and even a full sysrq+T task list could be useful).
>
> Are any IO errors occurring at all?

It seems that 2.6.23.x still fails, but somehow differently. I updated my bug report at: http://bugzilla.kernel.org/show_bug.cgi?id=9182. There are new attachments with traces and an oops that happened while I was collecting the debugging data.

Thank you.

Best regards,

Krzysztof Olędzki
Re: Strange system hangs
Hello,

This report tends to become a novel. In short, the most important facts:
- after some days of uptime, a process like rsync suddenly gets stuck in write congestion; other processes follow
- the balance_dirty_pages_ratelimited_nr problem
- a great amount of dirty pages
- processes do not terminate and cause a heavy load
- process accounting, even though it is not enabled?

We have serious problems on some servers running kernel 2.6.19 and up. The thread http://marc.info/?l=linux-kernel&m=119252148829463&w=2 matches our problem exactly, in particular the balance_dirty_pages_ratelimited_nr problem in Krzysztof Oledzki's trace. It seems to be the same problem as in http://marc.info/?l=linux-kernel&m=119125485615927&w=2, which may be fixed by this patch for 2.6.23-git: http://marc.info/?l=git-commits-head&m=119263941428270&w=2. But it may differ, because:
- we do not have any nfs, loop or fuse mounts
- that problem is described as a speed issue which resolves in seconds, not hours

We have observed that the problem occurs within about 10 days (but one kernel version showed it within 24h on the same machine). On some machines it takes one or two months for the problem to come up. But we also have machines (comparable setup, same Linux installation (Debian sarge), same kernel config, completely different hardware) running kernel 2.6.20 where it does not emerge. We also have machines running kernel <= 2.6.18 which never showed the problem. Our kernel config has always been derived from the previous kernel's config. Our kernel is vanilla and comes from ftp.kernel.org, but as a test we also tried kernel 2.6.20-16 from Ubuntu, which also showed the problem. Of course, we have tried the kernel without commercial modules (not tainted). Unfortunately, we could not force the error to happen (we have to wait). And, very interestingly, we completely exchanged all hardware components (taken from a machine where the bug did not happen) and the unstable server remained unstable.
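The non-terminating processes in the list above sit in uninterruptible sleep ('D'). Between hangs, such tasks can be listed from user space without SysRq. The Python sketch below is only an illustration and was not used in the thread. It assumes the Linux `/proc/<pid>/stat` layout: the pid, then the command name in parentheses, then a one-letter state. The helper names `parse_stat` and `blocked_tasks` are hypothetical.

```python
import os


def parse_stat(line):
    """Return (pid, comm, state) from one /proc/<pid>/stat line.

    The command name sits in parentheses and may itself contain spaces or
    ')', so split on the last ')' rather than on whitespace.
    """
    head, _, tail = line.rpartition(")")
    pid_str, _, comm = head.partition(" (")
    state = tail.split()[0]
    return int(pid_str), comm, state


def blocked_tasks(proc="/proc"):
    """Return (pid, comm) for every process currently in state 'D'."""
    result = []
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue  # skip non-process entries such as "self" or "meminfo"
        try:
            with open(os.path.join(proc, entry, "stat")) as f:
                pid, comm, state = parse_stat(f.read())
        except OSError:
            continue  # the process exited while we were scanning
        if state == "D":
            result.append((pid, comm))
    return result
```

Calling `print(blocked_tasks())` from cron every few minutes would show when D-state tasks start piling up. This sketch only looks at thread-group leaders. Individual threads have their own state under `/proc/<pid>/task/<tid>/stat`.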
We observe that the dirty page count increases (/proc/vmstat), and /proc/meminfo shows 400 MB (!) of dirty pages when the problem appears. It happens mostly during the backup process (rsync). But we also had the failure with the backup turned off; we just had to wait longer for it to happen. rsync seems to be a catalyst for the problem.

When the error occurs, processes do not terminate: they try to exit, but remain in the process list. The machine is powerful, so even with a load above 500 a program itself runs fast and responsively. But after its last expected lines of output (e.g. "uptime"), it does not terminate and remains in 'D' state. When we kill the "rsync" process (and/or others), the machine may recover. Or, if we have time to wait (usually we do not), the lock resolves after several hours. In one test we "waited" 14 hours: the number of dirty pages decreased from 400 MB to <100 MB. We observed that once the machine has run into trouble (e.g. after 10 days of trouble-free uptime), it then shows this error on every daily rsync backup.

Diagnostics: with "echo t > /proc/sysrq-trigger" we see that many processes hang in a mutex_lock after ext3_file_write(). Some of these are in congestion_wait() after balance_dirty_pages_ratelimited_nr() after ext3_file_write(). We could not force this long-lasting deadlock by hand. But it is evidently the same problem (judging by the call trace), because we can trigger a short-lived version with multiple concurrent "dd if=/dev/zero of=foo bs=4000k" processes. I can only speculate whether the non-terminating processes cause or aggravate the problem, or whether they are merely a consequence of process accounting (see below), which is also waiting to write its data to a file. Nevertheless, the sysrq-trigger method allowed us to see what causes terminating processes to wait in their exit() call: do_group_exit() calls do_exit(), which calls acct_process(). acct_process() does a do_sync_write(), which hangs in a mutex_lock.
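The exit path described above runs through acct_process(). Calling acct(2) with a NULL filename switches BSD process accounting off. Below is a minimal sketch of doing that from user space, assuming Linux with glibc and Python's `ctypes`. The call needs CAP_SYS_PACCT (normally root); without it, it fails with EPERM. On a kernel built without BSD process accounting it should fail with ENOSYS. As far as I know, a successful call does not reveal whether accounting was on beforehand. The helper names are hypothetical.

```python
import ctypes
import errno
import os


def disable_process_accounting():
    """Call acct(NULL) to switch BSD process accounting off.

    Returns (0, 0) on success, or (-1, errno_value) on failure, e.g.
    EPERM without CAP_SYS_PACCT or ENOSYS if the kernel lacks accounting.
    """
    # CDLL(None) exposes the symbols already linked into this process,
    # which on Linux/glibc includes the acct(2) wrapper.
    libc = ctypes.CDLL(None, use_errno=True)
    libc.acct.argtypes = [ctypes.c_char_p]
    libc.acct.restype = ctypes.c_int
    ret = libc.acct(None)  # a NULL filename means "turn accounting off"
    return (0, 0) if ret == 0 else (ret, ctypes.get_errno())


def describe(result):
    """Turn the (ret, errno) pair into a short human-readable message."""
    ret, err = result
    if ret == 0:
        return "process accounting is off"
    name = errno.errorcode.get(err, str(err))
    return "acct(NULL) failed: %s (%s)" % (name, os.strerror(err))
```

Running `print(describe(disable_process_accounting()))` as root would at least make sure accounting is off before the next backup window. It cannot tell you which file, if any, accounting had been writing to.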
If we boot a machine, enable process accounting (acct(2)) and then run the file-I/O tests mentioned above, we get the same effect of non-terminating processes, and the sysrq-trigger output matches. They terminate after some outstanding blocks from "dd" are written. If process accounting is off, the kernel does not call acct_process() (tested), which is expected. OK, this explains the many non-terminating processes and the load. But it raises another question: we do not have and do not need process accounting, and we have not even installed the accton tools. So why does the buggy machine call acct_process() during the exit of processes? Unfortunately, the kernel offers no window (/sys would be nice) for checking whether process accounting is really on and, if so, which file it actually writes to. For the next occurrence of the error, which we tensely await, we are prepared to:
- force process accounting off with a call to acct(0) and examine the output of sysrq-trigger
- install a patched kernel which gives us the opportunity to see if
Re: Strange system hangs
On Sat, 29 Sep 2007, Nick Piggin wrote:

> On Friday 28 September 2007 18:42, Krzysztof Oledzki wrote:
>> Hello,
>>
>> I am experiencing weird system hangs. About once every 2-5 weeks the system freezes and stops accepting remote connections, so it is no longer possible to connect to the most important services: smtp (postfix), www (squid) or even ssh. Such a connection is accepted but then it hangs.
>>
>> What is strange is that a previously established ssh session is still usable. It is possible to work on such a system until you do something stupid like "less /var/log/all.log". Using strace I found that the process blocks on:
>
> Is this a regression? If so, what's the most recent kernel that didn't show the problem?

I don't know. The first kernel I ran was 2.6.20.x. This is quite a fresh system.

> The symptoms could be consistent with some place doing a balance_dirty_pages while holding a lock that is required for IO, but I can't see a smoking gun (you've got contention on i_mutex, but that should be OK).
>
> Can you see if there is any memory under writeback that isn't being completed (sysrq+M)? Also, a list of the locks held after the hang might be helpful (compile in lockdep and sysrq+D).

OK. I'll try to do it next time if there is a chance. It may take some time, BTW.

> Is anything currently running? (sysrq+P and even a full sysrq+T task list could be useful).

I'll have to check; maybe I have this captured. If not, I'll check it next time.

> Are any IO errors occurring at all?

I didn't notice any, so no.

Thank you.

Best regards,

Krzysztof Olędzki
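The "memory under writeback" question can also be watched from user space between hangs, through the Dirty and Writeback lines of /proc/meminfo. Below is a minimal Python sketch, assuming the usual "Key:   value kB" line format. The function names are illustrative only.

```python
import time


def parse_meminfo(text):
    """Map each /proc/meminfo key to its integer value (kB for sized fields)."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and fields:
            values[key.strip()] = int(fields[0])
    return values


def dirty_and_writeback_kb(text):
    """Return (Dirty, Writeback) in kB; missing keys count as 0."""
    info = parse_meminfo(text)
    return info.get("Dirty", 0), info.get("Writeback", 0)


def watch(path="/proc/meminfo", interval=5.0, samples=3):
    """Print Dirty/Writeback `samples` times, `interval` seconds apart."""
    for i in range(samples):
        if i:
            time.sleep(interval)
        with open(path) as f:
            dirty, writeback = dirty_and_writeback_kb(f.read())
        print("Dirty: %d kB  Writeback: %d kB" % (dirty, writeback))
```

One possible reading of the output is as follows; it is an interpretation, not a conclusion reached in the thread. A Dirty figure that stays in the hundreds of MB while Writeback stays near zero suggests writeback is not even being started. A large Writeback figure that never drains points at I/O that is not completing.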
Re: Strange system hangs
On Friday 28 September 2007 18:42, Krzysztof Oledzki wrote:
> Hello,
>
> I am experiencing weird system hangs. About once every 2-5 weeks the system freezes and stops accepting remote connections, so it is no longer possible to connect to the most important services: smtp (postfix), www (squid) or even ssh. Such a connection is accepted but then it hangs.
>
> What is strange is that a previously established ssh session is still usable. It is possible to work on such a system until you do something stupid like "less /var/log/all.log". Using strace I found that the process blocks on:

Is this a regression? If so, what's the most recent kernel that didn't show the problem?

The symptoms could be consistent with some place doing a balance_dirty_pages while holding a lock that is required for IO, but I can't see a smoking gun (you've got contention on i_mutex, but that should be OK).

Can you see if there is any memory under writeback that isn't being completed (sysrq+M)? Also, a list of the locks held after the hang might be helpful (compile in lockdep and sysrq+D).

Is anything currently running? (sysrq+P and even a full sysrq+T task list could be useful).

Are any IO errors occurring at all?

Thanks,
Nick

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
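The SysRq dumps requested above can be triggered without a console keyboard. You write the command letter to /proc/sysrq-trigger, as another report in this thread does with "echo t". Below is a hedged Python sketch, assuming three things: root, SysRq enabled via /proc/sys/kernel/sysrq, and, for 'd', a kernel with lockdep. The output lands in the kernel log, not on stdout. 'w' ("Show Blocked State", as in the original report) is included as well. The key descriptions paraphrase the kernel's SysRq documentation, and the helper name `trigger` is hypothetical.

```python
# SysRq command letters relevant to this thread, with what each one dumps.
SYSRQ_KEYS = {
    "m": "current memory information",
    "d": "all locks that are held (needs a lockdep-enabled kernel)",
    "p": "current registers and flags",
    "t": "list of current tasks and their information",
    "w": "tasks in uninterruptible (blocked) state",
}


def trigger(keys, path="/proc/sysrq-trigger"):
    """Write each SysRq key to `path` in turn and return the keys written.

    Each single-character write fires one SysRq command; the resulting
    dump goes to the kernel log (dmesg/syslog), not to this process.
    """
    written = []
    for key in keys:
        if key not in SYSRQ_KEYS:
            raise ValueError("unexpected SysRq key: %r" % key)
        with open(path, "w") as f:
            f.write(key)
        written.append(key)
    return written
```

On the hung machine, `trigger("mdptw")` followed by saving the output of `dmesg` to another disk would collect everything asked for here in one pass. Keep in mind that the full 't' dump can overflow a small kernel log buffer.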
Re: Strange system hangs
On Fri, 28 Sep 2007, Peter Zijlstra wrote:

> On Fri, 2007-09-28 at 10:42 +0200, Krzysztof Oledzki wrote:
>> Hello,
>>
>> I am experiencing weird system hangs. About once every 2-5 weeks the system freezes and stops accepting remote connections, so it is no longer possible to connect to the most important services: smtp (postfix), www (squid) or even ssh. Such a connection is accepted but then it hangs.
>>
>> What is strange is that a previously established ssh session is still usable. It is possible to work on such a system until you do something stupid like "less /var/log/all.log".
>
> So it takes weeks to reproduce this?

Unfortunately, yes. :(

>> free sibling
>> task PC stack pid father child younger older
>> syslogd D F5C83C60 0 2162 1 (NOTLB)
>> f5c83c74 0082 0002 f5c83c60 f5c83c5c 78538d20 0009 0001 f7f6a070 f7cb8030 82c47e5f 0001cfed 0a43 f7f6a17c 7a016980 f705dc80 78404217 7812c708 0213 f5c83c84 1e7a64bb
>> Call Trace:
>> [<78404217>] _spin_unlock_irqrestore+0xf/0x23
>> [<7812c708>] __mod_timer+0x92/0x9c
>> [<78402b34>] schedule_timeout+0x70/0x8d
>> [<7812c521>] process_timeout+0x0/0x5
>> [<78402548>] io_schedule_timeout+0x1e/0x28
>> [<7814d41e>] congestion_wait+0x50/0x64
>> [<78134abc>] autoremove_wake_function+0x0/0x35
>> [<781493e7>] balance_dirty_pages_ratelimited_nr+0x16e/0x1dc
>> [<78145bd0>] generic_file_buffered_write+0x4ee/0x605
>> [<783c55a1>] unix_dgram_recvmsg+0x1b4/0x1c8
>> [<78128c8e>] current_fs_time+0x41/0x46
>> [<78146167>] __generic_file_aio_write_nolock+0x480/0x4df
>> [<7814621b>] generic_file_aio_write+0x55/0xb3
>> [<78194b28>] ext3_file_write+0x24/0x8f
>> [<7815f34f>] do_sync_readv_writev+0xc1/0xfe
>> [<78134abc>] autoremove_wake_function+0x0/0x35
>> [<784041ae>] _spin_unlock+0xd/0x21
>> [<781a8c38>] log_wait_commit+0xc3/0xe3
>> [<7814448b>] find_get_pages_tag+0x76/0x80
>> [<7815f204>] rw_copy_check_uvector+0x50/0xaa
>> [<7815f9d4>] do_readv_writev+0x99/0x164
>> [<78194b04>] ext3_file_write+0x0/0x8f
>> [<7815fadc>] vfs_writev+0x3d/0x48
>> [<7815feb5>] sys_writev+0x41/0x67
>> [<78103d6a>] sysenter_past_esp+0x5f/0x85
>> ===
>
> This trace puzzles me, what is:
>   unix_dgram_recvmsg
> doing there? Also, it has two invocations of:
>   ext3_file_write
> Do you have a stacked filesystem of sorts, ext3 on loopback on ext3?

No, no loopback:

# mount
/dev/md0 on / type ext3 (rw)
proc on /proc type proc (rw)
sysfs on /sys type sysfs (rw,nosuid,nodev,noexec)
devpts on /dev/pts type devpts (rw,nosuid,noexec)
/dev/mapper/VolGrp0-usr on /usr type ext3 (rw,nodev,data=journal)
/dev/mapper/VolGrp0-var on /var type ext3 (rw,nodev,data=journal)
/dev/mapper/VolGrp0-squid_spool on /var/cache/squid/cd0 type ext3 (rw,nosuid,nodev,noatime,data=writeback)
/dev/mapper/VolGrp0-squid_spool2 on /var/cache/squid/cd1 type ext3 (rw,nosuid,nodev,noatime,data=writeback)
/dev/mapper/VolGrp0-news_spool on /var/spool/news type ext3 (rw,nosuid,nodev,noatime)
shm on /dev/shm type tmpfs (rw,noexec,nosuid,nodev)
usbfs on /proc/bus/usb type usbfs (rw,noexec,nosuid,devmode=0664,devgid=85)
owl:/usr/gentoo-nfs on /usr/gentoo-nfs type nfs (ro,nosuid,nodev,noatime,bg,intr,tcp,addr=192.168.129.26)

Nothing more.

>> freshclam D 0282 0 2866 1 (NOTLB)
>> f36e3cc4 0082 0009 0282 7a0173c0 0002 007b 0009 0001 f7cb8030 f7c72030 82c4884d 0001cfed 09ee f7cb813c 7a016980 f66c0b80 78404217 7812c708 0213 f36e3cd4 1e7a64bb
>> Call Trace:
>> [<78404217>] _spin_unlock_irqrestore+0xf/0x23
>> [<7812c708>] __mod_timer+0x92/0x9c
>> [<78402b34>] schedule_timeout+0x70/0x8d
>> [<7812c521>] process_timeout+0x0/0x5
>> [<78402548>] io_schedule_timeout+0x1e/0x28
>> [<7814d41e>] congestion_wait+0x50/0x64
>> [<78134abc>] autoremove_wake_function+0x0/0x35
>> [<781493e7>] balance_dirty_pages_ratelimited_nr+0x16e/0x1dc
>> [<78145bd0>] generic_file_buffered_write+0x4ee/0x605
>> [<7819cdb4>] __ext3_journal_stop+0x19/0x34
>> [<7840408f>] _spin_lock+0xd/0x5a
>> [<78176f3d>] __mark_inode_dirty+0xdd/0x16f
>> [<78128c8e>] current_fs_time+0x41/0x46
>> [<78146167>] __generic_file_aio_write_nolock+0x480/0x4df
>> [<7814621b>] generic_file_aio_write+0x55/0xb3
>> [<78103159>] setup_sigcontext+0x105/0x189
>> [<78194b28>] ext3_file_write+0x24/0x8f
>> [<7815f453>] do_sync_write+0xc7/0x10a
>> [<78134abc>] autoremove_wake_function+0x0/0x35
>> [<781085d2>] convert_fxsr_from_user+0x15/0xd5
>> [<7815f38c>] do_sync_write+0x0/0x10a
>> [<7815fbb6>] vfs_write+0x8a/0x10c
>> [<78160123>] sys_write+0x41/0x67
>> [<78103d6a>] sysenter_past_esp+0x5f/0x85
>> ===
>
> Single write, no networking, also stuck in balance_dirty_pages().

Exactly. Strange, isn't it?

Thanks.

Best regards,

Krzysztof Olędzki
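Peter's question about stacked filesystems (ext3 on loopback on ext3) can also be checked mechanically. Below is a small Python sketch, assuming the standard /proc/mounts format: device, mount point, type, options, dump, pass. It flags loop-backed devices and network or FUSE filesystem types. These are the mount kinds that another report in this thread rules out for its own servers. Applied to the mount listing above, it would flag only the read-only NFS mount owl:/usr/gentoo-nfs. The set of network types and the function names are illustrative assumptions.

```python
# Filesystem types served over the network (illustrative, not exhaustive).
NETWORK_FS = {"nfs", "nfs4", "cifs", "smbfs"}


def suspicious_mounts(text):
    """Return (device, mountpoint, fstype) for loop, network, or FUSE mounts.

    `text` is in /proc/mounts format: "device mountpoint fstype options 0 0".
    """
    hits = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        device, mountpoint, fstype = fields[:3]
        if (device.startswith("/dev/loop")
                or fstype in NETWORK_FS
                or fstype.startswith("fuse")):
            hits.append((device, mountpoint, fstype))
    return hits


def check(path="/proc/mounts"):
    """Read the live mount table and return the flagged entries."""
    with open(path) as f:
        return suspicious_mounts(f.read())
```

Matching on the device prefix catches both `/dev/loop0` and the devfs-style `/dev/loop/0`. A loop device set up through a file on another mount still needs `losetup -a` to see which file backs it.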
Re: Strange system hangs
On Fri, 2007-09-28 at 10:42 +0200, Krzysztof Oledzki wrote:
> Hello,
>
> I am experiencing weird system hangs. About once every 2-5 weeks the system freezes and stops accepting remote connections, so it is no longer possible to connect to the most important services: smtp (postfix), www (squid) or even ssh. Such a connection is accepted but then it hangs.
>
> What is strange is that a previously established ssh session is still usable. It is possible to work on such a system until you do something stupid like "less /var/log/all.log".

So it takes weeks to reproduce this?

> free sibling
> task PC stack pid father child younger older
> syslogd D F5C83C60 0 2162 1 (NOTLB)
> f5c83c74 0082 0002 f5c83c60 f5c83c5c 78538d20 0009 0001 f7f6a070 f7cb8030 82c47e5f 0001cfed 0a43 f7f6a17c 7a016980 f705dc80 78404217 7812c708 0213 f5c83c84 1e7a64bb
> Call Trace:
> [<78404217>] _spin_unlock_irqrestore+0xf/0x23
> [<7812c708>] __mod_timer+0x92/0x9c
> [<78402b34>] schedule_timeout+0x70/0x8d
> [<7812c521>] process_timeout+0x0/0x5
> [<78402548>] io_schedule_timeout+0x1e/0x28
> [<7814d41e>] congestion_wait+0x50/0x64
> [<78134abc>] autoremove_wake_function+0x0/0x35
> [<781493e7>] balance_dirty_pages_ratelimited_nr+0x16e/0x1dc
> [<78145bd0>] generic_file_buffered_write+0x4ee/0x605
> [<783c55a1>] unix_dgram_recvmsg+0x1b4/0x1c8
> [<78128c8e>] current_fs_time+0x41/0x46
> [<78146167>] __generic_file_aio_write_nolock+0x480/0x4df
> [<7814621b>] generic_file_aio_write+0x55/0xb3
> [<78194b28>] ext3_file_write+0x24/0x8f
> [<7815f34f>] do_sync_readv_writev+0xc1/0xfe
> [<78134abc>] autoremove_wake_function+0x0/0x35
> [<784041ae>] _spin_unlock+0xd/0x21
> [<781a8c38>] log_wait_commit+0xc3/0xe3
> [<7814448b>] find_get_pages_tag+0x76/0x80
> [<7815f204>] rw_copy_check_uvector+0x50/0xaa
> [<7815f9d4>] do_readv_writev+0x99/0x164
> [<78194b04>] ext3_file_write+0x0/0x8f
> [<7815fadc>] vfs_writev+0x3d/0x48
> [<7815feb5>] sys_writev+0x41/0x67
> [<78103d6a>] sysenter_past_esp+0x5f/0x85
> ===

This trace puzzles me, what is:
  unix_dgram_recvmsg
doing there? Also, it has two invocations of:
  ext3_file_write
Do you have a stacked filesystem of sorts, ext3 on loopback on ext3?

> freshclam D 0282 0 2866 1 (NOTLB)
> f36e3cc4 0082 0009 0282 7a0173c0 0002 007b 0009 0001 f7cb8030 f7c72030 82c4884d 0001cfed 09ee f7cb813c 7a016980 f66c0b80 78404217 7812c708 0213 f36e3cd4 1e7a64bb
> Call Trace:
> [<78404217>] _spin_unlock_irqrestore+0xf/0x23
> [<7812c708>] __mod_timer+0x92/0x9c
> [<78402b34>] schedule_timeout+0x70/0x8d
> [<7812c521>] process_timeout+0x0/0x5
> [<78402548>] io_schedule_timeout+0x1e/0x28
> [<7814d41e>] congestion_wait+0x50/0x64
> [<78134abc>] autoremove_wake_function+0x0/0x35
> [<781493e7>] balance_dirty_pages_ratelimited_nr+0x16e/0x1dc
> [<78145bd0>] generic_file_buffered_write+0x4ee/0x605
> [<7819cdb4>] __ext3_journal_stop+0x19/0x34
> [<7840408f>] _spin_lock+0xd/0x5a
> [<78176f3d>] __mark_inode_dirty+0xdd/0x16f
> [<78128c8e>] current_fs_time+0x41/0x46
> [<78146167>] __generic_file_aio_write_nolock+0x480/0x4df
> [<7814621b>] generic_file_aio_write+0x55/0xb3
> [<78103159>] setup_sigcontext+0x105/0x189
> [<78194b28>] ext3_file_write+0x24/0x8f
> [<7815f453>] do_sync_write+0xc7/0x10a
> [<78134abc>] autoremove_wake_function+0x0/0x35
> [<781085d2>] convert_fxsr_from_user+0x15/0xd5
> [<7815f38c>] do_sync_write+0x0/0x10a
> [<7815fbb6>] vfs_write+0x8a/0x10c
> [<78160123>] sys_write+0x41/0x67
> [<78103d6a>] sysenter_past_esp+0x5f/0x85
> ===

Single write, no networking, also stuck in balance_dirty_pages().
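Both traces end in congestion_wait() called from balance_dirty_pages_ratelimited_nr(). The toy Python model below is a simplified illustration of the throttling loop in kernels of this era, not the kernel code. It assumes that a writer keeps sleeping in congestion_wait() while dirty plus under-writeback pages exceed the dirty threshold. The writer leaves once the totals drop to the threshold or once it has itself written a chunk of pages. Unstable NFS pages and per-device details are ignored. I/O completion is not simulated, so the counters change only through `flush`. The names `thresh`, `write_chunk` and `flush` are hypothetical.

```python
def over_limit(dirty, writeback, thresh):
    """A writer is throttled while dirty + under-writeback pages exceed thresh."""
    return dirty + writeback > thresh


def throttle(dirty, writeback, thresh, flush, write_chunk, max_waits=1000):
    """Count the congestion_wait()-style sleeps a writer performs.

    flush(dirty, writeback) -> (dirty, writeback, pages_written) models one
    round of writeback started by this writer. The loop exits when the totals
    fall to thresh or below, or when this writer has written write_chunk pages
    itself. Returns None if it is still throttled after max_waits sleeps.
    """
    written = 0
    for waits in range(max_waits):
        if not over_limit(dirty, writeback, thresh):
            return waits
        if dirty:
            dirty, writeback, pages = flush(dirty, writeback)
            written += pages
            if not over_limit(dirty, writeback, thresh) or written >= write_chunk:
                return waits
        # Here the kernel sleeps briefly in congestion_wait(), then retries.
    return None
```

Consider a `flush` that never makes progress, for example because the pages this writer could write sit behind an inode or device that is blocked. With such a `flush`, the model never leaves the loop. That is one possible reading of traces in which tasks stay in congestion_wait() indefinitely. It is consistent with Nick's suggestion of a lock needed for IO being held, but the thread does not establish it.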
Strange system hangs
Hello,

I am experiencing weird system hangs. About once every 2-5 weeks the system freezes and stops accepting remote connections, so it is no longer possible to connect to the most important services: smtp (postfix), www (squid) or even ssh. Such a connection is accepted but then it hangs.

What is strange is that a previously established ssh session is still usable. It is possible to work on such a system until you do something stupid like "less /var/log/all.log". Using strace I found that the process blocks on:

--- strace: begin ---
execve("/usr/bin/tail", ["tail", "-f", "/var/log/all.log"], [/* 33 vars */]) = 0
brk(0) = 0x8052000
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x6ff0
access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory)
open("/etc/ld.so.cache", O_RDONLY) = 3
fstat64(3, {st_mode=S_IFREG|0644, st_size=20944, ...}) = 0
mmap2(NULL, 20944, PROT_READ, MAP_PRIVATE, 3, 0) = 0x6fefa000
close(3) = 0
open("/lib/libc.so.6", O_RDONLY) = 3
read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0RY\1\0004\0\0\0"..., 512) = 512
fstat64(3, {st_mode=S_IFREG|0755, st_size=1175920, ...}) = 0
mmap2(NULL, 1185212, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x6fdd8000
mmap2(0x6fef4000, 12288, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x11b) = 0x6fef4000
mmap2(0x6fef7000, 9660, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x6fef7000
close(3) = 0
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x6fdd7000
set_thread_area({entry_number:-1 -> 6, base_addr:0x6fdd76b0, limit:1048575, seg_32bit:1, contents:0, read_exec_only:0, limit_in_pages:1, seg_not_present:0, useable:1}) = 0
mprotect(0x6fef4000, 4096, PROT_READ) = 0
mprotect(0x6ff1c000, 4096, PROT_READ) = 0
munmap(0x6fefa000, 20944) = 0
brk(0) = 0x8052000
brk(0x8073000) = 0x8073000
open("/var/log/all.log", O_RDONLY|O_LARGEFILE) = 3
fstat64(3, {st_mode=S_IFREG|0640, st_size=3171841, ...})
llseek(3, 0, <unfinished ...>
--- strace: end ---

This file is not very big:

# ls -l /var/log/all.log
-rw-r----- 1 root root 3171841 Sep 27 04:36 /var/log/all.log

Also, running "dmesg > file" hangs, creating a file with only 4096 bytes.

--- Show Blocked State: begin ---
SysRq : Show Blocked State

free sibling
task PC stack pid father child younger older
syslogd D F5C83C60 0 2162 1 (NOTLB)
f5c83c74 0082 0002 f5c83c60 f5c83c5c 78538d20 0009 0001 f7f6a070 f7cb8030 82c47e5f 0001cfed 0a43 f7f6a17c 7a016980 f705dc80 78404217 7812c708 0213 f5c83c84 1e7a64bb
Call Trace:
[<78404217>] _spin_unlock_irqrestore+0xf/0x23
[<7812c708>] __mod_timer+0x92/0x9c
[<78402b34>] schedule_timeout+0x70/0x8d
[<7812c521>] process_timeout+0x0/0x5
[<78402548>] io_schedule_timeout+0x1e/0x28
[<7814d41e>] congestion_wait+0x50/0x64
[<78134abc>] autoremove_wake_function+0x0/0x35
[<781493e7>] balance_dirty_pages_ratelimited_nr+0x16e/0x1dc
[<78145bd0>] generic_file_buffered_write+0x4ee/0x605
[<783c55a1>] unix_dgram_recvmsg+0x1b4/0x1c8
[<78128c8e>] current_fs_time+0x41/0x46
[<78146167>] __generic_file_aio_write_nolock+0x480/0x4df
[<7814621b>] generic_file_aio_write+0x55/0xb3
[<78194b28>] ext3_file_write+0x24/0x8f
[<7815f34f>] do_sync_readv_writev+0xc1/0xfe
[<78134abc>] autoremove_wake_function+0x0/0x35
[<784041ae>] _spin_unlock+0xd/0x21
[<781a8c38>] log_wait_commit+0xc3/0xe3
[<7814448b>] find_get_pages_tag+0x76/0x80
[<7815f204>] rw_copy_check_uvector+0x50/0xaa
[<7815f9d4>] do_readv_writev+0x99/0x164
[<78194b04>] ext3_file_write+0x0/0x8f
[<7815fadc>] vfs_writev+0x3d/0x48
[<7815feb5>] sys_writev+0x41/0x67
[<78103d6a>] sysenter_past_esp+0x5f/0x85
===
freshclam D 0282 0 2866 1 (NOTLB)
f36e3cc4 0082 0009 0282 7a0173c0 0002 007b 0009 0001 f7cb8030 f7c72030 82c4884d 0001cfed 09ee f7cb813c 7a016980 f66c0b80 78404217 7812c708 0213 f36e3cd4 1e7a64bb
Call Trace:
[<78404217>] _spin_unlock_irqrestore+0xf/0x23
[<7812c708>] __mod_timer+0x92/0x9c
[<78402b34>] schedule_timeout+0x70/0x8d
[<7812c521>] process_timeout+0x0/0x5
[<78402548>] io_schedule_timeout+0x1e/0x28
[<7814d41e>] congestion_wait+0x50/0x64
[<78134abc>] autoremove_wake_function+0x0/0x35
[<781493e7>] balance_dirty_pages_ratelimited_nr+0x16e/0x1dc
[<78145bd0>] generic_file_buffered_write+0x4ee/0x605
[<7819cdb4>] __ext3_journal_stop+0x19/0x34
[<7840408f>] _spin_lock+0xd/0x5a
[<78176f3d>] __mark_inode_dirty+0xdd/0x16f
[<78128c8e>] current_fs_time+0x41/0x46
[<78146167>] __generic_file_aio_write_nolock+0x480/0x4df
[<7814621b>] generic_file_aio_write+0x55/0xb3
[<78103159>]
Re: Strange system hangs
On Fri, 2007-09-28 at 10:42 +0200, Krzysztof Oledzki wrote:
> Hello,
>
> I am experiencing weird system hangs. About once every 2-5 weeks the system freezes and stops accepting remote connections, so it is no longer possible to connect to the most important services: smtp (postfix), www (squid) or even ssh. Such a connection is accepted, but then it hangs. What is strange is that a previously established ssh session is still usable. It is possible to work on such a system until you do something stupid like "less /var/log/all.log".

So it takes weeks to reproduce this?

> free sibling
> task PC stack pid father child younger older
> syslogd D F5C83C60 0 2162 1 (NOTLB)
> f5c83c74 0082 0002 f5c83c60 f5c83c5c 78538d20 0009 0001 f7f6a070 f7cb8030 82c47e5f 0001cfed 0a43 f7f6a17c 7a016980 f705dc80 78404217 7812c708 0213 f5c83c84 1e7a64bb
> Call Trace:
> [78404217] _spin_unlock_irqrestore+0xf/0x23
> [7812c708] __mod_timer+0x92/0x9c
> [78402b34] schedule_timeout+0x70/0x8d
> [7812c521] process_timeout+0x0/0x5
> [78402548] io_schedule_timeout+0x1e/0x28
> [7814d41e] congestion_wait+0x50/0x64
> [78134abc] autoremove_wake_function+0x0/0x35
> [781493e7] balance_dirty_pages_ratelimited_nr+0x16e/0x1dc
> [78145bd0] generic_file_buffered_write+0x4ee/0x605
> [783c55a1] unix_dgram_recvmsg+0x1b4/0x1c8
> [78128c8e] current_fs_time+0x41/0x46
> [78146167] __generic_file_aio_write_nolock+0x480/0x4df
> [7814621b] generic_file_aio_write+0x55/0xb3
> [78194b28] ext3_file_write+0x24/0x8f
> [7815f34f] do_sync_readv_writev+0xc1/0xfe
> [78134abc] autoremove_wake_function+0x0/0x35
> [784041ae] _spin_unlock+0xd/0x21
> [781a8c38] log_wait_commit+0xc3/0xe3
> [7814448b] find_get_pages_tag+0x76/0x80
> [7815f204] rw_copy_check_uvector+0x50/0xaa
> [7815f9d4] do_readv_writev+0x99/0x164
> [78194b04] ext3_file_write+0x0/0x8f
> [7815fadc] vfs_writev+0x3d/0x48
> [7815feb5] sys_writev+0x41/0x67
> [78103d6a] sysenter_past_esp+0x5f/0x85
> ===

This trace puzzles me: what is unix_dgram_recvmsg doing there?

Also, it has two invocations of ext3_file_write. Do you have a stacked filesystem of sorts, ext3 on loopback on ext3?

> freshclam D 0282 0 2866 1 (NOTLB)
> f36e3cc4 0082 0009 0282 7a0173c0 0002 007b 0009 0001 f7cb8030 f7c72030 82c4884d 0001cfed 09ee f7cb813c 7a016980 f66c0b80 78404217 7812c708 0213 f36e3cd4 1e7a64bb
> Call Trace:
> [78404217] _spin_unlock_irqrestore+0xf/0x23
> [7812c708] __mod_timer+0x92/0x9c
> [78402b34] schedule_timeout+0x70/0x8d
> [7812c521] process_timeout+0x0/0x5
> [78402548] io_schedule_timeout+0x1e/0x28
> [7814d41e] congestion_wait+0x50/0x64
> [78134abc] autoremove_wake_function+0x0/0x35
> [781493e7] balance_dirty_pages_ratelimited_nr+0x16e/0x1dc
> [78145bd0] generic_file_buffered_write+0x4ee/0x605
> [7819cdb4] __ext3_journal_stop+0x19/0x34
> [7840408f] _spin_lock+0xd/0x5a
> [78176f3d] __mark_inode_dirty+0xdd/0x16f
> [78128c8e] current_fs_time+0x41/0x46
> [78146167] __generic_file_aio_write_nolock+0x480/0x4df
> [7814621b] generic_file_aio_write+0x55/0xb3
> [78103159] setup_sigcontext+0x105/0x189
> [78194b28] ext3_file_write+0x24/0x8f
> [7815f453] do_sync_write+0xc7/0x10a
> [78134abc] autoremove_wake_function+0x0/0x35
> [781085d2] convert_fxsr_from_user+0x15/0xd5
> [7815f38c] do_sync_write+0x0/0x10a
> [7815fbb6] vfs_write+0x8a/0x10c
> [78160123] sys_write+0x41/0x67
> [78103d6a] sysenter_past_esp+0x5f/0x85
> ===

Single write, no networking, also stuck in balance_dirty_pages().

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
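Peter's point is that the two blocked tasks enter the kernel differently but end up in the same place. syslogd comes in through sys_writev and freshclam through sys_write. Both then sit in balance_dirty_pages_ratelimited_nr -> congestion_wait -> io_schedule_timeout. That comparison can be done mechanically.

The following is a minimal Python sketch, not something from the thread. `frames` and `common_frames` are hypothetical helper names. The sketch assumes each frame looks like `[78404217] _spin_unlock_irqrestore+0xf/0x23`, with or without `<...>` around the address.

```python
import re

# One call-trace frame, e.g. "[7814d41e] congestion_wait+0x50/0x64".
# The "<" and ">" around the address are optional because some copies
# of these traces lost them.
FRAME_RE = re.compile(
    r"\[<?([0-9a-f]+)>?\]\s+([A-Za-z0-9_.]+)\+0x([0-9a-f]+)/0x([0-9a-f]+)"
)


def frames(trace: str) -> list[str]:
    """Symbol names in a flattened call trace, innermost frame first."""
    return [m.group(2) for m in FRAME_RE.finditer(trace)]


def common_frames(a: str, b: str) -> list[str]:
    """Symbols present in both traces, in the order they appear in `a`."""
    in_b = set(frames(b))
    seen: set[str] = set()
    shared = []
    for sym in frames(a):
        if sym in in_b and sym not in seen:
            seen.add(sym)
            shared.append(sym)
    return shared
```

Here is a short usage example. Pass the syslogd trace as `a` and the freshclam trace as `b`. The result begins with the scheduling and balance_dirty_pages frames the two tasks share.

Comparing names only is a rough heuristic. Entries with offset 0, such as `process_timeout+0x0/0x5`, suggest these traces also contain addresses that are not real return addresses. A shared name is therefore a hint, not proof of a shared call path.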
Strange system hangs
Hello,

I am experiencing weird system hangs. About once every 2-5 weeks the system freezes and stops accepting remote connections, so it is no longer possible to connect to the most important services: smtp (postfix), www (squid) or even ssh. Such a connection is accepted, but then it hangs. What is strange is that a previously established ssh session is still usable. It is possible to work on such a system until you do something stupid like "less /var/log/all.log".

Using strace I found that the process blocks on:

--- strace: begin ---
execve("/usr/bin/tail", ["tail", "-f", "/var/log/all.log"], [/* 33 vars */]) = 0
brk(0) = 0x8052000
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x6ff0
access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory)
open("/etc/ld.so.cache", O_RDONLY) = 3
fstat64(3, {st_mode=S_IFREG|0644, st_size=20944, ...}) = 0
mmap2(NULL, 20944, PROT_READ, MAP_PRIVATE, 3, 0) = 0x6fefa000
close(3) = 0
open("/lib/libc.so.6", O_RDONLY) = 3
read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0RY\1\0004\0\0\0"..., 512) = 512
fstat64(3, {st_mode=S_IFREG|0755, st_size=1175920, ...}) = 0
mmap2(NULL, 1185212, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x6fdd8000
mmap2(0x6fef4000, 12288, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x11b) = 0x6fef4000
mmap2(0x6fef7000, 9660, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x6fef7000
close(3) = 0
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x6fdd7000
set_thread_area({entry_number:-1 -> 6, base_addr:0x6fdd76b0, limit:1048575, seg_32bit:1, contents:0, read_exec_only:0, limit_in_pages:1, seg_not_present:0, useable:1}) = 0
mprotect(0x6fef4000, 4096, PROT_READ) = 0
mprotect(0x6ff1c000, 4096, PROT_READ) = 0
munmap(0x6fefa000, 20944) = 0
brk(0) = 0x8052000
brk(0x8073000) = 0x8073000
open("/var/log/all.log", O_RDONLY|O_LARGEFILE) = 3
fstat64(3, {st_mode=S_IFREG|0640, st_size=3171841, ...})
llseek(3, 0, <unfinished ...>
--- strace: end ---

This file is not very big:

# ls -l /var/log/all.log
-rw-r----- 1 root root 3171841 Sep 27 04:36 /var/log/all.log

Also, running "dmesg > file" hangs, creating a file of only 4096 bytes.

--- Show Blocked State: begin ---
SysRq : Show Blocked State

free sibling
task PC stack pid father child younger older
syslogd D F5C83C60 0 2162 1 (NOTLB)
f5c83c74 0082 0002 f5c83c60 f5c83c5c 78538d20 0009 0001 f7f6a070 f7cb8030 82c47e5f 0001cfed 0a43 f7f6a17c 7a016980 f705dc80 78404217 7812c708 0213 f5c83c84 1e7a64bb
Call Trace:
[78404217] _spin_unlock_irqrestore+0xf/0x23
[7812c708] __mod_timer+0x92/0x9c
[78402b34] schedule_timeout+0x70/0x8d
[7812c521] process_timeout+0x0/0x5
[78402548] io_schedule_timeout+0x1e/0x28
[7814d41e] congestion_wait+0x50/0x64
[78134abc] autoremove_wake_function+0x0/0x35
[781493e7] balance_dirty_pages_ratelimited_nr+0x16e/0x1dc
[78145bd0] generic_file_buffered_write+0x4ee/0x605
[783c55a1] unix_dgram_recvmsg+0x1b4/0x1c8
[78128c8e] current_fs_time+0x41/0x46
[78146167] __generic_file_aio_write_nolock+0x480/0x4df
[7814621b] generic_file_aio_write+0x55/0xb3
[78194b28] ext3_file_write+0x24/0x8f
[7815f34f] do_sync_readv_writev+0xc1/0xfe
[78134abc] autoremove_wake_function+0x0/0x35
[784041ae] _spin_unlock+0xd/0x21
[781a8c38] log_wait_commit+0xc3/0xe3
[7814448b] find_get_pages_tag+0x76/0x80
[7815f204] rw_copy_check_uvector+0x50/0xaa
[7815f9d4] do_readv_writev+0x99/0x164
[78194b04] ext3_file_write+0x0/0x8f
[7815fadc] vfs_writev+0x3d/0x48
[7815feb5] sys_writev+0x41/0x67
[78103d6a] sysenter_past_esp+0x5f/0x85
===
freshclam D 0282 0 2866 1 (NOTLB)
f36e3cc4 0082 0009 0282 7a0173c0 0002 007b 0009 0001 f7cb8030 f7c72030 82c4884d 0001cfed 09ee f7cb813c 7a016980 f66c0b80 78404217 7812c708 0213 f36e3cd4 1e7a64bb
Call Trace:
[78404217] _spin_unlock_irqrestore+0xf/0x23
[7812c708] __mod_timer+0x92/0x9c
[78402b34] schedule_timeout+0x70/0x8d
[7812c521] process_timeout+0x0/0x5
[78402548] io_schedule_timeout+0x1e/0x28
[7814d41e] congestion_wait+0x50/0x64
[78134abc] autoremove_wake_function+0x0/0x35
[781493e7] balance_dirty_pages_ratelimited_nr+0x16e/0x1dc
[78145bd0] generic_file_buffered_write+0x4ee/0x605
[7819cdb4] __ext3_journal_stop+0x19/0x34
[7840408f] _spin_lock+0xd/0x5a
[78176f3d] __mark_inode_dirty+0xdd/0x16f
[78128c8e] current_fs_time+0x41/0x46
[78146167] __generic_file_aio_write_nolock+0x480/0x4df
[7814621b] generic_file_aio_write+0x55/0xb3
[78103159] setup_sigcontext+0x105/0x189
[78194b28] ext3_file_write+0x24/0x8f
[7815f453]
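The strace shows tail getting through open() and fstat64() on /var/log/all.log, then never returning from the seek. One way to detect that state without tying up a terminal is to repeat the open-and-seek from a worker thread and stop waiting after a timeout.

The sketch below is a hypothetical probe; `seek_end_with_timeout` is not from the thread. The exact seek tail issued is cut off in the strace, so the probe simply seeks to end of file. It uses a thread rather than a signal-based alarm. The assumption is that a blocked caller sleeps uninterruptibly (D state, as syslogd and freshclam are in the SysRq output), so a signal would not break the call.

A further assumption, not stated in the thread: in kernels of this era generic_file_llseek took the inode's i_mutex. A writer holds that mutex across __generic_file_aio_write_nolock, so a writer stuck in balance_dirty_pages could also block a seek on the same file.

```python
import os
import threading


def seek_end_with_timeout(path: str, timeout: float = 5.0):
    """Open `path` read-only and lseek() to its end from a worker thread.

    Returns the end-of-file offset, or None if open+seek has not finished
    within `timeout` seconds. Errors such as a missing file are re-raised.
    """
    result = {}

    def worker():
        try:
            fd = os.open(path, os.O_RDONLY)
            try:
                result["offset"] = os.lseek(fd, 0, os.SEEK_END)
            finally:
                os.close(fd)
        except OSError as exc:
            result["error"] = exc

    # daemon=True so a thread stuck in the kernel does not keep the
    # interpreter from exiting; on timeout the thread is left behind.
    t = threading.Thread(target=worker, daemon=True)
    t.start()
    t.join(timeout)
    if t.is_alive():
        return None  # still blocked after `timeout` seconds
    if "error" in result:
        raise result["error"]
    return result["offset"]
```

On a healthy system, `seek_end_with_timeout("/var/log/all.log", 10.0)` just returns the file size. A return value of None means the open or seek was still outstanding when the timeout expired.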
Re: Strange system hangs
On Fri, 28 Sep 2007, Peter Zijlstra wrote:

> On Fri, 2007-09-28 at 10:42 +0200, Krzysztof Oledzki wrote:
>> Hello,
>>
>> I am experiencing weird system hangs. About once every 2-5 weeks the system freezes and stops accepting remote connections, so it is no longer possible to connect to the most important services: smtp (postfix), www (squid) or even ssh. Such a connection is accepted, but then it hangs. What is strange is that a previously established ssh session is still usable. It is possible to work on such a system until you do something stupid like "less /var/log/all.log".
>
> So it takes weeks to reproduce this?

Unfortunately, yes. :(

>> free sibling
>> task PC stack pid father child younger older
>> syslogd D F5C83C60 0 2162 1 (NOTLB)
>> f5c83c74 0082 0002 f5c83c60 f5c83c5c 78538d20 0009 0001 f7f6a070 f7cb8030 82c47e5f 0001cfed 0a43 f7f6a17c 7a016980 f705dc80 78404217 7812c708 0213 f5c83c84 1e7a64bb
>> Call Trace:
>> [78404217] _spin_unlock_irqrestore+0xf/0x23
>> [7812c708] __mod_timer+0x92/0x9c
>> [78402b34] schedule_timeout+0x70/0x8d
>> [7812c521] process_timeout+0x0/0x5
>> [78402548] io_schedule_timeout+0x1e/0x28
>> [7814d41e] congestion_wait+0x50/0x64
>> [78134abc] autoremove_wake_function+0x0/0x35
>> [781493e7] balance_dirty_pages_ratelimited_nr+0x16e/0x1dc
>> [78145bd0] generic_file_buffered_write+0x4ee/0x605
>> [783c55a1] unix_dgram_recvmsg+0x1b4/0x1c8
>> [78128c8e] current_fs_time+0x41/0x46
>> [78146167] __generic_file_aio_write_nolock+0x480/0x4df
>> [7814621b] generic_file_aio_write+0x55/0xb3
>> [78194b28] ext3_file_write+0x24/0x8f
>> [7815f34f] do_sync_readv_writev+0xc1/0xfe
>> [78134abc] autoremove_wake_function+0x0/0x35
>> [784041ae] _spin_unlock+0xd/0x21
>> [781a8c38] log_wait_commit+0xc3/0xe3
>> [7814448b] find_get_pages_tag+0x76/0x80
>> [7815f204] rw_copy_check_uvector+0x50/0xaa
>> [7815f9d4] do_readv_writev+0x99/0x164
>> [78194b04] ext3_file_write+0x0/0x8f
>> [7815fadc] vfs_writev+0x3d/0x48
>> [7815feb5] sys_writev+0x41/0x67
>> [78103d6a] sysenter_past_esp+0x5f/0x85
>> ===
>
> This trace puzzles me: what is unix_dgram_recvmsg doing there?
>
> Also, it has two invocations of ext3_file_write. Do you have a stacked filesystem of sorts, ext3 on loopback on ext3?

No, no loopback:

# mount
/dev/md0 on / type ext3 (rw)
proc on /proc type proc (rw)
sysfs on /sys type sysfs (rw,nosuid,nodev,noexec)
devpts on /dev/pts type devpts (rw,nosuid,noexec)
/dev/mapper/VolGrp0-usr on /usr type ext3 (rw,nodev,data=journal)
/dev/mapper/VolGrp0-var on /var type ext3 (rw,nodev,data=journal)
/dev/mapper/VolGrp0-squid_spool on /var/cache/squid/cd0 type ext3 (rw,nosuid,nodev,noatime,data=writeback)
/dev/mapper/VolGrp0-squid_spool2 on /var/cache/squid/cd1 type ext3 (rw,nosuid,nodev,noatime,data=writeback)
/dev/mapper/VolGrp0-news_spool on /var/spool/news type ext3 (rw,nosuid,nodev,noatime)
shm on /dev/shm type tmpfs (rw,noexec,nosuid,nodev)
usbfs on /proc/bus/usb type usbfs (rw,noexec,nosuid,devmode=0664,devgid=85)
owl:/usr/gentoo-nfs on /usr/gentoo-nfs type nfs (ro,nosuid,nodev,noatime,bg,intr,tcp,addr=192.168.129.26)

Nothing more.

>> freshclam D 0282 0 2866 1 (NOTLB)
>> f36e3cc4 0082 0009 0282 7a0173c0 0002 007b 0009 0001 f7cb8030 f7c72030 82c4884d 0001cfed 09ee f7cb813c 7a016980 f66c0b80 78404217 7812c708 0213 f36e3cd4 1e7a64bb
>> Call Trace:
>> [78404217] _spin_unlock_irqrestore+0xf/0x23
>> [7812c708] __mod_timer+0x92/0x9c
>> [78402b34] schedule_timeout+0x70/0x8d
>> [7812c521] process_timeout+0x0/0x5
>> [78402548] io_schedule_timeout+0x1e/0x28
>> [7814d41e] congestion_wait+0x50/0x64
>> [78134abc] autoremove_wake_function+0x0/0x35
>> [781493e7] balance_dirty_pages_ratelimited_nr+0x16e/0x1dc
>> [78145bd0] generic_file_buffered_write+0x4ee/0x605
>> [7819cdb4] __ext3_journal_stop+0x19/0x34
>> [7840408f] _spin_lock+0xd/0x5a
>> [78176f3d] __mark_inode_dirty+0xdd/0x16f
>> [78128c8e] current_fs_time+0x41/0x46
>> [78146167] __generic_file_aio_write_nolock+0x480/0x4df
>> [7814621b] generic_file_aio_write+0x55/0xb3
>> [78103159] setup_sigcontext+0x105/0x189
>> [78194b28] ext3_file_write+0x24/0x8f
>> [7815f453] do_sync_write+0xc7/0x10a
>> [78134abc] autoremove_wake_function+0x0/0x35
>> [781085d2] convert_fxsr_from_user+0x15/0xd5
>> [7815f38c] do_sync_write+0x0/0x10a
>> [7815fbb6] vfs_write+0x8a/0x10c
>> [78160123] sys_write+0x41/0x67
>> [78103d6a] sysenter_past_esp+0x5f/0x85
>> ===
>
> Single write, no networking, also stuck in balance_dirty_pages().

Exactly. Strange, isn't it?

Thanks.

Best regards,

Krzysztof Olędzki
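The mount listing shows that / is plain ext3, while /usr and /var are ext3 with data=journal and the squid spools use data=writeback. /var/log/all.log, the log that tail blocked on, therefore lives on a data=journal filesystem.

A small Python sketch can answer "which mount and data= mode is this file on?" from such a listing. `parse_mounts` and `mount_for` are hypothetical helpers, not from the thread. They assume the `<device> on <mountpoint> type <fstype> (<options>)` line format printed above.

```python
import re

# One line of `mount` output:
#   <device> on <mountpoint> type <fstype> (<comma-separated options>)
MOUNT_RE = re.compile(r"^(\S+) on (\S+) type (\S+) \(([^)]*)\)$")


def parse_mounts(text: str) -> list[tuple[str, str, str, dict[str, str]]]:
    """Parse `mount` output into (device, mountpoint, fstype, options)."""
    mounts = []
    for line in text.splitlines():
        m = MOUNT_RE.match(line.strip())
        if not m:
            continue  # skip prompts and anything that is not a mount line
        dev, mnt, fstype, opts = m.groups()
        options = {}
        for opt in opts.split(","):
            key, _, value = opt.partition("=")
            options[key] = value  # bare flags such as "rw" map to ""
        mounts.append((dev, mnt, fstype, options))
    return mounts


def mount_for(path: str, mounts):
    """Entry whose mount point is the longest prefix of `path`.

    Works on path strings only: symlinks and bind mounts are not resolved.
    """
    best = None
    for entry in mounts:
        mnt = entry[1]
        if path == mnt or path.startswith(mnt.rstrip("/") + "/"):
            if best is None or len(mnt) > len(best[1]):
                best = entry
    return best
```

The sketch picks the longest matching mount point, because the most specific mount is the one that covers a path. For example, /var/cache/squid/cd0 wins over /var for files in the squid spool. Running `mount_for("/var/log/all.log", parse_mounts(listing))` on the listing above yields the VolGrp0-var entry, whose options contain `data=journal`.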