Re: Soft lockup when running large iterations of gtod
Hi Luis,

On Mon, Feb 11, 2008 at 05:52:41PM -0200, Luis Claudio R. Goncalves wrote:
| On Mon, Feb 11, 2008 at 08:56:51PM +0530, Ankita Garg wrote:
| | Hi,
| |
| | On Thu, Feb 07, 2008 at 04:14:09PM +0530, Ankita Garg wrote:
| | | Hi,
| | |
| | | I observed a softlockup when running a large number of iterations
| | | of the gtod_latency testcase. This testcase measures the time
| | | between several pairs of calls to gettimeofday().
| | |
| | | Kernel: 2.6.24-rt1
| | | This was on an AMD Opteron dual dual-core box.
| | |
| | | The test output is:
| | |
| | | --
| | | Gettimeofday() Latency
| | | --
| | | Iterations: 1
| |
| | I can reproduce the issue even with a smaller number of iterations
| | of the gtod_latency test case.
|
| I have observed the same symptoms in my system but realized that:
|
| - System is a Core2Duo (2 CPUs, 1 proc), 2GB RAM
| - The test runs at rtprio 99 and SCHED_FIFO
| - It writes a huge amount of data when # of iterations = 1
| - It uses 1.5GB+ of RAM for the above # of iterations
| - The disk IRQ runs at rtprio 95 (in my system) and could easily starve
| - All my backtraces involve ext3 handling functions

Thanks for your response. I have tried this on boxes with 4 and 8
processors and 8GB RAM. I could reproduce the issue even with a smaller
number of iterations of the test (100). That said, the traces do indicate
ext3 code being executed, so the likelihood of IRQ starvation is high.
Will try reducing the priority. But, IIRC, I did not hit this on the
2.6.21.4-rt10 kernel!

| In short, with no free memory and lots of data to write, the disk IRQ
| starves. With the attached patch, which changes the scheduling policy
| to SCHED_OTHER, I am able to run the test with no errors.
|
| Btw, after running the test, I have these files written to disk:
|
| -rw-rw-r-- 1 lclaudio lclaudio  139 2008-02-11 17:31 hist.dat
| -rw-rw-r-- 1 lclaudio lclaudio  174 2008-02-11 17:31 hist.plt
| -rw-rw-r-- 1 lclaudio lclaudio 1.1G 2008-02-11 17:31 scatter.dat
| -rw-rw-r-- 1 lclaudio lclaudio  191 2008-02-11 17:31 scatter.plt
|
| Regards,
| Luis

[snip: quoted soft-lockup traces]
Re: Soft lockup when running large iterations of gtod
On Mon, Feb 11, 2008 at 08:56:51PM +0530, Ankita Garg wrote:
| Hi,
|
| On Thu, Feb 07, 2008 at 04:14:09PM +0530, Ankita Garg wrote:
| | Hi,
| |
| | I observed a softlockup when running a large number of iterations of
| | the gtod_latency testcase. This testcase measures the time between
| | several pairs of calls to gettimeofday().
| |
| | Kernel: 2.6.24-rt1
| | This was on an AMD Opteron dual dual-core box.
| |
| | The test output is:
| |
| | --
| | Gettimeofday() Latency
| | --
| | Iterations: 1
| |
| | Min: 1 us
| | Max: 670971 us
| | Avg: 0.4021 us
| | StdDev: 164.3269 us
|
| I can reproduce the issue even with a smaller number of iterations of
| the gtod_latency test case.

I have observed the same symptoms in my system but realized that:

- System is a Core2Duo (2 CPUs, 1 proc), 2GB RAM
- The test runs at rtprio 99 and SCHED_FIFO
- It writes a huge amount of data when # of iterations = 1
- It uses 1.5GB+ of RAM for the above # of iterations
- The disk IRQ runs at rtprio 95 (in my system) and could easily starve
- All my backtraces involve ext3 handling functions

In short, with no free memory and lots of data to write, the disk IRQ
starves. With the attached patch, which changes the scheduling policy to
SCHED_OTHER, I am able to run the test with no errors.

Btw, after running the test, I have these files written to disk:

-rw-rw-r-- 1 lclaudio lclaudio  139 2008-02-11 17:31 hist.dat
-rw-rw-r-- 1 lclaudio lclaudio  174 2008-02-11 17:31 hist.plt
-rw-rw-r-- 1 lclaudio lclaudio 1.1G 2008-02-11 17:31 scatter.dat
-rw-rw-r-- 1 lclaudio lclaudio  191 2008-02-11 17:31 scatter.plt

Regards,
Luis

[snip: quoted soft-lockup traces]
Soft lockup when running large iterations of gtod
Hi,

I observed a softlockup when running a large number of iterations of the
gtod_latency testcase. This testcase measures the time between several
pairs of calls to gettimeofday().

Kernel: 2.6.24-rt1
This was on an AMD Opteron dual dual-core box.

The test output is:

--
Gettimeofday() Latency
--
Iterations: 1

Min: 1 us
Max: 670971 us
Avg: 0.4021 us
StdDev: 164.3269 us

Soft lockup

BUG: soft lockup - CPU#2 stuck for 11s! [gtod_latency:18573]
CPU 2:
Modules linked in: dock rtc_cmos rtc_core rtc_lib bnx2 shpchp serio_raw
 lp parport_pc parport ac battery button i2c_core sbs sbshc dm_multipath
 dm_mirror dm_mod sunrpc rfcomm hidp l2cap bluetooth autofs4 ipv6 sg
 pcspkr k8temp hwmon mptsas mptscsih mptbase scsi_transport_sas sd_mod
 scsi_mod ext3 jbd ehci_hcd ohci_hcd uhci_hcd
Pid: 18573, comm: gtod_latency Not tainted 2.6.24-rt1-autotest #1
RIP: 0033:[ff600188]  [ff600188] vread_hpet+0x8/0xf
RSP: 002b:7fffce199860  EFLAGS: 0282
RAX: eaef6171 RBX: 7fffce199890 RCX: 0002
RDX: 041291f4 RSI: RDI: 7fffce199910
RBP: R08: 1175d794 R09: 0016
R10: 1181 R11: 003b0b487160 R12:
R13: R14: 7fffce199900 R15: 7fffce199910
FS: 2b7ddc9266b0() GS:81022fc057c0() knlGS:f7f9e6c0
CS: 0010 DS: ES: CR0: 8005003b
CR2: 003b0b487160 CR3: 00022c4b7000 CR4: 06e0
DR0: DR1: DR2:
DR3: DR6: 0ff0 DR7: 0400

BUG: soft lockup - CPU#2 stuck for 11s! [gtod_latency:18573]
CPU 2:
Modules linked in: dock rtc_cmos rtc_core rtc_lib bnx2 shpchp serio_raw
 lp parport_pc parport ac battery button i2c_core sbs sbshc dm_multipath
 dm_mirror dm_mod sunrpc rfcomm hidp l2cap bluetooth autofs4 ipv6 sg
 pcspkr k8temp hwmon mptsas mptscsih mptbase scsi_transport_sas sd_mod
 scsi_mod ext3 jbd ehci_hcd ohci_hcd uhci_hcd
Pid: 18573, comm: gtod_latency Not tainted 2.6.24-rt1-autotest #1
RIP: 0010:[8803014f]  [8803014f] :ext3:ext3_writepage_trans_blocks+0x4b/0x9b
RSP: 0018:810103a8bae8  EFLAGS: 0246
RAX: 00010820 RBX: 810103a8baf8 RCX: 0001
RDX: 0003 RSI: 810105240358 RDI: 8101052401c0
RBP: 80636140 R08: R09: 810103a8bc60
R10: 810103a8bed8 R11: 0001 R12: 8067ffe0
R13: 81022df05000 R14: 810182dc62b8 R15: 81022df05000
FS: 2b7ddc9266b0() GS:81022fc057c0() knlGS:f7f9e6c0
CS: 0010 DS: ES: CR0: 80050033
CR2: 2b7e3bf0a000 CR3: 00022c4b7000 CR4: 06e0
DR0: DR1: DR2:
DR3: DR6: 0ff0 DR7: 0400

Call Trace:
 [88030115] :ext3:ext3_writepage_trans_blocks+0x11/0x9b
 [88032178] :ext3:ext3_write_begin+0x32/0x19e
 [8027370e] generic_file_buffered_write+0x151/0x611
 [8023d245] current_fs_time+0x22/0x29
 [802ab1c2] file_update_time+0x30/0xad
 [80273f27] __generic_file_aio_write_nolock+0x359/0x3c3
 [802922ae] kmem_cache_free+0x63/0x70
 [80273ff5] generic_file_aio_write+0x64/0xc0
 [8802e2ae] :ext3:ext3_file_write+0x1e/0x9e
 [80296b1f] do_sync_write+0xe2/0x126
 [80288736] page_add_new_anon_rmap+0x20/0x22
 [8024c860] autoremove_wake_function+0x0/0x38
 [80267bf1] audit_syscall_entry+0x148/0x17e
 [8029731b] vfs_write+0xc7/0x150
 [802978c6] sys_write+0x4a/0x76
 [8020c30e] tracesys+0xdc/0xe1

The last working kernel I had tried this on was 2.6.21.4-rt10. I notice
that the vsyscall support for hpet went in sometime around 2.6.22. Any
thoughts on what could be going wrong here?
--
Regards,
Ankita Garg ([EMAIL PROTECTED])
Linux Technology Center
IBM India Systems Technology Labs, Bangalore, India

-
To unsubscribe from this list: send the line "unsubscribe linux-rt-users"
in the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html