On Mon, Feb 11, 2008 at 08:56:51PM +0530, Ankita Garg wrote:
| Hi,
| 
| On Thu, Feb 07, 2008 at 04:14:09PM +0530, Ankita Garg wrote:
| > Hi,
| > 
| > I observed a softlockup when running large number of iterations of the
| > gtod_latency testcase. This testcase measure the time between several
| > pairs of calls to gettimeofday(). 
| > 
| > Kernel: 2.6.24-rt1
| > This was on AMD Opteron dual dual-core box.
| > 
| > The test output is :
| > 
| > ----------------------
| > Gettimeofday() Latency
| > ----------------------
| > Iterations: 100000000
| >
| 
| I can reproduce the issue even with smaller number of iterations of the
| gtod_latency test case.

I have observed the same symptoms in my system but realized that:

- System is a Core2Duo (2 CPUs, 1 proc) 2GB RAM 
- The test runs at rtprio 99 and SCHED_FIFO
- It writes a huge amount of data when # of iterations = 100000000
- It uses 1.5GB+ of RAM for the above # of iterations
- The disk IRQ runs at rtrpio 95 (in my system) and could easily starve
- All my backtraces involve ext3 handling functions

In short, with no memory and lots of data to write, disk IRQ starve. With
the attached patch, that changes schedule to SCHED_OTHER, I am able to run
the test with no errors.

Btw, after running the test, I have these files written to disk:

-rw-rw-r-- 1 lclaudio lclaudio  139 2008-02-11 17:31 hist.dat
-rw-rw-r-- 1 lclaudio lclaudio  174 2008-02-11 17:31 hist.plt
-rw-rw-r-- 1 lclaudio lclaudio 1.1G 2008-02-11 17:31 scatter.dat
-rw-rw-r-- 1 lclaudio lclaudio  191 2008-02-11 17:31 scatter.plt

Regards,
Luis
 
| > Min: 1 us
| > Max: 670971 us
| > Avg: 0.4021 us
| > StdDev: 164.3269 us
| > 
| > Soft lockup
| > 
| > BUG: soft lockup - CPU#2 stuck for 11s! [gtod_latency:18573]
| > CPU 2:
| > Modules linked in: dock rtc_cmos rtc_core rtc_lib bnx2 shpchp serio_raw
| > lp
| > parport_pc parport ac battery button i2c_core sbs sbshc dm_multipath
| > dm_mirror
| > dm_mod sunrpc rfcomm hidp l2cap bluetooth autofs4 ipv6 sg pcspkr k8temp
| > hwmon
| > mptsas mptscsih mptbase scsi_transport_sas sd_mod scsi_mod ext3 jbd
| > ehci_hcd
| > ohci_hcd uhci_hcd
| > Pid: 18573, comm: gtod_latency Not tainted 2.6.24-rt1-autotest #1
| > RIP: 0033:[<ffffffffff600188>]  [<ffffffffff600188>] vread_hpet+0x8/0xf
| > RSP: 002b:00007fffce199860  EFLAGS: 00000282
| > RAX: 00000000eaef6171 RBX: 00007fffce199890 RCX: 0000000000000002
| > RDX: 00000000041291f4 RSI: 0000000000000000 RDI: 00007fffce199910
| > RBP: 0000000000000000 R08: 000000001175d794 R09: 0000000000000016
| > R10: 0000000000001181 R11: 0000003b0b487160 R12: 0000000000000000
| > R13: 0000000000000000 R14: 00007fffce199900 R15: 00007fffce199910
| > FS:  00002b7ddc9266b0(0000) GS:ffff81022fc057c0(0000)
| > knlGS:00000000f7f9e6c0
| > CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
| > CR2: 0000003b0b487160 CR3: 000000022c4b7000 CR4: 00000000000006e0
| > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
| > DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
| > 
| > 
| > BUG: soft lockup - CPU#2 stuck for 11s! [gtod_latency:18573]
| > CPU 2:
| > Modules linked in: dock rtc_cmos rtc_core rtc_lib bnx2 shpchp serio_raw
| > lp
| > parport_pc parport ac battery button i2c_core sbs sbshc dm_multipath
| > dm_mirror
| > dm_mod sunrpc rfcomm hidp l2cap bluetooth autofs4 ipv6 sg pcspkr k8temp
| > hwmon
| > mptsas mptscsih mptbase scsi_transport_sas sd_mod scsi_mod ext3 jbd
| > ehci_hcd
| > ohci_hcd uhci_hcd
| > Pid: 18573, comm: gtod_latency Not tainted 2.6.24-rt1-autotest #1
| > RIP: 0010:[<ffffffff8803014f>]  [<ffffffff8803014f>]
| > :ext3:ext3_writepage_trans_blocks+0x4b/0x9b
| > RSP: 0018:ffff810103a8bae8  EFLAGS: 00000246
| > RAX: 0000000000010820 RBX: ffff810103a8baf8 RCX: 0000000000000001
| > RDX: 0000000000000003 RSI: ffff810105240358 RDI: ffff8101052401c0
| > RBP: ffffffff80636140 R08: 0000000000000000 R09: ffff810103a8bc60
| > R10: ffff810103a8bed8 R11: 0000000000000001 R12: ffffffff8067ffe0
| > R13: ffff81022df05000 R14: ffff810182dc62b8 R15: ffff81022df05000
| > FS:  00002b7ddc9266b0(0000) GS:ffff81022fc057c0(0000)
| > knlGS:00000000f7f9e6c0
| > CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
| > CR2: 00002b7e3bf0a000 CR3: 000000022c4b7000 CR4: 00000000000006e0
| > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
| > DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
| > 
| > Call Trace:
| >  [<ffffffff88030115>] :ext3:ext3_writepage_trans_blocks+0x11/0x9b
| >  [<ffffffff88032178>] :ext3:ext3_write_begin+0x32/0x19e
| >  [<ffffffff8027370e>] generic_file_buffered_write+0x151/0x611
| >  [<ffffffff8023d245>] current_fs_time+0x22/0x29
| >  [<ffffffff802ab1c2>] file_update_time+0x30/0xad
| >  [<ffffffff80273f27>] __generic_file_aio_write_nolock+0x359/0x3c3
| >  [<ffffffff802922ae>] kmem_cache_free+0x63/0x70
| >  [<ffffffff80273ff5>] generic_file_aio_write+0x64/0xc0
| >  [<ffffffff8802e2ae>] :ext3:ext3_file_write+0x1e/0x9e
| >  [<ffffffff80296b1f>] do_sync_write+0xe2/0x126
| >  [<ffffffff80288736>] page_add_new_anon_rmap+0x20/0x22
| >  [<ffffffff8024c860>] autoremove_wake_function+0x0/0x38
| >  [<ffffffff80267bf1>] audit_syscall_entry+0x148/0x17e
| >  [<ffffffff8029731b>] vfs_write+0xc7/0x150
| >  [<ffffffff802978c6>] sys_write+0x4a/0x76
| >  [<ffffffff8020c30e>] tracesys+0xdc/0xe1
| > 
| > The last working kernel I had tried this on was 2.6.21.4-rt10. I notice 
that 
| > the vsyscall support for hpet went in sometime around 2.6.22. Any
| > thoughts on what could be going wrong here?
| > 
| 
| -- 
| Regards,
| Ankita Garg ([EMAIL PROTECTED])
| Linux Technology Center
| IBM India Systems & Technology Labs, 
| Bangalore, India   
| -
| To unsubscribe from this list: send the line "unsubscribe linux-rt-users" in
| the body of a message to [EMAIL PROTECTED]
| More majordomo info at  http://vger.kernel.org/majordomo-info.html
---end quoted text---

-- 
[ Luis Claudio R. Goncalves                    Bass - Gospel - RT ]
[ Fingerprint: 4FDD B8C4 3C59 34BD 8BE9  2696 7203 D980 A448 C8F8 ]

diff --git a/gtod_latency.c b/gtod_latency.c
index a58bc88..aa4ce67 100644
--- a/gtod_latency.c
+++ b/gtod_latency.c
@@ -219,8 +219,8 @@ int main(int argc, char *argv[])
        }
                
        /* switch to SCHED_FIFO 99 */
-       param.sched_priority = sched_get_priority_max(SCHED_FIFO);
-       err = sched_setscheduler(0, SCHED_FIFO, &param);
+       param.sched_priority = sched_get_priority_max(SCHED_OTHER);
+       err = sched_setscheduler(0, SCHED_OTHER, &param);
 
        /* Check that the user has the appropriate privileges */
        if (err) {

Reply via email to