Re: Soft lockup when running large iterations of gtod

2008-02-12 Thread Ankita Garg
Hi Luis,

On Mon, Feb 11, 2008 at 05:52:41PM -0200, Luis Claudio R. Goncalves wrote:
 On Mon, Feb 11, 2008 at 08:56:51PM +0530, Ankita Garg wrote:
 | Hi,
 | 
 | On Thu, Feb 07, 2008 at 04:14:09PM +0530, Ankita Garg wrote:
 |  Hi,
 |  
 |  I observed a softlockup when running large number of iterations of the
 |  gtod_latency testcase. This testcase measure the time between several
 |  pairs of calls to gettimeofday(). 
 |  
 |  Kernel: 2.6.24-rt1
 |  This was on AMD Opteron dual dual-core box.
 |  
 |  The test output is :
 |  
 |  --
 |  Gettimeofday() Latency
 |  --
 |  Iterations: 1
 | 
 | 
 | I can reproduce the issue even with smaller number of iterations of the
 | gtod_latency test case.
 
 I have observed the same symptoms in my system but realized that:
 
 - System is a Core2Duo (2 CPUs, 1 proc) 2GB RAM 
 - The test runs at rtprio 99 and SCHED_FIFO
 - It writes a huge amount of data when # of iterations = 1
 - It uses 1.5GB+ of RAM for the above # of iterations
 - The disk IRQ runs at rtrpio 95 (in my system) and could easily starve
 - All my backtraces involve ext3 handling functions


Thanks for your response. I have tried this on boxes with 4  8 processors with 
8GB RAM. I could reproduce the issue even with smaller number of iterations of
 the test (100). That said, the traces do indicate ext3 code being executed,
so the likelyhood of IRQ starvation is high. Will try by reducing the priority.

But, iirc, I did not hit this on 2.6.21.4-rt10 kernel!
 
 In short, with no memory and lots of data to write, disk IRQ starve. With
 the attached patch, that changes schedule to SCHED_OTHER, I am able to run
 the test with no errors.
 
 Btw, after running the test, I have these files written to disk:
 
 -rw-rw-r-- 1 lclaudio lclaudio  139 2008-02-11 17:31 hist.dat
 -rw-rw-r-- 1 lclaudio lclaudio  174 2008-02-11 17:31 hist.plt
 -rw-rw-r-- 1 lclaudio lclaudio 1.1G 2008-02-11 17:31 scatter.dat
 -rw-rw-r-- 1 lclaudio lclaudio  191 2008-02-11 17:31 scatter.plt
 
 Regards,
 Luis
 
 |  Min: 1 us
 |  Max: 670971 us
 |  Avg: 0.4021 us
 |  StdDev: 164.3269 us
 |  
 |  Soft lockup
 |  
 |  BUG: soft lockup - CPU#2 stuck for 11s! [gtod_latency:18573]
 |  CPU 2:
 |  Modules linked in: dock rtc_cmos rtc_core rtc_lib bnx2 shpchp serio_raw
 |  lp
 |  parport_pc parport ac battery button i2c_core sbs sbshc dm_multipath
 |  dm_mirror
 |  dm_mod sunrpc rfcomm hidp l2cap bluetooth autofs4 ipv6 sg pcspkr k8temp
 |  hwmon
 |  mptsas mptscsih mptbase scsi_transport_sas sd_mod scsi_mod ext3 jbd
 |  ehci_hcd
 |  ohci_hcd uhci_hcd
 |  Pid: 18573, comm: gtod_latency Not tainted 2.6.24-rt1-autotest #1
 |  RIP: 0033:[ff600188]  [ff600188] vread_hpet+0x8/0xf
 |  RSP: 002b:7fffce199860  EFLAGS: 0282
 |  RAX: eaef6171 RBX: 7fffce199890 RCX: 0002
 |  RDX: 041291f4 RSI:  RDI: 7fffce199910
 |  RBP:  R08: 1175d794 R09: 0016
 |  R10: 1181 R11: 003b0b487160 R12: 
 |  R13:  R14: 7fffce199900 R15: 7fffce199910
 |  FS:  2b7ddc9266b0() GS:81022fc057c0()
 |  knlGS:f7f9e6c0
 |  CS:  0010 DS:  ES:  CR0: 8005003b
 |  CR2: 003b0b487160 CR3: 00022c4b7000 CR4: 06e0
 |  DR0:  DR1:  DR2: 
 |  DR3:  DR6: 0ff0 DR7: 0400
 |  
 |  
 |  BUG: soft lockup - CPU#2 stuck for 11s! [gtod_latency:18573]
 |  CPU 2:
 |  Modules linked in: dock rtc_cmos rtc_core rtc_lib bnx2 shpchp serio_raw
 |  lp
 |  parport_pc parport ac battery button i2c_core sbs sbshc dm_multipath
 |  dm_mirror
 |  dm_mod sunrpc rfcomm hidp l2cap bluetooth autofs4 ipv6 sg pcspkr k8temp
 |  hwmon
 |  mptsas mptscsih mptbase scsi_transport_sas sd_mod scsi_mod ext3 jbd
 |  ehci_hcd
 |  ohci_hcd uhci_hcd
 |  Pid: 18573, comm: gtod_latency Not tainted 2.6.24-rt1-autotest #1
 |  RIP: 0010:[8803014f]  [8803014f]
 |  :ext3:ext3_writepage_trans_blocks+0x4b/0x9b
 |  RSP: 0018:810103a8bae8  EFLAGS: 0246
 |  RAX: 00010820 RBX: 810103a8baf8 RCX: 0001
 |  RDX: 0003 RSI: 810105240358 RDI: 8101052401c0
 |  RBP: 80636140 R08:  R09: 810103a8bc60
 |  R10: 810103a8bed8 R11: 0001 R12: 8067ffe0
 |  R13: 81022df05000 R14: 810182dc62b8 R15: 81022df05000
 |  FS:  2b7ddc9266b0() GS:81022fc057c0()
 |  knlGS:f7f9e6c0
 |  CS:  0010 DS:  ES:  CR0: 80050033
 |  CR2: 2b7e3bf0a000 CR3: 00022c4b7000 CR4: 06e0
 |  DR0:  DR1:  DR2: 
 |  DR3:  DR6: 0ff0 DR7: 0400
 |  
 |  Call Trace:
 |   [88030115] :ext3:ext3_writepage_trans_blocks+0x11/0x9b
 |   [88032178] 

Re: Soft lockup when running large iterations of gtod

2008-02-11 Thread Luis Claudio R. Goncalves
On Mon, Feb 11, 2008 at 08:56:51PM +0530, Ankita Garg wrote:
| Hi,
| 
| On Thu, Feb 07, 2008 at 04:14:09PM +0530, Ankita Garg wrote:
|  Hi,
|  
|  I observed a softlockup when running large number of iterations of the
|  gtod_latency testcase. This testcase measure the time between several
|  pairs of calls to gettimeofday(). 
|  
|  Kernel: 2.6.24-rt1
|  This was on AMD Opteron dual dual-core box.
|  
|  The test output is :
|  
|  --
|  Gettimeofday() Latency
|  --
|  Iterations: 1
| 
| 
| I can reproduce the issue even with smaller number of iterations of the
| gtod_latency test case.

I have observed the same symptoms in my system but realized that:

- System is a Core2Duo (2 CPUs, 1 proc) 2GB RAM 
- The test runs at rtprio 99 and SCHED_FIFO
- It writes a huge amount of data when # of iterations = 1
- It uses 1.5GB+ of RAM for the above # of iterations
- The disk IRQ runs at rtrpio 95 (in my system) and could easily starve
- All my backtraces involve ext3 handling functions

In short, with no memory and lots of data to write, disk IRQ starve. With
the attached patch, that changes schedule to SCHED_OTHER, I am able to run
the test with no errors.

Btw, after running the test, I have these files written to disk:

-rw-rw-r-- 1 lclaudio lclaudio  139 2008-02-11 17:31 hist.dat
-rw-rw-r-- 1 lclaudio lclaudio  174 2008-02-11 17:31 hist.plt
-rw-rw-r-- 1 lclaudio lclaudio 1.1G 2008-02-11 17:31 scatter.dat
-rw-rw-r-- 1 lclaudio lclaudio  191 2008-02-11 17:31 scatter.plt

Regards,
Luis
 
|  Min: 1 us
|  Max: 670971 us
|  Avg: 0.4021 us
|  StdDev: 164.3269 us
|  
|  Soft lockup
|  
|  BUG: soft lockup - CPU#2 stuck for 11s! [gtod_latency:18573]
|  CPU 2:
|  Modules linked in: dock rtc_cmos rtc_core rtc_lib bnx2 shpchp serio_raw
|  lp
|  parport_pc parport ac battery button i2c_core sbs sbshc dm_multipath
|  dm_mirror
|  dm_mod sunrpc rfcomm hidp l2cap bluetooth autofs4 ipv6 sg pcspkr k8temp
|  hwmon
|  mptsas mptscsih mptbase scsi_transport_sas sd_mod scsi_mod ext3 jbd
|  ehci_hcd
|  ohci_hcd uhci_hcd
|  Pid: 18573, comm: gtod_latency Not tainted 2.6.24-rt1-autotest #1
|  RIP: 0033:[ff600188]  [ff600188] vread_hpet+0x8/0xf
|  RSP: 002b:7fffce199860  EFLAGS: 0282
|  RAX: eaef6171 RBX: 7fffce199890 RCX: 0002
|  RDX: 041291f4 RSI:  RDI: 7fffce199910
|  RBP:  R08: 1175d794 R09: 0016
|  R10: 1181 R11: 003b0b487160 R12: 
|  R13:  R14: 7fffce199900 R15: 7fffce199910
|  FS:  2b7ddc9266b0() GS:81022fc057c0()
|  knlGS:f7f9e6c0
|  CS:  0010 DS:  ES:  CR0: 8005003b
|  CR2: 003b0b487160 CR3: 00022c4b7000 CR4: 06e0
|  DR0:  DR1:  DR2: 
|  DR3:  DR6: 0ff0 DR7: 0400
|  
|  
|  BUG: soft lockup - CPU#2 stuck for 11s! [gtod_latency:18573]
|  CPU 2:
|  Modules linked in: dock rtc_cmos rtc_core rtc_lib bnx2 shpchp serio_raw
|  lp
|  parport_pc parport ac battery button i2c_core sbs sbshc dm_multipath
|  dm_mirror
|  dm_mod sunrpc rfcomm hidp l2cap bluetooth autofs4 ipv6 sg pcspkr k8temp
|  hwmon
|  mptsas mptscsih mptbase scsi_transport_sas sd_mod scsi_mod ext3 jbd
|  ehci_hcd
|  ohci_hcd uhci_hcd
|  Pid: 18573, comm: gtod_latency Not tainted 2.6.24-rt1-autotest #1
|  RIP: 0010:[8803014f]  [8803014f]
|  :ext3:ext3_writepage_trans_blocks+0x4b/0x9b
|  RSP: 0018:810103a8bae8  EFLAGS: 0246
|  RAX: 00010820 RBX: 810103a8baf8 RCX: 0001
|  RDX: 0003 RSI: 810105240358 RDI: 8101052401c0
|  RBP: 80636140 R08:  R09: 810103a8bc60
|  R10: 810103a8bed8 R11: 0001 R12: 8067ffe0
|  R13: 81022df05000 R14: 810182dc62b8 R15: 81022df05000
|  FS:  2b7ddc9266b0() GS:81022fc057c0()
|  knlGS:f7f9e6c0
|  CS:  0010 DS:  ES:  CR0: 80050033
|  CR2: 2b7e3bf0a000 CR3: 00022c4b7000 CR4: 06e0
|  DR0:  DR1:  DR2: 
|  DR3:  DR6: 0ff0 DR7: 0400
|  
|  Call Trace:
|   [88030115] :ext3:ext3_writepage_trans_blocks+0x11/0x9b
|   [88032178] :ext3:ext3_write_begin+0x32/0x19e
|   [8027370e] generic_file_buffered_write+0x151/0x611
|   [8023d245] current_fs_time+0x22/0x29
|   [802ab1c2] file_update_time+0x30/0xad
|   [80273f27] __generic_file_aio_write_nolock+0x359/0x3c3
|   [802922ae] kmem_cache_free+0x63/0x70
|   [80273ff5] generic_file_aio_write+0x64/0xc0
|   [8802e2ae] :ext3:ext3_file_write+0x1e/0x9e
|   [80296b1f] do_sync_write+0xe2/0x126
|   [80288736] page_add_new_anon_rmap+0x20/0x22
|   [8024c860] autoremove_wake_function+0x0/0x38
|   

Soft lockup when running large iterations of gtod

2008-02-07 Thread Ankita Garg
Hi,

I observed a softlockup when running large number of iterations of the
gtod_latency testcase. This testcase measure the time between several
pairs of calls to gettimeofday(). 

Kernel: 2.6.24-rt1
This was on AMD Opteron dual dual-core box.

The test output is :

--
Gettimeofday() Latency
--
Iterations: 1

Min: 1 us
Max: 670971 us
Avg: 0.4021 us
StdDev: 164.3269 us

Soft lockup

BUG: soft lockup - CPU#2 stuck for 11s! [gtod_latency:18573]
CPU 2:
Modules linked in: dock rtc_cmos rtc_core rtc_lib bnx2 shpchp serio_raw
lp
parport_pc parport ac battery button i2c_core sbs sbshc dm_multipath
dm_mirror
dm_mod sunrpc rfcomm hidp l2cap bluetooth autofs4 ipv6 sg pcspkr k8temp
hwmon
mptsas mptscsih mptbase scsi_transport_sas sd_mod scsi_mod ext3 jbd
ehci_hcd
ohci_hcd uhci_hcd
Pid: 18573, comm: gtod_latency Not tainted 2.6.24-rt1-autotest #1
RIP: 0033:[ff600188]  [ff600188] vread_hpet+0x8/0xf
RSP: 002b:7fffce199860  EFLAGS: 0282
RAX: eaef6171 RBX: 7fffce199890 RCX: 0002
RDX: 041291f4 RSI:  RDI: 7fffce199910
RBP:  R08: 1175d794 R09: 0016
R10: 1181 R11: 003b0b487160 R12: 
R13:  R14: 7fffce199900 R15: 7fffce199910
FS:  2b7ddc9266b0() GS:81022fc057c0()
knlGS:f7f9e6c0
CS:  0010 DS:  ES:  CR0: 8005003b
CR2: 003b0b487160 CR3: 00022c4b7000 CR4: 06e0
DR0:  DR1:  DR2: 
DR3:  DR6: 0ff0 DR7: 0400


BUG: soft lockup - CPU#2 stuck for 11s! [gtod_latency:18573]
CPU 2:
Modules linked in: dock rtc_cmos rtc_core rtc_lib bnx2 shpchp serio_raw
lp
parport_pc parport ac battery button i2c_core sbs sbshc dm_multipath
dm_mirror
dm_mod sunrpc rfcomm hidp l2cap bluetooth autofs4 ipv6 sg pcspkr k8temp
hwmon
mptsas mptscsih mptbase scsi_transport_sas sd_mod scsi_mod ext3 jbd
ehci_hcd
ohci_hcd uhci_hcd
Pid: 18573, comm: gtod_latency Not tainted 2.6.24-rt1-autotest #1
RIP: 0010:[8803014f]  [8803014f]
:ext3:ext3_writepage_trans_blocks+0x4b/0x9b
RSP: 0018:810103a8bae8  EFLAGS: 0246
RAX: 00010820 RBX: 810103a8baf8 RCX: 0001
RDX: 0003 RSI: 810105240358 RDI: 8101052401c0
RBP: 80636140 R08:  R09: 810103a8bc60
R10: 810103a8bed8 R11: 0001 R12: 8067ffe0
R13: 81022df05000 R14: 810182dc62b8 R15: 81022df05000
FS:  2b7ddc9266b0() GS:81022fc057c0()
knlGS:f7f9e6c0
CS:  0010 DS:  ES:  CR0: 80050033
CR2: 2b7e3bf0a000 CR3: 00022c4b7000 CR4: 06e0
DR0:  DR1:  DR2: 
DR3:  DR6: 0ff0 DR7: 0400

Call Trace:
 [88030115] :ext3:ext3_writepage_trans_blocks+0x11/0x9b
 [88032178] :ext3:ext3_write_begin+0x32/0x19e
 [8027370e] generic_file_buffered_write+0x151/0x611
 [8023d245] current_fs_time+0x22/0x29
 [802ab1c2] file_update_time+0x30/0xad
 [80273f27] __generic_file_aio_write_nolock+0x359/0x3c3
 [802922ae] kmem_cache_free+0x63/0x70
 [80273ff5] generic_file_aio_write+0x64/0xc0
 [8802e2ae] :ext3:ext3_file_write+0x1e/0x9e
 [80296b1f] do_sync_write+0xe2/0x126
 [80288736] page_add_new_anon_rmap+0x20/0x22
 [8024c860] autoremove_wake_function+0x0/0x38
 [80267bf1] audit_syscall_entry+0x148/0x17e
 [8029731b] vfs_write+0xc7/0x150
 [802978c6] sys_write+0x4a/0x76
 [8020c30e] tracesys+0xdc/0xe1

The last working kernel I had tried this on was 2.6.21.4-rt10. I notice that 
the vsyscall support for hpet went in sometime around 2.6.22. Any
thoughts on what could be going wrong here?

-- 
Regards,
Ankita Garg ([EMAIL PROTECTED])
Linux Technology Center
IBM India Systems  Technology Labs, 
Bangalore, India   
-
To unsubscribe from this list: send the line unsubscribe linux-rt-users in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html