Re: Kernel Panic in MPT SAS on 2.6.24 (and 2.6.23.14, 2.6.23.9) (fwd)

2008-02-10 Thread Krzysztof Oledzki

Hello,

Eric Moore is on vacation, adding some CCs and TOs.

-- Forwarded message --
Date: Sun, 10 Feb 2008 18:25:39 +0100 (CET)
From: Krzysztof Oledzki <[EMAIL PROTECTED]>
To: Maximilian Wilhelm <[EMAIL PROTECTED]>
Cc: linux-kernel@vger.kernel.org, [EMAIL PROTECTED]
Subject: Re: Kernel Panic in MPT SAS on 2.6.24 (and 2.6.23.14, 2.6.23.9)



On Sun, 10 Feb 2008, Maximilian Wilhelm wrote:


On Friday, 8 February, Maximilian Wilhelm typed the following:


Just noticed that Eric's address was wrong, so resend with corrected Cc.



Eric, my initial report was http://lkml.org/lkml/2008/2/6/300


On Thursday, 7 February, Krzysztof Oledzki typed the following:

Hi!


While installing my new firewall I got the following kernel panic in
the MPT SAS driver which I need for the disks.



The first kernel I booted was 2.6.23.14, which panicked, so I tried
2.6.24, which panics too. Our usual FAI kernel (2.6.23.9) is also
affected.



Could you please try 2.6.22-stable?


Yes it works :-/

I've put some things on the web which might be helpful:

dmesg     http://files.rfc2324.org/mptsas_panic/2.6.22-dmesg
lspci -v  http://files.rfc2324.org/mptsas_panic/2.6.22-lspci-v
.config   http://files.rfc2324.org/mptsas_panic/2.6.22-config

I'll search for the last working kernel and try to narrow it down to a
commit tomorrow, when I can get a serial console or direct access.
The Java-driven console redirection is anything but satisfying :-(


It looks *very* similar to my problem:



http://bugzilla.kernel.org/show_bug.cgi?id=9909


It seems to be the same controller:

01:00.0 SCSI storage controller: LSI Logic / Symbios Logic SAS1068E PCI-Express Fusion-MPT SAS (rev 08)
	Subsystem: Dell Unknown device 1f10
	Flags: bus master, fast devsel, latency 0, IRQ 16
	I/O ports at ec00 [size=256]
	Memory at fc8fc000 (64-bit, non-prefetchable) [size=16K]
	Memory at fc8e (64-bit, non-prefetchable) [size=64K]
	Expansion ROM at fc90 [disabled] [size=1M]
	Capabilities: [50] Power Management version 2
	Capabilities: [68] Express Endpoint IRQ 0
	Capabilities: [98] Message Signalled Interrupts: Mask- 64bit+ Queue=0/0 Enable-
	Capabilities: [b0] MSI-X: Enable- Mask- TabSize=1


I did a git bisect between v2.6.22 and v2.6.23, and it seems that
 6cb8f91320d3e720351c21741da795fed580b21b
introduced some badness.


Thanks! This was *really* useful!

Now, how about the attached patch? It should work with both 2.6.23 and 2.6.24.

Best regards,

Krzysztof Olędzki

[SCSI] mpt fusion: Don't oops if NumPhys==0

Don't oops if NumPhys==0, instead return -ENODEV.
This patch fixes http://bugzilla.kernel.org/show_bug.cgi?id=9909

Signed-off-by: Krzysztof Piotr Oledzki <[EMAIL PROTECTED]>

diff -Nur a/drivers/message/fusion/mptsas.c b/drivers/message/fusion/mptsas.c
--- a/drivers/message/fusion/mptsas.c   2007-10-09 22:31:38.0 +0200
+++ b/drivers/message/fusion/mptsas.c   2008-02-10 17:38:51.0 +0100
@@ -1772,6 +1772,11 @@
 	if (error)
 		goto out_free_consistent;
 
+	if (!buffer->NumPhys) {
+		error = -ENODEV;
+		goto out_free_consistent;
+	}
+
 	/* save config data */
 	port_info->num_phys = buffer->NumPhys;
 	port_info->phy_info = kcalloc(port_info->num_phys,
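
The guard added by the patch can be illustrated outside the kernel. What follows is a minimal user-space sketch, not the driver code: `struct iounit_page_sketch`, `struct port_info_sketch` and `probe_phys_sketch()` are hypothetical stand-ins, and `calloc()` stands in for `kcalloc()`. It assumes the failure mode is a zero-element allocation whose result is dereferenced later. In kernels of this era a zero-size kmalloc returns the special ZERO_SIZE_PTR value (address 0x10) rather than NULL, which may explain the faulting address in the oops reported in this thread, although the thread itself does not say so.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for the firmware config page read by the driver. */
struct iounit_page_sketch {
	unsigned char NumPhys;
};

/* Hypothetical stand-in for the driver's per-port bookkeeping. */
struct port_info_sketch {
	unsigned char num_phys;
	int *phy_info;	/* one entry per phy */
};

/*
 * Copy the phy count out of the config page and allocate per-phy state.
 * Returns 0 on success or a negative errno value, as kernel code does.
 */
int probe_phys_sketch(const struct iounit_page_sketch *buffer,
		      struct port_info_sketch *port_info)
{
	/* The check added by the patch: refuse a controller with no phys. */
	if (!buffer->NumPhys)
		return -ENODEV;

	port_info->num_phys = buffer->NumPhys;
	port_info->phy_info = calloc(port_info->num_phys,
				     sizeof(*port_info->phy_info));
	if (!port_info->phy_info)
		return -ENOMEM;

	return 0;
}
```

With a page reporting zero phys the function returns -ENODEV before any allocation happens, so callers never see a per-phy array of length zero.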


Re: [PATCH] ipvs: Make the synchronization interval controllable

2008-02-08 Thread Krzysztof Oledzki



On Fri, 8 Feb 2008, Andi Kleen wrote:


Sven Wegener <[EMAIL PROTECTED]> writes:


The default synchronization interval of 1000 milliseconds is too high for a
heavily loaded director. Collecting the connection information from one second
and then sending it out in a burst will overflow the socket buffer and lead to
synchronization information being dropped. Make the interval controllable by a
sysctl variable so that users can tune it.


It would be better if the defaults just worked under all circumstances.
So why not just lower the default?

Or the code could detect overflowing socket buffers and lower the
value dynamically.


We could also start sending as soon as the amount of queued data reaches a
defined level.

Best regards,

Krzysztof Olędzki
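
The size-triggered sending suggested above could be combined with the existing interval. The following is only a sketch of that decision logic, under stated assumptions: `struct sync_queue_sketch`, its fields and `sync_should_flush()` are hypothetical names, not part of the ipvs code or of the patch under discussion.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical sync queue: data waiting to be sent to the backup director. */
struct sync_queue_sketch {
	size_t pending_bytes;	/* data collected since the last send */
	size_t flush_threshold;	/* send as soon as this much is queued */
	unsigned elapsed_ms;	/* time since the last send */
	unsigned interval_ms;	/* upper bound on the delay between sends */
};

/* Decide whether the queued data should go out now. */
bool sync_should_flush(const struct sync_queue_sketch *q)
{
	if (q->pending_bytes == 0)
		return false;			/* nothing to send */
	if (q->pending_bytes >= q->flush_threshold)
		return true;			/* size trigger */
	return q->elapsed_ms >= q->interval_ms;	/* timer trigger */
}
```

A busy director would hit the size trigger long before the interval expires, which spreads the traffic out instead of sending one burst per second; an idle one would still flush on the timer.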

Re: Kernel Panic in MPT SAS on 2.6.24 (and 2.6.23.14, 2.6.23.9)

2008-02-07 Thread Krzysztof Oledzki

On 2008-02-06 22:04, Maximilian Wilhelm wrote:

Hi!

While installing my new firewall I got the following kernel panic in
the MPT SAS driver which I need for the disks.

The first kernel I booted was 2.6.23.14, which panicked, so I tried
2.6.24, which panics too. Our usual FAI kernel (2.6.23.9) is also
affected.





Fusion MPT base driver 3.04.06
Copyright (c) 1999-2007 LSI Corporation
Fusion MPT SAS Host driver 3.04.06
mptbase: ioc0: Initiating bringup
ioc0: LSISAS1068E B3: Capabilities={Initiator}
scsi0 : ioc0: LSISAS1068E B3, FwRev=00142e00h, Ports=1, MaxQ=511, IRQ=16
scsi 0:0:0:0: Direct-Access SEAGATE  ST973402SS   S207 PQ: 0 ANSI: 5
scsi 0:0:1:0: Direct-Access SEAGATE  ST973402SS   S207 PQ: 0 ANSI: 5
BUG: unable to handle kernel NULL pointer dereference at virtual address 0010
printing eip: c02c0b38 *pde =  
Oops:  [#1] SMP 
Modules linked in:

Pid: 1, comm: swapper Not tainted (2.6.24 #1)
EIP: 0060:[c02c0b38] EFLAGS: 00010246 CPU: 1
EIP is at mptsas_probe_expander_phys+0x51/0x4a2
EAX: 0010 EBX: f7457ec0 ECX: f7c3fd9c EDX: 0004
ESI: f7fe7800 EDI: f7fe7800 EBP: f7fe7904 ESP: f7c3fe18
 DS: 007b ES: 007b FS: 00d8 GS:  SS: 0068
Process swapper (pid: 1, ti=f7c3e000 task=f7c22ab0 task.ti=f7c3e000)
Stack:   00200200 fffefd74 c02b9cc8 f7fe7800 c04c5280 f7c3fecc 
   376b1000 0001    00100100 00200200  
   00200200 fffefd74 c02b9cc8 f7fe7800 c04c5280 f7c3fe8c 376b1000 0001 
Call Trace:
 [c02b9cc8] mpt_timer_expired+0x0/0x5c
 [c02b9cc8] mpt_timer_expired+0x0/0x5c
 [c028] ide_wait_cmd+0x90/0xa0
 [c02c2806] mptsas_probe+0x38a/0x40b
 [c0180522] sysfs_create_link+0xb7/0xf9
 [c021ceb6] pci_device_probe+0x36/0x57
 [c023bcd0] driver_probe_device+0xde/0x15c
 [c036d3e5] klist_next+0x4b/0x6b
 [c023bde0] __driver_attach+0x0/0x79
 [c023be26] __driver_attach+0x46/0x79
 [c023b2a8] bus_for_each_dev+0x33/0x55
 [c023bb37] driver_attach+0x16/0x18
 [c023bde0] __driver_attach+0x0/0x79
 [c023b58e] bus_add_driver+0x6d/0x197
 [c021cff2] __pci_register_driver+0x48/0x74
 [c0480bd3] mptsas_init+0xbf/0xd6
 [c046c74e] kernel_init+0x140/0x2a2
 [c01024ca] ret_from_fork+0x6/0x1c
 [c046c60e] kernel_init+0x0/0x2a2
 [c046c60e] kernel_init+0x0/0x2a2
 [c010319f] kernel_thread_helper+0x7/0x10
 ===
Code: 85 c0 0f 84 68 04 00 00 8b 54 24 1c 8b 02 89 04 24 31 c9 89 da 89 f8 e8 2b f2 ff ff 89 44 24 2c 85 c0 8b 43 0c 0f 85 39 04 00 00 <0f> b7 00 8b 74 24 1c 89 06 8d 87 24 05 00 00 89 44 24 20 e8 5b 
EIP: [c02c0b38] mptsas_probe_expander_phys+0x51/0x4a2 SS:ESP 0068:f7c3fe18

---[ end trace 50b3e7147499e641 ]---
Kernel panic - not syncing: Attempted to kill init!


Could you please try 2.6.22-stable? It looks *very* similar to my problem:

http://bugzilla.kernel.org/show_bug.cgi?id=9909

Best regards,

Krzysztof Olędzki
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
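
The "EIP is at" line in the oops above uses the kernel's symbol+offset/size notation: the faulting instruction lies 0x51 bytes into mptsas_probe_expander_phys, a function 0x4a2 bytes long. As a small illustration of how such a location can be split into its parts, here is a sketch in plain C; `struct oops_location_sketch` and `parse_oops_location()` are hypothetical helpers written for this example, not an existing tool.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical holder for one "symbol+0xoffset/0xsize" location. */
struct oops_location_sketch {
	char symbol[64];
	unsigned long offset;	/* bytes from the start of the function */
	unsigned long size;	/* total length of the function in bytes */
};

/* Returns 0 when all three fields were parsed, -1 otherwise. */
int parse_oops_location(const char *text, struct oops_location_sketch *loc)
{
	/* %63[^+] reads the symbol name up to the '+' separator;
	 * %lx accepts the leading "0x" of each hexadecimal number. */
	if (sscanf(text, "%63[^+]+%lx/%lx",
		   loc->symbol, &loc->offset, &loc->size) != 3)
		return -1;
	return 0;
}
```

Passing the string "mptsas_probe_expander_phys+0x51/0x4a2" fills in the symbol name, the offset 0x51 and the size 0x4a2; the offset can then be looked up in a disassembly of the function to find the faulting instruction.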


Re: cups slow on linux-2.6.24

2008-01-28 Thread Krzysztof Oledzki



On Tue, 29 Jan 2008, Jeff Chua wrote:




On Jan 28, 2008 7:18 AM, Jeff Chua <[EMAIL PROTECTED]> wrote:

I'm sending printing jobs to a network printer (it's actually printing
to the localhost simply creating a file), and running this on
Linux-2.6.24 will cause the printing to slow down to 1 print every 3
seconds after printing 500 times.


I bisected the kernel since the last known good 2.6.23 and zeroed in on this
commit.


commit 17311393f969090ab060540bd9dbe7dc885a76d5
Author: Jozsef Kadlecsik <[EMAIL PROTECTED]>
Date:   Thu Oct 11 14:35:52 2007 -0700

   [NETFILTER]: nf_conntrack_tcp: fix connection reopening


Reverting this commit solves the problem.

Version          1000 jobs
2.6.23              90 sec
2.6.24           1,492 sec  <== with commit
2.6.24 (patch)      90 sec  <== reverted the commit



Strange. You stated that 2.6.23.12 is OK; however, the above patch
was included in 2.6.23.4:

http://www.kernel.org/pub/linux/kernel/v2.6/ChangeLog-2.6.23.4
http://git.kernel.org/?p=linux/kernel/git/stable/linux-2.6.23.y.git;a=commitdiff;h=0ac38060c5e1e12e851ed3e281597286b57f9ad1

Commit 0ac38060c5e1e12e851ed3e281597286b57f9ad1 combines two fixes from 2.6.24:


http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=17311393f969090ab060540bd9dbe7dc885a76d5
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=bc34b841556aad437baf4199744e55500bfa2088

Are you 100% sure that 2.6.23.12 is OK?

Best regards,

Krzysztof Olędzki

Re: [Bug 9182] Critical memory leak (dirty pages)

2007-12-21 Thread Krzysztof Oledzki



On Thu, 20 Dec 2007, Björn Steinbrink wrote:


On 2007.12.20 08:25:56 -0800, Linus Torvalds wrote:



On Thu, 20 Dec 2007, Björn Steinbrink wrote:


OK, so I looked for PG_dirty anyway.

In 46d2277c796f9f4937bfa668c40b2e3f43e93dd0 you made try_to_free_buffers
bail out if the page is dirty.

Then in 3e67c0987d7567ad41164a153dca9a43b11d, Andrew fixed
truncate_complete_page, because it called cancel_dirty_page (and thus
cleared PG_dirty) after try_to_free_buffers was called via
do_invalidatepage.

Now, if I'm not mistaken, we can end up as follows.

truncate_complete_page()
  cancel_dirty_page() // PG_dirty cleared, decr. dirty pages
  do_invalidatepage()
    ext3_invalidatepage()
      journal_invalidatepage()
        journal_unmap_buffer()
          __dispose_buffer()
            __journal_unfile_buffer()
              __journal_temp_unlink_buffer()
                mark_buffer_dirty(); // PG_dirty set, incr. dirty pages


Good, this seems to be the exact path that actually triggers it. I got to
journal_unmap_buffer(), but was too lazy to actually then bother to follow
it all the way down - I decided that I didn't actually really even care
what the low-level FS layer did, I had already convinced myself that it
obviously must be dirtying the page some way, since that matched the
symptoms exactly (ie only the journaling case was impacted, and this was
all about the journal).

But perhaps more importantly: regardless of what the low-level filesystem
did at that point, the VM accounting shouldn't care, and should be robust
in the face of a low-level filesystem doing strange and wonderful things.
But thanks for bothering to go through the whole history and figure out
what exactly is up.


Oh well, after seeing the move of cancel_dirty_page, I just went
backwards from __set_page_dirty using cscope + some smart guessing and
quickly ended up at ext3_invalidatepage, so it wasn't that hard :-)


As try_to_free_buffers got its ext3 hack back in
ecdfc9787fe527491baefc22dce8b2dbd5b2908d, maybe
3e67c0987d7567ad41164a153dca9a43b11d should be reverted? (Except for
the accounting fix in cancel_dirty_page, of course).


Yes, I think we have room for cleanups now, and I agree: we ended up
reinstating some questionable code in the VM just because we didn't really
know or understand what was going on in the ext3 journal code.


Hm, you attributed more to my mail than there was actually in it. I
didn't even start to think of cleanups (because I don't know jack about
the whole ext3/jbd stuff, so I simply cannot come up with any cleanups
(yet?)). What I meant is that we only did a half-revert of that hackery.

When try_to_free_buffers started to check for PG_dirty, the
cancel_dirty_page call had to be moved before do_invalidatepage, to
"fix" a _huge_ leak.  But that caused the accounting breakage we're now
seeing, because we never account for the pages that got redirtied during
do_invalidatepage.

Then the change to try_to_free_buffers got reverted, so we no longer
need to call cancel_dirty_page before do_invalidatepage, but still we
do. Thus the accounting bug remains. So what I meant to suggest was
simply to actually "finish" the revert we started.

Or expressed as a patch:

diff --git a/mm/truncate.c b/mm/truncate.c
index cadc156..2974903 100644
--- a/mm/truncate.c
+++ b/mm/truncate.c
@@ -98,11 +98,11 @@ truncate_complete_page(struct address_space *mapping, struct page *page)
 	if (page->mapping != mapping)
 		return;
 
-	cancel_dirty_page(page, PAGE_CACHE_SIZE);
-
 	if (PagePrivate(page))
 		do_invalidatepage(page, 0);
 
+	cancel_dirty_page(page, PAGE_CACHE_SIZE);
+
 	remove_from_page_cache(page);
 	ClearPageUptodate(page);
 	ClearPageMappedToDisk(page);
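
The effect of the ordering on the dirty-page counter can be seen in a toy model. The following is a user-space sketch only: `struct page_model`, `nr_dirty` and the `model_*` functions are hypothetical and merely mimic the call chain described earlier in this message, assuming that invalidating a journalled page redirties it.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: one page flag plus a global dirty-page counter. */
struct page_model {
	bool dirty;
	bool has_buffers;	/* journalled fs: invalidate may redirty */
};

static long nr_dirty;	/* stands in for the VM's dirty-page statistic */

void model_set_dirty(struct page_model *p)
{
	if (!p->dirty) {
		p->dirty = true;
		nr_dirty++;
	}
}

void model_cancel_dirty(struct page_model *p)
{
	if (p->dirty) {
		p->dirty = false;
		nr_dirty--;
	}
}

/* Stand-in for do_invalidatepage() on ext3: redirties the page. */
void model_invalidate(struct page_model *p)
{
	if (p->has_buffers)
		model_set_dirty(p);
}

/* Returns the dirty count left behind after truncating one dirty page. */
long model_truncate(bool cancel_first)
{
	struct page_model p = { false, true };

	nr_dirty = 0;
	model_set_dirty(&p);		/* page starts dirty: counter is 1 */
	if (cancel_first) {
		model_cancel_dirty(&p);	/* counter drops to 0 */
		model_invalidate(&p);	/* redirtied: counter 1, never undone */
	} else {
		model_invalidate(&p);	/* already dirty: counter stays 1 */
		model_cancel_dirty(&p);	/* counter drops to 0 */
	}
	return nr_dirty;		/* the page now leaves the page cache */
}
```

In this model, cancelling first leaves one page counted as dirty after it has been removed, while cancelling after the invalidate leaves the counter at zero, which matches the ordering the patch restores.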

I'll be the last one to comment on whether or not that causes inaccurate
accounting, so I'll just watch you and Jan battle that out until someone
comes up with a post-.24 patch to provide a clean fix for the issue.

Krzysztof, could you give this patch a test run?

If that "fixes" the problem for now, I'll try to come up with some
usable commit message, or if someone wants to beat me to it, you can
already have my

Signed-off-by: Björn Steinbrink <[EMAIL PROTECTED]>


Checked with 2.6.24-rc5 + debug/fixup patch from Linus + above fix. After 
3h there have been no warnings about __remove_from_page_cache(). So, it 
seems that it is OK.


Tested-by: Krzysztof Piotr Oledzki <[EMAIL PROTECTED]>

Best regards,

Krzysztof Olędzki

Re: [PATCH] sky2: Use deferrable timer for watchdog

2007-12-20 Thread Krzysztof Oledzki



On Thu, 20 Dec 2007, Parag Warudkar wrote:


On Dec 20, 2007 2:22 PM, Kok, Auke <[EMAIL PROTECTED]> wrote:

ok, that's just bad and if there's no user-definable limit to the deferral I
definitely don't like this change.

Can I safely assume that any irq will cause all deferred timers to run?


I think even other causes for wakeup like process related ones will
cause the CPU to go busy and run the timers.
This, coupled with the fact that no one is yet able to reach 0 wakeups
per second makes it pretty unlikely that deferrable timers will be
deferred indefinitely.
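
As a rough illustration of the behaviour described above, the following Python sketch models a deferrable timer as one that fires at the first CPU wakeup at or after its expiry. The function name `fire_time` and the list of wakeup times are made up for the example; this is a simplification, not the kernel's timer code.

```python
import bisect


def fire_time(expiry, wakeups, deferrable):
    """Return when a timer would run, given sorted CPU wakeup times.

    A normal timer runs at its expiry. A deferrable timer (in this
    simplified model) waits for the next wakeup that happens for some
    other reason, and never runs if no such wakeup occurs.
    """
    if not deferrable:
        return expiry
    i = bisect.bisect_left(wakeups, expiry)
    return wakeups[i] if i < len(wakeups) else None


wakeups = [3, 12, 20]  # hypothetical times at which the CPU wakes up
print(fire_time(10, wakeups, deferrable=False))
print(fire_time(10, wakeups, deferrable=True))
```

With any regular source of wakeups the delay is bounded by the gap between wakeups, which is the argument being made here. The `None` case corresponds to the "0 wakeups" scenario that nobody has reached yet.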



If this is the case then for e1000 this patch is still OK since the watchdog
needs to run (1) after a link up/down interrupt or (2) to update statistics.
Those statistics won't increase if there is no traffic of course...



I think it is reasonable for Network driver watchdogs to use a
deferrable timer - if the machine is 100% IDLE there is no one needing
the network to be up.


Please note that being connected to a network does not only mean sending
but also receiving.


Best regards,

    Krzysztof Oledzki
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Bug 9182] Critical memory leak (dirty pages)

2007-12-17 Thread Krzysztof Oledzki



On Sun, 16 Dec 2007, Andrew Morton wrote:


On Sun, 16 Dec 2007 14:46:36 +0100 (CET) Krzysztof Oledzki <[EMAIL PROTECTED]> 
wrote:


Which filesystem, which mount options


  - ext3 on RAID1 (MD): / - rootflags=data=journal


It wouldn't surprise me if this is specific to data=journal: that
journalling mode is pretty complex wrt dirty-data handling and isn't well
tested.

Does switching that to data=writeback change things?


I'll confirm this tomorrow but it seems that even switching to
data=ordered (AFAIK default for ext3) is indeed enough to cure this problem.


yes, sorry, I meant ordered.


OK, I can confirm that the problem is with data=journal. With data=ordered 
I get:


# uname -rns;uptime;sync;sleep 1;sync ;sleep 1; sync;grep Dirty /proc/meminfo
Linux cougar 2.6.24-rc5
 17:50:34 up 1 day, 20 min,  1 user,  load average: 0.99, 0.48, 0.35
Dirty:   0 kB
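
For anyone who wants to script the check above, a small Python sketch that pulls the Dirty value out of /proc/meminfo-style text could look like this. The helper name `dirty_kb` and the sample text are made up for the example; only the `Dirty:` line format is taken from the output shown in this thread.

```python
import re


def dirty_kb(meminfo_text):
    """Return the Dirty value in kB from /proc/meminfo-style text."""
    match = re.search(r"^Dirty:\s+(\d+)\s+kB", meminfo_text, re.MULTILINE)
    if match is None:
        raise ValueError("no Dirty: line found")
    return int(match.group(1))


# Sample text in the same format as the output quoted in this thread.
sample = "MemFree:  123456 kB\nDirty:   0 kB\nWriteback:  0 kB\n"
print(dirty_kb(sample))
```

On a live system one would pass the contents of /proc/meminfo, read after a few sync calls, instead of the sample string.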


Two questions remain then: why system dies when dirty reaches ~200MB


I think you have ~2G of RAM and you're running with
/proc/sys/vm/dirty_ratio=10, yes?

If so, when that machine hits 10% * 2G of dirty memory then everyone who
wants to dirty pages gets blocked.
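
The arithmetic behind that estimate, as a short Python sketch. It assumes exactly 2 GiB of RAM and applies the ratio to total RAM; the kernel applies it to a somewhat smaller "dirtyable" amount, so treat the result as an approximation.

```python
def dirty_threshold_kb(total_ram_kb, dirty_ratio_percent):
    """Approximate dirty limit: a percentage of total RAM, in kB."""
    return total_ram_kb * dirty_ratio_percent // 100


ram_kb = 2 * 1024 * 1024           # assumed: ~2G of RAM
limit_kb = dirty_threshold_kb(ram_kb, 10)
print(limit_kb // 1024, "MB")      # roughly the ~200MB seen before the hang
```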


Oh, right. Thank you for the explanation.


and what is wrong with ext3+data=journal with >=2.6.20-rc2?


Ah.  It has a bug in it ;)

As I said, data=journal has exceptional handling of pagecache data and is
not well tested.  Someone (and I'm not sure who) will need to get in there
and fix it.


OK, I'm willing to test it. ;)

Best regards,

Krzysztof Olędzki

Re: [Bug 9182] Critical memory leak (dirty pages)

2007-12-16 Thread Krzysztof Oledzki



On Sun, 16 Dec 2007, Andrew Morton wrote:


On Sun, 16 Dec 2007 10:33:20 +0100 (CET) Krzysztof Oledzki <[EMAIL PROTECTED]> 
wrote:




On Sat, 15 Dec 2007, Andrew Morton wrote:


On Sun, 16 Dec 2007 00:08:52 +0100 (CET) Krzysztof Oledzki <[EMAIL PROTECTED]> 
wrote:




On Sat, 15 Dec 2007, [EMAIL PROTECTED] wrote:


http://bugzilla.kernel.org/show_bug.cgi?id=9182


--- Comment #33 from [EMAIL PROTECTED]  2007-12-15 14:19 ---
Krzysztof, I'd hate to point you to a hard path (at least time consuming), but
you've done a lot of digging by now anyway. How about git bisecting between
2.6.20-rc2 and rc1? Here is great info on bisecting:
http://www.kernel.org/doc/local/git-quick.html


As I'm smarter than git-bisect I can tell that 2.6.20-rc1-git8 is as bad
as 2.6.20-rc2 but 2.6.20-rc1-git8 with one patch reverted seems to be OK.
So it took me only 2 reboots. ;)

The guilty patch is the one I proposed just an hour ago:
  
http://git.kernel.org/?p=linux%2Fkernel%2Fgit%2Fstable%2Flinux-2.6.20.y.git;a=commitdiff_plain;h=fba2591bf4e418b6c3f9f8794c9dd8fe40ae7bd9

So:
  - 2.6.20-rc1: OK
  - 2.6.20-rc1-git8 with fba2591bf4e418b6c3f9f8794c9dd8fe40ae7bd9 reverted: OK
  - 2.6.20-rc1-git8: very BAD
  - 2.6.20-rc2: very BAD
  - 2.6.20-rc4: very BAD
  - >= 2.6.20: BAD (but not *very* BAD!)



well..  We have code which has been used by *everyone* for a year and it's
misbehaving for you alone.


No, not for me alone. Probably only I and Thomas Osterried have systems
where it is so easy to reproduce. Please note that the problem exists on
all my systems, but only on one is it critical. It is enough to run
"sync; sleep 1; sync; sleep 1; sync; grep Dirty /proc/meminfo" to be sure.
With >=2.6.20-rc1-git8 it *never* falls to 0 on *all* my hosts but only
on one it goes to ~200MB in about 2 weeks and then everything dies:
http://bugzilla.kernel.org/attachment.cgi?id=13824
http://bugzilla.kernel.org/attachment.cgi?id=13825
http://bugzilla.kernel.org/attachment.cgi?id=13826
http://bugzilla.kernel.org/attachment.cgi?id=13827


 I wonder what you're doing that is different/special.

Me too. :|


Which filesystem, which mount options


  - ext3 on RAID1 (MD): / - rootflags=data=journal


It wouldn't surprise me if this is specific to data=journal: that
journalling mode is pretty complex wrt dirty-data handling and isn't well
tested.

Does switching that to data=writeback change things?


I'll confirm this tomorrow but it seems that even switching to 
data=ordered (AFAIK default for ext3) is indeed enough to cure this problem.


Two questions remain then: why system dies when dirty reaches ~200MB 
and what is wrong with ext3+data=journal with >=2.6.20-rc2?


Best regards,

Krzysztof Olędzki

Re: [Bug 9182] Critical memory leak (dirty pages)

2007-12-16 Thread Krzysztof Oledzki



On Sun, 16 Dec 2007, [EMAIL PROTECTED] wrote:


http://bugzilla.kernel.org/show_bug.cgi?id=9182





--- Comment #39 from [EMAIL PROTECTED]  2007-12-16 01:58 ---


So:
  - 2.6.20-rc1: OK
  - 2.6.20-rc1-git8 with fba2591bf4e418b6c3f9f8794c9dd8fe40ae7bd9 reverted: OK
  - 2.6.20-rc1-git8: very BAD
  - 2.6.20-rc2: very BAD
  - 2.6.20-rc4: very BAD
  - >= 2.6.20: BAD (but not *very* BAD!)


based on the great info you already acquired, you should be able to
bisect this rather effectively, via:

2.6.20-rc1-git8 == 921320210bd2ec4f17053d283355b73048ac0e56

$ git-bisect start
$ git-bisect bad 921320210bd2ec4f17053d283355b73048ac0e56
$ git-bisect good v2.6.20-rc1
Bisecting: 133 revisions left to test after this

so about 7-8 bootups would pinpoint the breakage.


Except that I have very limited time where I can do my tests on this host. 
Please also note that it takes about ~2h after a reboot, to be 100% sure. 
So, 7-8 bootups => 14-16h. :|
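
The estimate above follows from bisection halving the candidate range on each step. A quick Python check, using the 133 revisions reported by git-bisect and the ~2h per test run mentioned above:

```python
import math

revisions = 133        # from the "Bisecting: 133 revisions left" output
hours_per_boot = 2     # ~2h after a reboot to be sure

# Each test roughly halves the remaining range, so about log2(n) tests.
low = math.floor(math.log2(revisions))
high = math.ceil(math.log2(revisions))

print(f"{low}-{high} bootups, {low * hours_per_boot}-{high * hours_per_boot}h")
```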



It would likely pinpoint fba2591b, so it would perhaps be best to first
attempt a revert of fba2591b on a recent kernel.


I wish I could: :(

[EMAIL PROTECTED]:/usr/src/linux-2.6.23.9$ cat ..p1 |patch -p1 --dry-run -R
patching file fs/hugetlbfs/inode.c
Hunk #1 succeeded at 203 (offset 27 lines).
patching file include/linux/page-flags.h
Hunk #1 succeeded at 262 (offset 9 lines).
patching file mm/page-writeback.c
Hunk #1 succeeded at 903 (offset 58 lines).
patching file mm/truncate.c
Unreversed patch detected!  Ignore -R? [n] y
Hunk #1 succeeded at 52 with fuzz 2 (offset 1 line).
Hunk #2 FAILED at 85.
Hunk #3 FAILED at 365.
Hunk #4 FAILED at 400.
3 out of 4 hunks FAILED -- saving rejects to file mm/truncate.c.rej

Best regards,

Krzysztof Olędzki

Re: [Bug 9182] Critical memory leak (dirty pages)

2007-12-16 Thread Krzysztof Oledzki



On Sat, 15 Dec 2007, Andrew Morton wrote:


On Sun, 16 Dec 2007 00:08:52 +0100 (CET) Krzysztof Oledzki <[EMAIL PROTECTED]> 
wrote:




On Sat, 15 Dec 2007, [EMAIL PROTECTED] wrote:


http://bugzilla.kernel.org/show_bug.cgi?id=9182


--- Comment #33 from [EMAIL PROTECTED]  2007-12-15 14:19 ---
Krzysztof, I'd hate to point you to a hard path (at least time consuming), but
you've done a lot of digging by now anyway. How about git bisecting between
2.6.20-rc2 and rc1? Here is great info on bisecting:
http://www.kernel.org/doc/local/git-quick.html


As I'm smarter than git-bisect I can tell that 2.6.20-rc1-git8 is as bad
as 2.6.20-rc2 but 2.6.20-rc1-git8 with one patch reverted seems to be OK.
So it took me only 2 reboots. ;)

The guilty patch is the one I proposed just an hour ago:
  
http://git.kernel.org/?p=linux%2Fkernel%2Fgit%2Fstable%2Flinux-2.6.20.y.git;a=commitdiff_plain;h=fba2591bf4e418b6c3f9f8794c9dd8fe40ae7bd9

So:
  - 2.6.20-rc1: OK
  - 2.6.20-rc1-git8 with fba2591bf4e418b6c3f9f8794c9dd8fe40ae7bd9 reverted: OK
  - 2.6.20-rc1-git8: very BAD
  - 2.6.20-rc2: very BAD
  - 2.6.20-rc4: very BAD
  - >= 2.6.20: BAD (but not *very* BAD!)



well..  We have code which has been used by *everyone* for a year and it's
misbehaving for you alone.


No, not for me alone. Probably only I and Thomas Osterried have systems
where it is so easy to reproduce. Please note that the problem exists on
all my systems, but only on one is it critical. It is enough to run
"sync; sleep 1; sync; sleep 1; sync; grep Dirty /proc/meminfo" to be sure.
With >=2.6.20-rc1-git8 it *never* falls to 0 on *all* my hosts but only
on one it goes to ~200MB in about 2 weeks and then everything dies:

http://bugzilla.kernel.org/attachment.cgi?id=13824
http://bugzilla.kernel.org/attachment.cgi?id=13825
http://bugzilla.kernel.org/attachment.cgi?id=13826
http://bugzilla.kernel.org/attachment.cgi?id=13827


 I wonder what you're doing that is different/special.

Me too. :|


Which filesystem, which mount options


 - ext3 on RAID1 (MD): / - rootflags=data=journal
 - ext3 on LVM on RAID5 (MD)
 - nfs

/dev/md0 on / type ext3 (rw)
proc on /proc type proc (rw)
sysfs on /sys type sysfs (rw,nosuid,nodev,noexec)
devpts on /dev/pts type devpts (rw,nosuid,noexec)
/dev/mapper/VolGrp0-usr on /usr type ext3 (rw,nodev,data=journal)
/dev/mapper/VolGrp0-var on /var type ext3 (rw,nodev,data=journal)
/dev/mapper/VolGrp0-squid_spool on /var/cache/squid/cd0 type ext3 (rw,nosuid,nodev,noatime,data=writeback)
/dev/mapper/VolGrp0-squid_spool2 on /var/cache/squid/cd1 type ext3 (rw,nosuid,nodev,noatime,data=writeback)
/dev/mapper/VolGrp0-news_spool on /var/spool/news type ext3 (rw,nosuid,nodev,noatime)
shm on /dev/shm type tmpfs (rw,noexec,nosuid,nodev)
usbfs on /proc/bus/usb type usbfs (rw,noexec,nosuid,devmode=0664,devgid=85)
owl:/usr/gentoo-nfs on /usr/gentoo-nfs type nfs (ro,nosuid,nodev,noatime,bg,intr,tcp,addr=192.168.129.26)



what sort of workload?
Different, depending on the host: mail (postfix + amavisd + spamassassin + 
clamav + sqlgray), squid, mysql, apache, nfs, rsync,  But it seems 
that the biggest problem is on the host running the mentioned mail service.


Thanks.

Best regards,

Krzysztof Olędzki

Re: [Bug 9182] Critical memory leak (dirty pages)

2007-12-15 Thread Krzysztof Oledzki



On Sat, 15 Dec 2007, [EMAIL PROTECTED] wrote:


http://bugzilla.kernel.org/show_bug.cgi?id=9182


--- Comment #33 from [EMAIL PROTECTED]  2007-12-15 14:19 ---
Krzysztof, I'd hate to point you to a hard path (at least time consuming), but
you've done a lot of digging by now anyway. How about git bisecting between
2.6.20-rc2 and rc1? Here is great info on bisecting:
http://www.kernel.org/doc/local/git-quick.html


As I'm smarter than git-bisect I can tell that 2.6.20-rc1-git8 is as bad 
as 2.6.20-rc2 but 2.6.20-rc1-git8 with one patch reverted seems to be OK. 
So it took me only 2 reboots. ;)


The guilty patch is the one I proposed just an hour ago:
 
http://git.kernel.org/?p=linux%2Fkernel%2Fgit%2Fstable%2Flinux-2.6.20.y.git;a=commitdiff_plain;h=fba2591bf4e418b6c3f9f8794c9dd8fe40ae7bd9

So:
 - 2.6.20-rc1: OK
 - 2.6.20-rc1-git8 with fba2591bf4e418b6c3f9f8794c9dd8fe40ae7bd9 reverted: OK
 - 2.6.20-rc1-git8: very BAD
 - 2.6.20-rc2: very BAD
 - 2.6.20-rc4: very BAD
 - >= 2.6.20: BAD (but not *very* BAD!)

Best regards,

Krzysztof Olędzki

Re: [Bug 9182] Critical memory leak (dirty pages)

2007-12-15 Thread Krzysztof Oledzki


http://bugzilla.kernel.org/show_bug.cgi?id=9182


On Sat, 15 Dec 2007, Krzysztof Oledzki wrote:




On Thu, 13 Dec 2007, Krzysztof Oledzki wrote:




On Thu, 13 Dec 2007, Peter Zijlstra wrote:



On Thu, 2007-12-13 at 16:17 +0100, Krzysztof Oledzki wrote:





BTW: Could someone please look at this problem? I feel a little ignored and
in my situation this is a critical regression.


I was hoping to get around to it today, but I guess tomorrow will have
to do :-/


Thanks.


So, it's ext3, dirty some pages, sync, and dirty doesn't fall to 0,
right?


Not only doesn't fall but continuously grows.


Does it happen with other filesystems as well?


Don't know. I generally only use ext3 and I'm afraid I'm not able to switch 
this system to other filesystem.



What are your ext3 mount options?

/dev/root / ext3 rw,data=journal 0 0
/dev/VolGrp0/usr /usr ext3 rw,nodev,data=journal 0 0
/dev/VolGrp0/var /var ext3 rw,nodev,data=journal 0 0
/dev/VolGrp0/squid_spool /var/cache/squid/cd0 ext3 rw,nosuid,nodev,noatime,data=writeback 0 0
/dev/VolGrp0/squid_spool2 /var/cache/squid/cd1 ext3 rw,nosuid,nodev,noatime,data=writeback 0 0
/dev/VolGrp0/news_spool /var/spool/news ext3 rw,nosuid,nodev,noatime,data=ordered 0 0


BTW: this regression also exists in 2.6.24-rc5. I'll try to find when it was 
introduced but it is hard to do it on a highly critical production system, 
especially since it takes ~2h after a reboot, to be sure.


However, 2h is quite good time, on other systems I have to wait ~2 months to 
get 20MB of leaked memory:


# uptime
13:29:34 up 58 days, 13:04,  9 users,  load average: 0.38, 0.27, 0.31

# sync;sync;sleep 1;sync;grep Dirt /proc/meminfo
Dirty:   23820 kB


More news. I hope this time my problem gets more attention from developers 
since now I have much more information.


So far I found that:
 - 2.6.20-rc4 - bad: http://bugzilla.kernel.org/attachment.cgi?id=14057
 - 2.6.20-rc2 - bad: http://bugzilla.kernel.org/attachment.cgi?id=14058
 - 2.6.20-rc1 - OK (probably, I need to wait a little more to be 100% sure).

2.6.20-rc1 with 33m uptime:
~$ grep Dirt /proc/meminfo ;sync ; sleep 1 ; sync ; grep Dirt /proc/meminfo
Dirty:   10504 kB
Dirty:   0 kB

2.6.20-rc2 was released Dec 23/24 2006 (BAD)
2.6.20-rc1 was released Dec 13/14 2006 (GOOD?)

It seems that this bug was introduced exactly one year ago. Surprisingly, 
dirty memory in 2.6.20-rc2/2.6.20-rc4 leaks _much_ faster than in 
2.6.20-final and later kernels as it took only about 6h to reach 172MB. 
So, this bug might be cured afterward, but only a little.
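
To put rough numbers on "much faster", the figures quoted in this thread can be compared directly. This is only a back-of-the-envelope sketch: it uses 172MB in about 6h for the -rc kernels and the ~200MB in about 2 weeks reported elsewhere in the thread for later kernels, and it assumes both figures describe a comparable workload.

```python
def mb_per_hour(leaked_mb, hours):
    """Average leak rate in MB per hour."""
    return leaked_mb / hours


rc_rate = mb_per_hour(172, 6)            # 2.6.20-rc2/rc4: 172MB in ~6h
later_rate = mb_per_hour(200, 14 * 24)   # later kernels: ~200MB in ~2 weeks
ratio = rc_rate / later_rate

print(f"{rc_rate:.1f} MB/h vs {later_rate:.2f} MB/h")
```

On these assumptions the -rc kernels leak dirty memory several dozen times faster, which fits the "very BAD" versus "BAD" distinction used in the thread.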


There are three commits that may be somehow related:

http://git.kernel.org/?p=linux/kernel/git/stable/linux-2.6.20.y.git;a=commitdiff;h=fba2591bf4e418b6c3f9f8794c9dd8fe40ae7bd9
http://git.kernel.org/?p=linux/kernel/git/stable/linux-2.6.20.y.git;a=commitdiff;h=3e67c0987d7567ad41164a153dca9a43b11d
http://git.kernel.org/?p=linux/kernel/git/stable/linux-2.6.20.y.git;a=commitdiff;h=5f2a105d5e33a038a717995d2738434f9c25aed2

I'm going to check 2.6.20-rc1-git... releases but it would be *very* nice 
if someone could finally give me a hand and offer some hints to help 
debug this problem.


Please note that none of my systems with kernels >= 2.6.20-rc1 is able to 
reach 0 kB of dirty memory, even after many syncs, even when idle.


Best regards,

Krzysztof Olędzki

Re: [Bug 9182] Critical memory leak (dirty pages)

2007-12-15 Thread Krzysztof Oledzki



On Thu, 13 Dec 2007, Krzysztof Oledzki wrote:




On Thu, 13 Dec 2007, Peter Zijlstra wrote:



On Thu, 2007-12-13 at 16:17 +0100, Krzysztof Oledzki wrote:





BTW: Could someone please look at this problem? I feel a little ignored and
in my situation this is a critical regression.


I was hoping to get around to it today, but I guess tomorrow will have
to do :-/


Thanks.


So, it's ext3, dirty some pages, sync, and dirty doesn't fall to 0,
right?


Not only doesn't fall but continuously grows.


Does it happen with other filesystems as well?


Don't know. I generally only use ext3 and I'm afraid I'm not able to switch 
this system to other filesystem.



What are your ext3 mount options?

/dev/root / ext3 rw,data=journal 0 0
/dev/VolGrp0/usr /usr ext3 rw,nodev,data=journal 0 0
/dev/VolGrp0/var /var ext3 rw,nodev,data=journal 0 0
/dev/VolGrp0/squid_spool /var/cache/squid/cd0 ext3 rw,nosuid,nodev,noatime,data=writeback 0 0
/dev/VolGrp0/squid_spool2 /var/cache/squid/cd1 ext3 rw,nosuid,nodev,noatime,data=writeback 0 0
/dev/VolGrp0/news_spool /var/spool/news ext3 rw,nosuid,nodev,noatime,data=ordered 0 0


BTW: this regression also exists in 2.6.24-rc5. I'll try to find when it 
was introduced but it is hard to do it on a highly critical production 
system, especially since it takes ~2h after a reboot, to be sure.


However, 2h is quite good time, on other systems I have to wait ~2 months 
to get 20MB of leaked memory:


# uptime
 13:29:34 up 58 days, 13:04,  9 users,  load average: 0.38, 0.27, 0.31

# sync;sync;sleep 1;sync;grep Dirt /proc/meminfo
Dirty:   23820 kB

Best regards,

Krzysztof Olędzki

Re: [Bug 9182] Critical memory leak (dirty pages)

2007-12-15 Thread Krzysztof Oledzki


http://bugzilla.kernel.org/show_bug.cgi?id=9182


On Sat, 15 Dec 2007, Krzysztof Oledzki wrote:




On Thu, 13 Dec 2007, Krzysztof Oledzki wrote:




On Thu, 13 Dec 2007, Peter Zijlstra wrote:



On Thu, 2007-12-13 at 16:17 +0100, Krzysztof Oledzki wrote:





BTW: Could someone please look at this problem? I feel little ignored and
in my situation this is a critical regression.


I was hoping to get around to it today, but I guess tomorrow will have
to do :-/


Thanks.


So, its ext3, dirty some pages, sync, and dirty doesn't fall to 0,
right?


Not only doesn't fall but continuously grows.


Does it happen with other filesystems as well?


Don't know. I generally only use ext3 and I'm afraid I'm not able to switch 
this system to other filesystem.



What are you ext3 mount options?

/dev/root / ext3 rw,data=journal 0 0
/dev/VolGrp0/usr /usr ext3 rw,nodev,data=journal 0 0
/dev/VolGrp0/var /var ext3 rw,nodev,data=journal 0 0
/dev/VolGrp0/squid_spool /var/cache/squid/cd0 ext3 
rw,nosuid,nodev,noatime,data=writeback 0 0
/dev/VolGrp0/squid_spool2 /var/cache/squid/cd1 ext3 
rw,nosuid,nodev,noatime,data=writeback 0 0
/dev/VolGrp0/news_spool /var/spool/news ext3 
rw,nosuid,nodev,noatime,data=ordered 0 0


BTW: this regression also exists in 2.6.24-rc5. I'll try to find when it was 
introduced but it is hard to do it on a highly critical production system, 
especially since it takes ~2h after a reboot, to be sure.


However, 2h is quite good time, on other systems I have to wait ~2 months to 
get 20MB of leaked memory:


# uptime
13:29:34 up 58 days, 13:04,  9 users,  load average: 0.38, 0.27, 0.31

# sync;sync;sleep 1;sync;grep Dirt /proc/meminfo
Dirty:   23820 kB


More news, I hope this time my problem get more attention from developers 
since now I have much more information.


So far I found that:
 - 2.6.20-rc4 - bad: http://bugzilla.kernel.org/attachment.cgi?id=14057
 - 2.6.20-rc2 - bad: http://bugzilla.kernel.org/attachment.cgi?id=14058
 - 2.6.20-rc1 - OK (probably, I need to wait little more to be 100% sure).

2.6.20-rc1 with 33m uptime:
~$ grep Dirt /proc/meminfo ;sync ; sleep 1 ; sync ; grep Dirt /proc/meminfo
Dirty:   10504 kB
Dirty:   0 kB

2.6.20-rc2 was released Dec 23/24 2006 (BAD)
2.6.20-rc1 was released Dec 13/14 2006 (GOOD?)

It seems that this bug was introduced exactly one year ago. Surprisingly, 
dirty memory in 2.6.20-rc2/2.6.20-rc4 leaks _much_ faster than in 
2.6.20-final and later kernels: it took only about 6h to reach 172MB. 
So this bug may have been partly cured afterwards, but only a little.
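
To put the two observations side by side, here is a rough Python calculation of the average leak rates. It uses only figures quoted in this message (172MB after about 6h, and 23820 kB after 58 days 13:04 of uptime) and assumes 1 MB = 1024 kB. The two numbers come from different machines and workloads, so this is an order-of-magnitude comparison at best.

```python
def leak_rate_kb_per_hour(dirty_kb, uptime_hours):
    """Average growth of unflushable dirty memory, assuming it started near zero."""
    return dirty_kb / uptime_hours


# 2.6.20-rc2/rc4: about 172 MB after about 6 h.
fast = leak_rate_kb_per_hour(172 * 1024, 6.0)

# Long-running system: 23820 kB after 58 days, 13 h, 4 min of uptime.
slow = leak_rate_kb_per_hour(23820, 58 * 24 + 13 + 4 / 60)

print("fast: %.0f kB/h, slow: %.1f kB/h, ratio: about %.0fx" % (fast, slow, fast / slow))
```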


There are three commits that may be somehow related:

http://git.kernel.org/?p=linux/kernel/git/stable/linux-2.6.20.y.git;a=commitdiff;h=fba2591bf4e418b6c3f9f8794c9dd8fe40ae7bd9
http://git.kernel.org/?p=linux/kernel/git/stable/linux-2.6.20.y.git;a=commitdiff;h=3e67c0987d7567ad41164a153dca9a43b11d
http://git.kernel.org/?p=linux/kernel/git/stable/linux-2.6.20.y.git;a=commitdiff;h=5f2a105d5e33a038a717995d2738434f9c25aed2

I'm going to check the 2.6.20-rc1-git... releases, but it would be *very* nice 
if someone could finally give me a hand and offer some hints to help 
debug this problem.


Please note that none of my systems with kernels = 2.6.20-rc1 is able to 
reach 0 kb of dirty memory, even after many synces, even when idle.


Best regards,

Krzysztof Olędzki

Re: [Bug 9182] Critical memory leak (dirty pages)

2007-12-15 Thread Krzysztof Oledzki



On Sat, 15 Dec 2007, [EMAIL PROTECTED] wrote:


http://bugzilla.kernel.org/show_bug.cgi?id=9182


--- Comment #33 from [EMAIL PROTECTED]  2007-12-15 14:19 ---
Krzysztof, I'd hate to point you down a hard (or at least time-consuming) path, but
you've done a lot of digging by now anyway. How about git bisecting between
2.6.20-rc2 and rc1? Here is great info on bisecting:
http://www.kernel.org/doc/local/git-quick.html


As I'm smarter than git-bisect, I can tell that 2.6.20-rc1-git8 is as bad 
as 2.6.20-rc2, but 2.6.20-rc1-git8 with one patch reverted seems to be OK. 
So it took me only 2 reboots. ;)


The guilty patch is the one I proposed just an hour ago:
 
http://git.kernel.org/?p=linux%2Fkernel%2Fgit%2Fstable%2Flinux-2.6.20.y.git;a=commitdiff_plain;h=fba2591bf4e418b6c3f9f8794c9dd8fe40ae7bd9

So:
 - 2.6.20-rc1: OK
 - 2.6.20-rc1-git8 with fba2591bf4e418b6c3f9f8794c9dd8fe40ae7bd9 reverted: OK
 - 2.6.20-rc1-git8: very BAD
 - 2.6.20-rc2: very BAD
 - 2.6.20-rc4: very BAD
 - >= 2.6.20: BAD (but not *very* BAD!)
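
The narrowing-down reported in this thread is a search for the first bad entry in an ordered list of kernels. The Python sketch below shows that search over release names taken from the thread. It is a simplification: git bisect works on commits rather than release names, `first_bad` is a hypothetical helper, and the `is_bad` callback stands in for "boot the kernel and watch Dirty for a few hours".

```python
def first_bad(versions, is_bad):
    """Binary search for the first version for which is_bad() is true.

    Assumes versions are ordered oldest to newest, the oldest is good,
    the newest is bad, and the bug never disappears once introduced.
    Returns the first bad version and the list of versions tested.
    """
    lo, hi = 0, len(versions) - 1  # versions[lo] known good, versions[hi] known bad
    tested = []
    while hi - lo > 1:
        mid = (lo + hi) // 2
        tested.append(versions[mid])
        if is_bad(versions[mid]):
            hi = mid
        else:
            lo = mid
    return versions[hi], tested


# Results as reported in the thread.
VERSIONS = ["2.6.18.8", "2.6.19.7", "2.6.20-rc1", "2.6.20-rc1-git8",
            "2.6.20-rc2", "2.6.20-rc4", "2.6.20"]
REPORTED_BAD = {"2.6.20-rc1-git8", "2.6.20-rc2", "2.6.20-rc4", "2.6.20"}

culprit, tested = first_bad(VERSIONS, lambda v: v in REPORTED_BAD)
```

Each test costs a reboot plus hours of waiting, which is why guessing the suspect patch and reverting it, as done here, can beat a mechanical search.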

Best regards,

Krzysztof Olędzki

Re: [Bug 9182] Critical memory leak (dirty pages)

2007-12-13 Thread Krzysztof Oledzki



On Thu, 13 Dec 2007, Peter Zijlstra wrote:



On Thu, 2007-12-13 at 16:17 +0100, Krzysztof Oledzki wrote:





BTW: Could someone please look at this problem? I feel a little ignored, and
in my situation this is a critical regression.


I was hoping to get around to it today, but I guess tomorrow will have
to do :-/


Thanks.


So, it's ext3, dirty some pages, sync, and dirty doesn't fall to 0,
right?


Not only does it not fall, it grows continuously.


Does it happen with other filesystems as well?


Don't know. I generally only use ext3 and I'm afraid I'm not able to 
switch this system to another filesystem.



What are your ext3 mount options?

/dev/root / ext3 rw,data=journal 0 0
/dev/VolGrp0/usr /usr ext3 rw,nodev,data=journal 0 0
/dev/VolGrp0/var /var ext3 rw,nodev,data=journal 0 0
/dev/VolGrp0/squid_spool /var/cache/squid/cd0 ext3 
rw,nosuid,nodev,noatime,data=writeback 0 0
/dev/VolGrp0/squid_spool2 /var/cache/squid/cd1 ext3 
rw,nosuid,nodev,noatime,data=writeback 0 0
/dev/VolGrp0/news_spool /var/spool/news ext3 
rw,nosuid,nodev,noatime,data=ordered 0 0

Best regards,

Krzysztof Olędzki

Re: [Bug 9182] Critical memory leak (dirty pages)

2007-12-13 Thread Krzysztof Oledzki



On Mon, 3 Dec 2007, Thomas Osterried wrote:


On the machine which has troubles, the bug occurred within about 10 days.
During these days, the amount of dirty pages increased, up to 400MB.
I have tested kernel 2.6.19, 2.6.20, 2.6.22.1 and 2.6.22.10 (with our config),
and even linux-2.6.20 from ubuntu-server. They have all shown that behaviour.


<CUT>


10 days ago, I installed kernel 2.6.18.5 on this machine (with backported
3ware controller code). I'm quite sure that this kernel will now fix our
severe stability problems on this production machine (currently:
Dirty:  472 kB, nr_dirty 118).
If so, it's the "lastest" kernel I found usable, after half a year of pain.


Strange, my tests show that both 2.6.18(.8) and 2.6.19(.7) are OK and the 
first bad kernel is 2.6.20.


BTW: Could someone please look at this problem? I feel a little ignored, and 
in my situation this is a critical regression.


Best regards,

Krzysztof Olędzki

Re: [Bug 9182] Critical memory leak (dirty pages)

2007-12-12 Thread Krzysztof Oledzki



On Tue, 11 Dec 2007, Krzysztof Oledzki wrote:




On Wed, 5 Dec 2007, Krzysztof Oledzki wrote:




On Wed, 5 Dec 2007, [EMAIL PROTECTED] wrote:


http://bugzilla.kernel.org/show_bug.cgi?id=9182


--- Comment #20 from [EMAIL PROTECTED]  2007-12-05 13:37 ---
Please monitor the "Dirty:" record in /proc/meminfo.  Is it slowly rising
and never falling?


It is slowly rising, apart from small fluctuations caused by the current 
load.



Does it then fall if you run /bin/sync?


Only a little, by ~1-2MB, as on a normal system. But it is not able to 
fall below a local minimum. So, after the first sync it does not fall any 
further with additional syncs.
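
The symptom described here (post-sync Dirty never returning toward zero while its floor creeps up) can be checked against a series of readings. The Python sketch below is one way to express that test. The function name and the 2048 kB default tolerance are assumptions for illustration, the tolerance being loosely based on the "~1-2MB" fluctuation mentioned above.

```python
def post_sync_floor_rising(samples_kb, tolerance_kb=2048):
    """Return True if post-sync Dirty readings never really drop and end higher.

    samples_kb: Dirty readings in kB, each taken right after a sync, oldest first.
    tolerance_kb: dip between consecutive readings that is treated as load noise.
    """
    if len(samples_kb) < 2:
        return False
    for prev, cur in zip(samples_kb, samples_kb[1:]):
        if cur < prev - tolerance_kb:
            return False  # a real drop: writeback is reclaiming the pages
    return samples_kb[-1] > samples_kb[0]
```

Readings such as 376 kB followed by 0 kB (the 2.6.18.8 output later in this message) would not trigger it, while a series that only climbs, give or take a megabyte or two, would.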



Compile up usemem.c from
http://www.zip.com.au/~akpm/linux/patches/stuff/ext3-tools.tar.gz and run

usemem -m N

where N is the number of megabytes which that machine has.


It has 2GB but:

# ./usemem -m 1662 ; echo $?
0

# ./usemem -m 1663 ; echo $?
./usemem: mmap failed: Cannot allocate memory
1


 Did this cause /proc/meminfo:Dirty to fall?

No.


OK, I booted a kernel without the 2:2 memsplit, using a standard 
3.1:0.9 split instead, and even without highmem. So now I have ~900MB and I 
am able to set -m to the number of megabytes which the machine has. However, 
usemem still does not cause dirty memory usage to fall. :(


OK, I can confirm that this is a regression from 2.6.18 where it works OK:

[EMAIL PROTECTED]:~$ uname -r
2.6.18.8

[EMAIL PROTECTED]:~$ uptime;grep Dirt /proc/meminfo;sync;sleep 2;sync;sleep 1;sync;grep Dirt /proc/meminfo
 14:21:53 up  1:00,  1 user,  load average: 0.23, 0.36, 0.35
Dirty: 376 kB
Dirty:   0 kB

It seems that this leak also exists on my other system, as even after many 
syncs the number of dirty pages is still >> 0, but this is the only one where 
it is so critical and, at the same time, so easy to reproduce.


Best regards,


Krzysztof Olędzki

Re: [Bug 9182] Critical memory leak (dirty pages)

2007-12-05 Thread Krzysztof Oledzki



On Wed, 5 Dec 2007, [EMAIL PROTECTED] wrote:


http://bugzilla.kernel.org/show_bug.cgi?id=9182


[EMAIL PROTECTED] changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
          Component|Other                       |Other
      KernelVersion|2.6.22-stable/2.6.23-stable |2.6.20-stable/2.6.22-
                   |                            |stable/2.6.23-stable
            Product|IO/Storage                  |Memory Management
         Regression|0                           |1
            Summary|Strange system hangs        |Critical memory leak (dirty
                   |                            |pages)



After an additional hint from Thomas Osterried I can confirm that the problem 
I have been dealing with for half a year comes from a continuous increase in 
dirty pages:


http://bugzilla.kernel.org/attachment.cgi?id=13864&action=view (in 1 KB 
units)


So, after two days of uptime I have ~140MB of dirty pages and that 
explains why my system crashes every 2-3 weeks.
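
As a rough plausibility check of that statement, the observed growth can be extrapolated linearly. The Python sketch below assumes a constant leak rate, which the thread does not establish, and the thread does not state at which amount of dirty memory the system actually stalls; `projected_dirty_mb` is a hypothetical helper.

```python
def projected_dirty_mb(observed_mb, observed_days, target_days):
    """Linear extrapolation of leaked dirty memory; assumes a constant leak rate."""
    return observed_mb / observed_days * target_days


for weeks in (2, 3):
    days = weeks * 7
    print("after %d weeks: about %.0f MB" % (weeks, projected_dirty_mb(140, 2, days)))
```

At roughly 70MB per day, two to three weeks of uptime would leave on the order of 1 to 1.5GB of dirty memory that cannot be written back, a large share of the 2GB this machine is reported to have.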


Best regards,


Krzysztof Olędzki

Re: Strange system hangs

2007-12-02 Thread Krzysztof Oledzki



On Sat, 29 Sep 2007, Nick Piggin wrote:


On Friday 28 September 2007 18:42, Krzysztof Oledzki wrote:

Hello,

I am experiencing weird system hangs. About once every 2-5 weeks the system
freezes and stops accepting remote connections, so it is no longer possible to
connect to the most important services: smtp (postfix), www (squid) or even
ssh. Such a connection is accepted but then it hangs.

What is strange is that a previously established ssh session remains usable.
It is possible to work on such a system until you do something stupid like "less
/var/log/all.log". Using strace I found that the process blocks on:


Is this a regression? If so, what's the most recent kernel that didn't show
the problem?

The symptoms could be consistent with some place doing a
balance_dirty_pages while holding a lock that is required for IO, but I can't
see a smoking gun (you've got contention on i_mutex, but that should be
OK).

Can you see if there is any memory under writeback that isn't being
completed (sysrq+M)? Also, a list of the locks held after the hang might be
helpful (compile in lockdep and sysrq+D).

Is anything currently running? (sysrq+P and even a full sysrq+T task list
could be useful).

Are any IO errors occurring at all?
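
When no console keyboard is available, the SysRq dumps requested above can also be triggered by writing the key letter to /proc/sysrq-trigger; this needs root and a kernel built with magic SysRq support. The Python sketch below wraps that write. The request names in `SYSRQ_KEYS` and the function `trigger_sysrq` are made up for illustration; only the key letters are the kernel's.

```python
# Request name (hypothetical label) -> SysRq key letter.
SYSRQ_KEYS = {
    "memory": "m",     # memory info, including pages under writeback
    "locks": "d",      # all locks held (needs lockdep compiled in)
    "registers": "p",  # registers of the currently running task
    "tasks": "t",      # full task list
    "blocked": "w",    # tasks in uninterruptible (D) state
}


def trigger_sysrq(request, trigger_path="/proc/sysrq-trigger"):
    """Write the SysRq key for the named request; output goes to the kernel log."""
    key = SYSRQ_KEYS[request]
    with open(trigger_path, "w") as f:
        f.write(key)
    return key
```

For example, `trigger_sysrq("blocked")` asks for a "Show Blocked State" dump; the result would then be read with dmesg or from the serial console.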


It seems that 2.6.23.x still fails, but somehow differently. I updated my 
bug report at: http://bugzilla.kernel.org/show_bug.cgi?id=9182. There are 
new attachments with traces and an oops that happened while I was taking 
the debugging data.


Thank you.

Best regards,


Krzysztof Olędzki

Re: Strange system hangs

2007-09-29 Thread Krzysztof Oledzki



On Sat, 29 Sep 2007, Nick Piggin wrote:


On Friday 28 September 2007 18:42, Krzysztof Oledzki wrote:

Hello,

I am experiencing weird system hangs. About once every 2-5 weeks the system
freezes and stops accepting remote connections, so it is no longer possible to
connect to the most important services: smtp (postfix), www (squid) or even
ssh. Such a connection is accepted but then it hangs.

What is strange is that a previously established ssh session remains usable.
It is possible to work on such a system until you do something stupid like "less
/var/log/all.log". Using strace I found that the process blocks on:


Is this a regression? If so, what's the most recent kernel that didn't show
the problem?


I don't know. The first kernel I ran was 2.6.20.x. This is quite a fresh system.


The symptoms could be consistent with some place doing a
balance_dirty_pages while holding a lock that is required for IO, but I can't
see a smoking gun (you've got contention on i_mutex, but that should be
OK).

Can you see if there is any memory under writeback that isn't being
completed (sysrq+M)? Also, a list of the locks held after the hang might be
helpful (compile in lockdep and sysrq+D).


OK. I'll try to do it next time if there is a chance. It may take 
some time, BTW.



Is anything currently running? (sysrq+P and even a full sysrq+T task list
could be useful).


I'll have to check - maybe I have this captured. If not I'll check it next 
time.



Are any IO errors occurring at all?


Didn't notice - so no.

Thank you.

Best regards,


Krzysztof Olędzki

Re: Strange system hangs

2007-09-28 Thread Krzysztof Oledzki



On Fri, 28 Sep 2007, Peter Zijlstra wrote:


On Fri, 2007-09-28 at 10:42 +0200, Krzysztof Oledzki wrote:

Hello,

I am experiencing weird system hangs. About once every 2-5 weeks the system
freezes and stops accepting remote connections, so it is no longer possible to
connect to the most important services: smtp (postfix), www (squid) or even
ssh. Such a connection is accepted but then it hangs.

What is strange is that a previously established ssh session remains usable.
It is possible to work on such a system until you do something stupid like "less
/var/log/all.log".


So it takes weeks to reproduce this?


Unfortunately, yes. :(


                         free                        sibling
  task             PC    stack   pid father child younger older
syslogd   D F5C83C60 0  2162  1 (NOTLB)
f5c83c74 0082 0002 f5c83c60 f5c83c5c   78538d20
0009 0001 f7f6a070 f7cb8030 82c47e5f 0001cfed 0a43 f7f6a17c
7a016980 f705dc80 78404217 7812c708  0213 f5c83c84 1e7a64bb
Call Trace:
  [<78404217>] _spin_unlock_irqrestore+0xf/0x23
  [<7812c708>] __mod_timer+0x92/0x9c
  [<78402b34>] schedule_timeout+0x70/0x8d
  [<7812c521>] process_timeout+0x0/0x5
  [<78402548>] io_schedule_timeout+0x1e/0x28
  [<7814d41e>] congestion_wait+0x50/0x64
  [<78134abc>] autoremove_wake_function+0x0/0x35
  [<781493e7>] balance_dirty_pages_ratelimited_nr+0x16e/0x1dc
  [<78145bd0>] generic_file_buffered_write+0x4ee/0x605
  [<783c55a1>] unix_dgram_recvmsg+0x1b4/0x1c8
  [<78128c8e>] current_fs_time+0x41/0x46
  [<78146167>] __generic_file_aio_write_nolock+0x480/0x4df
  [<7814621b>] generic_file_aio_write+0x55/0xb3
  [<78194b28>] ext3_file_write+0x24/0x8f
  [<7815f34f>] do_sync_readv_writev+0xc1/0xfe
  [<78134abc>] autoremove_wake_function+0x0/0x35
  [<784041ae>] _spin_unlock+0xd/0x21
  [<781a8c38>] log_wait_commit+0xc3/0xe3
  [<7814448b>] find_get_pages_tag+0x76/0x80
  [<7815f204>] rw_copy_check_uvector+0x50/0xaa
  [<7815f9d4>] do_readv_writev+0x99/0x164
  [<78194b04>] ext3_file_write+0x0/0x8f
  [<7815fadc>] vfs_writev+0x3d/0x48
  [<7815feb5>] sys_writev+0x41/0x67
  [<78103d6a>] sysenter_past_esp+0x5f/0x85
  ===


This trace puzzles me: what is unix_dgram_recvmsg doing there?
Also, it has two invocations of ext3_file_write.
Do you have a stacked filesystem of sorts, e.g. ext3 on loopback on ext3?


No, no loopback:

# mount
/dev/md0 on / type ext3 (rw)
proc on /proc type proc (rw)
sysfs on /sys type sysfs (rw,nosuid,nodev,noexec)
devpts on /dev/pts type devpts (rw,nosuid,noexec)
/dev/mapper/VolGrp0-usr on /usr type ext3 (rw,nodev,data=journal)
/dev/mapper/VolGrp0-var on /var type ext3 (rw,nodev,data=journal)
/dev/mapper/VolGrp0-squid_spool on /var/cache/squid/cd0 type ext3 
(rw,nosuid,nodev,noatime,data=writeback)
/dev/mapper/VolGrp0-squid_spool2 on /var/cache/squid/cd1 type ext3 
(rw,nosuid,nodev,noatime,data=writeback)
/dev/mapper/VolGrp0-news_spool on /var/spool/news type ext3 
(rw,nosuid,nodev,noatime)
shm on /dev/shm type tmpfs (rw,noexec,nosuid,nodev)
usbfs on /proc/bus/usb type usbfs (rw,noexec,nosuid,devmode=0664,devgid=85)
owl:/usr/gentoo-nfs on /usr/gentoo-nfs type nfs 
(ro,nosuid,nodev,noatime,bg,intr,tcp,addr=192.168.129.26)

Nothing more.


freshclam D 0282 0  2866  1 (NOTLB)
f36e3cc4 0082 0009 0282 7a0173c0 0002  007b
0009 0001 f7cb8030 f7c72030 82c4884d 0001cfed 09ee f7cb813c
7a016980 f66c0b80 78404217 7812c708  0213 f36e3cd4 1e7a64bb
Call Trace:
  [<78404217>] _spin_unlock_irqrestore+0xf/0x23
  [<7812c708>] __mod_timer+0x92/0x9c
  [<78402b34>] schedule_timeout+0x70/0x8d
  [<7812c521>] process_timeout+0x0/0x5
  [<78402548>] io_schedule_timeout+0x1e/0x28
  [<7814d41e>] congestion_wait+0x50/0x64
  [<78134abc>] autoremove_wake_function+0x0/0x35
  [<781493e7>] balance_dirty_pages_ratelimited_nr+0x16e/0x1dc
  [<78145bd0>] generic_file_buffered_write+0x4ee/0x605
  [<7819cdb4>] __ext3_journal_stop+0x19/0x34
  [<7840408f>] _spin_lock+0xd/0x5a
  [<78176f3d>] __mark_inode_dirty+0xdd/0x16f
  [<78128c8e>] current_fs_time+0x41/0x46
  [<78146167>] __generic_file_aio_write_nolock+0x480/0x4df
  [<7814621b>] generic_file_aio_write+0x55/0xb3
  [<78103159>] setup_sigcontext+0x105/0x189
  [<78194b28>] ext3_file_write+0x24/0x8f
  [<7815f453>] do_sync_write+0xc7/0x10a
  [<78134abc>] autoremove_wake_function+0x0/0x35
  [<781085d2>] convert_fxsr_from_user+0x15/0xd5
  [<7815f38c>] do_sync_write+0x0/0x10a
  [<7815fbb6>] vfs_write+0x8a/0x10c
  [<78160123>] sys_write+0x41/0x67
  [<78103d6a>] sysenter_past_esp+0x5f/0x85
  ===


single write, no networking, also stuck in balance_dirty_pages().
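
Comparing the two blocked tasks by eye is error-prone, so here is a small Python sketch that extracts function names from "Call Trace" lines and lists those present in both traces. It assumes the `[<address>] name+0xoffset/0xsize` format shown above; `frame_functions` and `common_functions` are hypothetical helpers, and the sample strings are short excerpts of the two traces. Such dumps may contain stale addresses left on the stack, so a shared name is a hint rather than proof of a shared code path.

```python
import re

# Matches e.g. "[<7814d41e>] congestion_wait+0x50/0x64" and captures the name.
FRAME_RE = re.compile(r"\[<[0-9a-f]+>\]\s+(\S+?)\+0x[0-9a-f]+/0x[0-9a-f]+")


def frame_functions(trace_text):
    """Return function names from call-trace lines, in order of appearance."""
    return FRAME_RE.findall(trace_text)


def common_functions(trace_a, trace_b):
    """Return names that occur in both traces, ordered as in trace_a, without repeats."""
    seen_b = set(frame_functions(trace_b))
    result = []
    for name in frame_functions(trace_a):
        if name in seen_b and name not in result:
            result.append(name)
    return result


SYSLOGD_EXCERPT = """
  [<7814d41e>] congestion_wait+0x50/0x64
  [<781493e7>] balance_dirty_pages_ratelimited_nr+0x16e/0x1dc
  [<783c55a1>] unix_dgram_recvmsg+0x1b4/0x1c8
  [<7815feb5>] sys_writev+0x41/0x67
"""

FRESHCLAM_EXCERPT = """
  [<7814d41e>] congestion_wait+0x50/0x64
  [<781493e7>] balance_dirty_pages_ratelimited_nr+0x16e/0x1dc
  [<78160123>] sys_write+0x41/0x67
"""

shared = common_functions(SYSLOGD_EXCERPT, FRESHCLAM_EXCERPT)
```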


Exactly. Strange, isn't it?

Thanks.

Best regards,

Krzysztof Olędzki

Strange system hangs

2007-09-28 Thread Krzysztof Oledzki

Hello,

I am experiencing weird system hangs. About once every 2-5 weeks the system 
freezes and stops accepting remote connections, so it is no longer possible to 
connect to the most important services: smtp (postfix), www (squid) or even 
ssh. Such a connection is accepted but then it hangs.


What is strange is that a previously established ssh session remains usable. 
It is possible to work on such a system until you do something stupid like "less 
/var/log/all.log". Using strace I found that the process blocks on:


--- strace: begin ---
execve("/usr/bin/tail", ["tail", "-f", "/var/log/all.log"], [/* 33 vars */]) = 0
brk(0)  = 0x8052000
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 
0x6ff0
access("/etc/ld.so.preload", R_OK)  = -1 ENOENT (No such file or directory)
open("/etc/ld.so.cache", O_RDONLY)  = 3
fstat64(3, {st_mode=S_IFREG|0644, st_size=20944, ...}) = 0
mmap2(NULL, 20944, PROT_READ, MAP_PRIVATE, 3, 0) = 0x6fefa000
close(3)= 0
open("/lib/libc.so.6", O_RDONLY)= 3
read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0RY\1\0004\0\0\0"..., 
512) = 512
fstat64(3, {st_mode=S_IFREG|0755, st_size=1175920, ...}) = 0
mmap2(NULL, 1185212, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 
0x6fdd8000
mmap2(0x6fef4000, 12288, PROT_READ|PROT_WRITE, 
MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x11b) = 0x6fef4000
mmap2(0x6fef7000, 9660, PROT_READ|PROT_WRITE, 
MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x6fef7000
close(3)= 0
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 
0x6fdd7000
set_thread_area({entry_number:-1 -> 6, base_addr:0x6fdd76b0, limit:1048575, 
seg_32bit:1, contents:0, read_exec_only:0, limit_in_pages:1, seg_not_present:0, 
useable:1}) = 0
mprotect(0x6fef4000, 4096, PROT_READ)   = 0
mprotect(0x6ff1c000, 4096, PROT_READ)   = 0
munmap(0x6fefa000, 20944)   = 0
brk(0)  = 0x8052000
brk(0x8073000)  = 0x8073000
open("/var/log/all.log", O_RDONLY|O_LARGEFILE) = 3
fstat64(3, {st_mode=S_IFREG|0640, st_size=3171841, ...})
llseek(3, 0,  <unfinished ...>
--- strace: end ---

This file is not very big:

# ls -l /var/log/all.log
-rw-r----- 1 root root 3171841 Sep 27 04:36 /var/log/all.log

Also running "dmesg > file" hangs, creating a file with only 4096 bytes.

--- Show Blocked State: begin ---
SysRq : Show Blocked State

                         free                        sibling
  task             PC    stack   pid father child younger older
syslogd   D F5C83C60 0  2162  1 (NOTLB)
   f5c83c74 0082 0002 f5c83c60 f5c83c5c   78538d20
   0009 0001 f7f6a070 f7cb8030 82c47e5f 0001cfed 0a43 f7f6a17c
   7a016980 f705dc80 78404217 7812c708  0213 f5c83c84 1e7a64bb
Call Trace:
 [<78404217>] _spin_unlock_irqrestore+0xf/0x23
 [<7812c708>] __mod_timer+0x92/0x9c
 [<78402b34>] schedule_timeout+0x70/0x8d
 [<7812c521>] process_timeout+0x0/0x5
 [<78402548>] io_schedule_timeout+0x1e/0x28
 [<7814d41e>] congestion_wait+0x50/0x64
 [<78134abc>] autoremove_wake_function+0x0/0x35
 [<781493e7>] balance_dirty_pages_ratelimited_nr+0x16e/0x1dc
 [<78145bd0>] generic_file_buffered_write+0x4ee/0x605
 [<783c55a1>] unix_dgram_recvmsg+0x1b4/0x1c8
 [<78128c8e>] current_fs_time+0x41/0x46
 [<78146167>] __generic_file_aio_write_nolock+0x480/0x4df
 [<7814621b>] generic_file_aio_write+0x55/0xb3
 [<78194b28>] ext3_file_write+0x24/0x8f
 [<7815f34f>] do_sync_readv_writev+0xc1/0xfe
 [<78134abc>] autoremove_wake_function+0x0/0x35
 [<784041ae>] _spin_unlock+0xd/0x21
 [<781a8c38>] log_wait_commit+0xc3/0xe3
 [<7814448b>] find_get_pages_tag+0x76/0x80
 [<7815f204>] rw_copy_check_uvector+0x50/0xaa
 [<7815f9d4>] do_readv_writev+0x99/0x164
 [<78194b04>] ext3_file_write+0x0/0x8f
 [<7815fadc>] vfs_writev+0x3d/0x48
 [<7815feb5>] sys_writev+0x41/0x67
 [<78103d6a>] sysenter_past_esp+0x5f/0x85
 ===
freshclam D 0282 0  2866  1 (NOTLB)
   f36e3cc4 0082 0009 0282 7a0173c0 0002  007b
   0009 0001 f7cb8030 f7c72030 82c4884d 0001cfed 09ee f7cb813c
   7a016980 f66c0b80 78404217 7812c708  0213 f36e3cd4 1e7a64bb
Call Trace:
 [<78404217>] _spin_unlock_irqrestore+0xf/0x23
 [<7812c708>] __mod_timer+0x92/0x9c
 [<78402b34>] schedule_timeout+0x70/0x8d
 [<7812c521>] process_timeout+0x0/0x5
 [<78402548>] io_schedule_timeout+0x1e/0x28
 [<7814d41e>] congestion_wait+0x50/0x64
 [<78134abc>] autoremove_wake_function+0x0/0x35
 [<781493e7>] balance_dirty_pages_ratelimited_nr+0x16e/0x1dc
 [<78145bd0>] generic_file_buffered_write+0x4ee/0x605
 [<7819cdb4>] __ext3_journal_stop+0x19/0x34
 [<7840408f>] _spin_lock+0xd/0x5a
 [<78176f3d>] __mark_inode_dirty+0xdd/0x16f
 [<78128c8e>] current_fs_time+0x41/0x46
 [<78146167>] __generic_file_aio_write_nolock+0x480/0x4df
 [<7814621b>] generic_file_aio_write+0x55/0xb3
 [<78103159>] 

Re: Strange system hangs

2007-09-28 Thread Krzysztof Oledzki



On Fri, 28 Sep 2007, Peter Zijlstra wrote:


On Fri, 2007-09-28 at 10:42 +0200, Krzysztof Oledzki wrote:

Hello,

I am experiencing weird system hangs. About once every 2-5 weeks the system
freezes and stops accepting remote connections, so it is no longer possible to
connect to the most important services: smtp (postfix), www (squid) or even
ssh. Such a connection is accepted, but then it hangs.

What is strange is that a previously established ssh session remains usable.
It is possible to work on such a system until you do something stupid like
less /var/log/all.log.


So it takes weeks to reproduce this?


Unfortunately, yes. :(


                      free                        sibling
  task           PC   stack   pid father child younger older
syslogd   D F5C83C60 0  2162  1 (NOTLB)
f5c83c74 0082 0002 f5c83c60 f5c83c5c   78538d20
0009 0001 f7f6a070 f7cb8030 82c47e5f 0001cfed 0a43 f7f6a17c
7a016980 f705dc80 78404217 7812c708  0213 f5c83c84 1e7a64bb
Call Trace:
  [78404217] _spin_unlock_irqrestore+0xf/0x23
  [7812c708] __mod_timer+0x92/0x9c
  [78402b34] schedule_timeout+0x70/0x8d
  [7812c521] process_timeout+0x0/0x5
  [78402548] io_schedule_timeout+0x1e/0x28
  [7814d41e] congestion_wait+0x50/0x64
  [78134abc] autoremove_wake_function+0x0/0x35
  [781493e7] balance_dirty_pages_ratelimited_nr+0x16e/0x1dc
  [78145bd0] generic_file_buffered_write+0x4ee/0x605
  [783c55a1] unix_dgram_recvmsg+0x1b4/0x1c8
  [78128c8e] current_fs_time+0x41/0x46
  [78146167] __generic_file_aio_write_nolock+0x480/0x4df
  [7814621b] generic_file_aio_write+0x55/0xb3
  [78194b28] ext3_file_write+0x24/0x8f
  [7815f34f] do_sync_readv_writev+0xc1/0xfe
  [78134abc] autoremove_wake_function+0x0/0x35
  [784041ae] _spin_unlock+0xd/0x21
  [781a8c38] log_wait_commit+0xc3/0xe3
  [7814448b] find_get_pages_tag+0x76/0x80
  [7815f204] rw_copy_check_uvector+0x50/0xaa
  [7815f9d4] do_readv_writev+0x99/0x164
  [78194b04] ext3_file_write+0x0/0x8f
  [7815fadc] vfs_writev+0x3d/0x48
  [7815feb5] sys_writev+0x41/0x67
  [78103d6a] sysenter_past_esp+0x5f/0x85
  ===


This trace puzzles me: what is unix_dgram_recvmsg doing there?
Also, it has two invocations of ext3_file_write.
Do you have a stacked filesystem of sorts, ext3 on loopback on ext3?
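
One possible reading, offered as an assumption rather than a diagnosis: on i386 kernels built without frame pointers, the call trace lists every stack word that looks like a kernel text address, so stale entries can appear. An entry with a +0x0 offset, such as ext3_file_write+0x0/0x8f, is more likely a stored function pointer than a return address. A minimal Python sketch (the helper names are hypothetical) that flags such entries in a pasted trace:

```python
import re

# Matches lines such as " [78194b04] ext3_file_write+0x0/0x8f".
TRACE_RE = re.compile(r"\[<?([0-9a-f]+)>?\]\s+(\w+)\+0x([0-9a-f]+)/0x([0-9a-f]+)")

def parse_trace(text):
    """Return (address, symbol, offset, size) tuples for each trace line."""
    entries = []
    for line in text.splitlines():
        m = TRACE_RE.search(line)
        if m:
            addr, sym, off, size = m.groups()
            entries.append((int(addr, 16), sym, int(off, 16), int(size, 16)))
    return entries

def suspect_entries(entries):
    """Symbols recorded at offset 0: probably pointers, not return addresses."""
    return [sym for _, sym, off, _ in entries if off == 0]

# A few lines copied from the syslogd trace above.
sample = """\
 [78194b28] ext3_file_write+0x24/0x8f
 [7815f34f] do_sync_readv_writev+0xc1/0xfe
 [78134abc] autoremove_wake_function+0x0/0x35
 [78194b04] ext3_file_write+0x0/0x8f
 [7815fadc] vfs_writev+0x3d/0x48
"""

entries = parse_trace(sample)
print(suspect_entries(entries))
```

If the second ext3_file_write is only a leftover pointer, the trace does not by itself show a stacked filesystem.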


No, no loopback:

# mount
/dev/md0 on / type ext3 (rw)
proc on /proc type proc (rw)
sysfs on /sys type sysfs (rw,nosuid,nodev,noexec)
devpts on /dev/pts type devpts (rw,nosuid,noexec)
/dev/mapper/VolGrp0-usr on /usr type ext3 (rw,nodev,data=journal)
/dev/mapper/VolGrp0-var on /var type ext3 (rw,nodev,data=journal)
/dev/mapper/VolGrp0-squid_spool on /var/cache/squid/cd0 type ext3 (rw,nosuid,nodev,noatime,data=writeback)
/dev/mapper/VolGrp0-squid_spool2 on /var/cache/squid/cd1 type ext3 (rw,nosuid,nodev,noatime,data=writeback)
/dev/mapper/VolGrp0-news_spool on /var/spool/news type ext3 (rw,nosuid,nodev,noatime)
shm on /dev/shm type tmpfs (rw,noexec,nosuid,nodev)
usbfs on /proc/bus/usb type usbfs (rw,noexec,nosuid,devmode=0664,devgid=85)
owl:/usr/gentoo-nfs on /usr/gentoo-nfs type nfs (ro,nosuid,nodev,noatime,bg,intr,tcp,addr=192.168.129.26)

Nothing more.


freshclam D 0282 0  2866  1 (NOTLB)
f36e3cc4 0082 0009 0282 7a0173c0 0002  007b
0009 0001 f7cb8030 f7c72030 82c4884d 0001cfed 09ee f7cb813c
7a016980 f66c0b80 78404217 7812c708  0213 f36e3cd4 1e7a64bb
Call Trace:
  [78404217] _spin_unlock_irqrestore+0xf/0x23
  [7812c708] __mod_timer+0x92/0x9c
  [78402b34] schedule_timeout+0x70/0x8d
  [7812c521] process_timeout+0x0/0x5
  [78402548] io_schedule_timeout+0x1e/0x28
  [7814d41e] congestion_wait+0x50/0x64
  [78134abc] autoremove_wake_function+0x0/0x35
  [781493e7] balance_dirty_pages_ratelimited_nr+0x16e/0x1dc
  [78145bd0] generic_file_buffered_write+0x4ee/0x605
  [7819cdb4] __ext3_journal_stop+0x19/0x34
  [7840408f] _spin_lock+0xd/0x5a
  [78176f3d] __mark_inode_dirty+0xdd/0x16f
  [78128c8e] current_fs_time+0x41/0x46
  [78146167] __generic_file_aio_write_nolock+0x480/0x4df
  [7814621b] generic_file_aio_write+0x55/0xb3
  [78103159] setup_sigcontext+0x105/0x189
  [78194b28] ext3_file_write+0x24/0x8f
  [7815f453] do_sync_write+0xc7/0x10a
  [78134abc] autoremove_wake_function+0x0/0x35
  [781085d2] convert_fxsr_from_user+0x15/0xd5
  [7815f38c] do_sync_write+0x0/0x10a
  [7815fbb6] vfs_write+0x8a/0x10c
  [78160123] sys_write+0x41/0x67
  [78103d6a] sysenter_past_esp+0x5f/0x85
  ===


single write, no networking, also stuck in balance_dirty_pages().


Exactly. Strange, isn't it?
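
Since both tasks are waiting in balance_dirty_pages(), it may help to record how much dirty and writeback memory the machine reports while it hangs. A minimal Python sketch, assuming the usual /proc/meminfo field names Dirty and Writeback; the function name is hypothetical and the sample values are made up for illustration, not taken from the affected machine:

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of values in kB."""
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # not a "Name: value" line
        fields = rest.split()
        if fields and fields[0].isdigit():
            values[name.strip()] = int(fields[0])
    return values

# Illustrative sample only.
sample = """\
MemTotal:      2075000 kB
MemFree:         51200 kB
Dirty:          180000 kB
Writeback:        4096 kB
"""

info = parse_meminfo(sample)
print("Dirty kB:", info["Dirty"], "Writeback kB:", info["Writeback"])
```

On a live system the text would come from reading /proc/meminfo in an already open session. A Dirty value that stays high and never drains would suggest that writeback is not making progress.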

Thanks.

Best regards,

Krzysztof Olędzki

Re: [PATCH 1/2] bnx2: factor out gzip unpacker

2007-09-21 Thread Krzysztof Oledzki



On Fri, 21 Sep 2007, Denys Vlasenko wrote:


On Friday 21 September 2007 19:36, [EMAIL PROTECTED] wrote:

On Fri, 21 Sep 2007 19:05:23 BST, Denys Vlasenko said:


I plan to use gzip compression on the following drivers' firmware,
if the patches are accepted:

   text    data     bss     dec     hex filename
  17653  109968     240  127861   1f375 drivers/net/acenic.o
   6628  120448       4  127080   1f068 drivers/net/dgrs.o
         ^^^^^^
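
To put the marked column in perspective, the share of each object taken by its data section can be worked out from the numbers in the table. A small Python sketch; that the data section consists mostly of embedded firmware is an assumption based on the context of this thread:

```python
# (text, data, bss, dec) as printed by size(1) in the table above.
objects = {
    "drivers/net/acenic.o": (17653, 109968, 240, 127861),
    "drivers/net/dgrs.o": (6628, 120448, 4, 127080),
}

def data_share(text, data, bss, dec):
    """Fraction of the object taken by the data section."""
    assert text + data + bss == dec  # sanity check on the table row
    return data / dec

for name, row in objects.items():
    print(f"{name}: {data_share(*row):.1%} data")
```

In both objects the data section accounts for well over four fifths of the total size.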


Should this be redone to use the existing firmware loading framework to
load the firmware instead?


Not in every case.

For example, the bnx2 maintainer says that driver and
firmware are closely tied for his driver. IOW: you upgrade the kernel
and your NIC is not working anymore.

Firmware may come with the kernel. We have an "install modules" step; we
can also add "install firmware".



Another argument is to let the kernel bring up NICs
without needing firmware images in initramfs/initrd/hard drive.


It is not possible to bring up things like FC or WiFi without firmware,
so what is special about classic NICs?


Best regards,

Krzysztof Olędzki

Re: mss to pmtu clamping partially broken?

2007-07-02 Thread Krzysztof Oledzki



On Mon, 2 Jul 2007, Phil Dibowitz wrote:


On Mon, Jul 02, 2007 at 09:16:57PM +0200, Krzysztof Oledzki wrote:



On Mon, 2 Jul 2007, Phil Dibowitz wrote:


On Mon, Jul 02, 2007 at 07:04:12PM +0200, Andreas Steinmetz wrote:

Jan Engelhardt wrote:

Do you really need clamping? It's a hack, since TCP should do MSS negotiation
itself. (Of course it may happen that some routers are broken.) But usually not
for incoming packets.


You never know when you hit ICMP blackholes, broken routers and other
evil things. Better safe than sorry so clamping is the way to go for me.
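
To make concrete what clamping the MSS to the path MTU amounts to, here is a minimal Python sketch. It assumes IPv4 with no IP or TCP options (20 bytes of header each); the function name is hypothetical, and the 1492-byte MTU is a typical PPPoE value picked only for illustration:

```python
IPV4_HEADER = 20  # bytes, without options
TCP_HEADER = 20   # bytes, without options

def clamp_mss(advertised_mss, path_mtu):
    """Return the MSS to forward: never larger than the path MTU allows."""
    return min(advertised_mss, path_mtu - IPV4_HEADER - TCP_HEADER)

print(clamp_mss(1460, 1492))  # an Ethernet host behind a 1492-byte link
print(clamp_mss(1400, 1492))  # already small enough, left unchanged
```

The clamp only ever lowers the advertised value, which is why it keeps working when the ICMP messages needed for PMTU discovery are dropped along the path.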


I encourage you to report PMTUD Blackholes to the MSS Initiative at
http://www.phildev.net/mss/


Any chance of a similar initiative for "SACK vandals"? ;)


There's already a counterpart for ECN blackholes, so I'm not opposed to it.
However, keeping up with new reports, re-testing existing offenders, etc.
takes up a good chunk of time, so I don't have the time to do it myself. I'm
happy to reference such a site, however.


Indeed, and it seems there are more important issues, like the similar
window scaling problem, for example. :(


Best regards,

Krzysztof Olędzki

Re: mss to pmtu clamping partially broken?

2007-07-02 Thread Krzysztof Oledzki



On Mon, 2 Jul 2007, Phil Dibowitz wrote:


On Mon, Jul 02, 2007 at 07:04:12PM +0200, Andreas Steinmetz wrote:

Jan Engelhardt wrote:

Do you really need clamping? It's a hack, since TCP should do MSS negotiation
itself. (Of course it may happen that some routers are broken.) But usually not
for incoming packets.


You never know when you hit ICMP blackholes, broken routers and other
evil things. Better safe than sorry so clamping is the way to go for me.


I encourage you to report PMTUD Blackholes to the MSS Initiative at
http://www.phildev.net/mss/


Any chance of a similar initiative for "SACK vandals"? ;)

Best regards,

Krzysztof Olędzki

Re: IRQ handling difference between i386 and x86_64

2007-06-30 Thread Krzysztof Oledzki



On Sat, 30 Jun 2007, Arjan van de Ven wrote:


On Sat, 2007-06-30 at 16:55 +0200, Krzysztof Oledzki wrote:

Hello,

It seems that IRQ handling is somehow different between i386 and x86_64.

In my Dell PowerEdge 1950 it is possible to enable interrupt spreading
over all CPUs. This is a single-CPU, four-core system (Quad-Core E5335 Xeon),
so I think that interrupt migration may be useful. Unfortunately, it
works only with a 32-bit kernel. Booting with x86_64 leads to a situation
where all interrupts go only to the first CPU matching the smp_affinity
mask.
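
As an illustration of the behaviour described here, the following Python sketch decodes an smp_affinity hex mask into CPU numbers and shows which CPU would take every interrupt if only the first match is used. The function names are hypothetical, and the masks are examples rather than values read from the machine in question:

```python
def cpus_from_mask(mask):
    """Convert an smp_affinity hex mask (e.g. "0f") to a list of CPU numbers."""
    # Masks for many CPUs are printed in comma-separated 32-bit groups.
    value = int(mask.replace(",", ""), 16)
    return [cpu for cpu in range(value.bit_length()) if (value >> cpu) & 1]

def first_cpu(mask):
    """CPU that receives every interrupt if only the first match is used."""
    cpus = cpus_from_mask(mask)
    return cpus[0] if cpus else None

print(cpus_from_mask("0f"), first_cpu("0f"))
print(cpus_from_mask("0c"), first_cpu("0c"))
```

With a mask of 0f all four cores are allowed, yet under the reported x86_64 behaviour only CPU 0 would be used.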


arguably that is the most efficient behavior... round robin of
interrupts is the worst possible case in terms of performance


Even on dual/quad-core CPUs with a shared cache? So why is it possible to
enable such behaviour in the BIOS, which works only on i386 BTW? :(



are you using irqbalance ? (www.irqbalance.org)


Yes, I'm aware of this useful tool, but in some situations (routing,
for example) it cannot help much, as it keeps three CPUs idle. :(


Best regards,

Krzysztof Olędzki

Re: pata_via in 2.6.19-rc6: UDMA/66 hdd downgraded to UDMA/33

2006-12-09 Thread Krzysztof Oledzki



On Wed, 29 Nov 2006, Alan wrote:


Does this fix it?

--- drivers/ata/pata_via.c~ 2006-11-29 15:16:10.961387472 +
+++ drivers/ata/pata_via.c  2006-11-29 15:17:08.784597008 +
@@ -60,7 +60,7 @@
 #include <linux/libata.h>

 #define DRV_NAME "pata_via"
-#define DRV_VERSION "0.2.0"
+#define DRV_VERSION "0.2.1"

 /*
  *  The following comes directly from Vojtech Pavlik's ide/pci/via82cxxx
@@ -159,10 +159,13 @@
                return -ENOENT;
        }

-       if ((config->flags & VIA_UDMA) >= VIA_UDMA_66)
+       if ((config->flags & VIA_UDMA) >= VIA_UDMA_100)
                ap->cbl = via_cable_detect(ap);
-       else
+       /* The UDMA66 series has no cable detect so do drive side detect */
+       else if ((config->flags & VIA_UDMA) < VIA_UDMA_66)
                ap->cbl = ATA_CBL_PATA40;
+
+
        return ata_std_prereset(ap);
 }


Yes and no - UDMA/66 now gets enabled, but with both 80-wire and 40-wire
cables, and with the 40-wire cable it leads to ATA bus errors (see below).
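
This result is consistent with how the patched branch reads: for a UDMA66 controller neither branch runs, so this function no longer forces ATA_CBL_PATA40. Whether the drive-side detection mentioned in the patch comment then happens elsewhere is not shown in this thread. A Python sketch that models only the branch logic of the patch; the numeric values of the VIA_UDMA_* names are illustrative assumptions, and only their ordering matters here:

```python
# Illustrative values; only their ordering matters for this sketch.
VIA_UDMA_NONE, VIA_UDMA_33, VIA_UDMA_66, VIA_UDMA_100, VIA_UDMA_133 = range(5)

def cable_before(udma):
    """Cable decision before the patch."""
    return "via_cable_detect" if udma >= VIA_UDMA_66 else "ATA_CBL_PATA40"

def cable_after(udma):
    """Cable decision after the patch; None means ap->cbl is left untouched."""
    if udma >= VIA_UDMA_100:
        return "via_cable_detect"
    if udma < VIA_UDMA_66:
        return "ATA_CBL_PATA40"
    return None

levels = [("UDMA33", VIA_UDMA_33), ("UDMA66", VIA_UDMA_66), ("UDMA100", VIA_UDMA_100)]
for name, level in levels:
    print(name, cable_before(level), cable_after(level))
```

Only the UDMA66 row changes between the two versions, which matches the observation that the cable type no longer affects the configured mode on this controller.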


* With 80-wire cable:
pata_via :00:07.1: version 0.1.14
ata1: PATA max UDMA/66 cmd 0x1F0 ctl 0x3F6 bmdma 0xFFA0 irq 14
ata2: PATA max UDMA/66 cmd 0x170 ctl 0x376 bmdma 0xFFA8 irq 15
scsi0 : pata_via
ata1.00: ATA-5, max UDMA/66, 40031712 sectors: LBA
ata1.00: ata1: dev 0 multi count 16
ata1.00: configured for UDMA/66
scsi1 : pata_via
ata2: port is slow to respond, please be patient (Status 0xff)
ata2: port failed to respond (30 secs, Status 0xff)
ata2: SRST failed (status 0xFF)
ata2: SRST failed (err_mask=0x100)
ata2: softreset failed, retrying in 5 secs
ata2: SRST failed (status 0xFF)
ata2: SRST failed (err_mask=0x100)
ata2: softreset failed, retrying in 5 secs
ata2: SRST failed (status 0xFF)
ata2: SRST failed (err_mask=0x100)
ata2: reset failed, giving up
scsi 0:0:0:0: Direct-Access ATA  FUJITSU MPF3204A 0031 PQ: 0 ANSI: 5
SCSI device sda: 40031712 512-byte hdwr sectors (20496 MB)
sda: Write Protect is off
sda: Mode Sense: 00 3a 00 00
SCSI device sda: drive cache: write back
SCSI device sda: 40031712 512-byte hdwr sectors (20496 MB)
sda: Write Protect is off
sda: Mode Sense: 00 3a 00 00
SCSI device sda: drive cache: write back
 sda: sda1 sda2 sda3 sda4 < sda5 sda6 sda7 sda8 sda9 sda10 sda11 >
sd 0:0:0:0: Attached scsi disk sda
(...)

* With 40-wire cable:
pata_via :00:07.1: version 0.1.14
ata1: PATA max UDMA/66 cmd 0x1F0 ctl 0x3F6 bmdma 0xFFA0 irq 14
ata2: PATA max UDMA/66 cmd 0x170 ctl 0x376 bmdma 0xFFA8 irq 15
scsi0 : pata_via
ata1.00: ATA-5, max UDMA/66, 40031712 sectors: LBA
ata1.00: ata1: dev 0 multi count 16
ata1.00: configured for UDMA/66
scsi1 : pata_via
ata2: port is slow to respond, please be patient (Status 0xff)
ata2: port failed to respond (30 secs, Status 0xff)
ata2: SRST failed (status 0xFF)
ata2: SRST failed (err_mask=0x100)
ata2: softreset failed, retrying in 5 secs
ata2: SRST failed (status 0xFF)
ata2: SRST failed (err_mask=0x100)
ata2: softreset failed, retrying in 5 secs
ata2: SRST failed (status 0xFF)
ata2: SRST failed (err_mask=0x100)
ata2: reset failed, giving up
scsi 0:0:0:0: Direct-Access ATA  FUJITSU MPF3204A 0031 PQ: 0 ANSI: 5
SCSI device sda: 40031712 512-byte hdwr sectors (20496 MB)
sda: Write Protect is off
sda: Mode Sense: 00 3a 00 00
SCSI device sda: drive cache: write back
SCSI device sda: 40031712 512-byte hdwr sectors (20496 MB)
sda: Write Protect is off
sda: Mode Sense: 00 3a 00 00
SCSI device sda: drive cache: write back
 sda:<3>ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2
ata1.00: (BMDMA stat 0x20)
ata1.00: tag 0 cmd 0xc8 Emask 0x10 stat 0x51 err 0x84 (ATA bus error)
ata1: soft resetting port
ata1.00: configured for UDMA/66
ata1: EH complete
ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2
ata1.00: (BMDMA stat 0x20)
ata1.00: tag 0 cmd 0xc8 Emask 0x10 stat 0x51 err 0x84 (ATA bus error)
ata1: soft resetting port
ata1.00: configured for UDMA/66
ata1: EH complete
ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2
ata1.00: (BMDMA stat 0x20)
ata1.00: tag 0 cmd 0xc8 Emask 0x10 stat 0x51 err 0x84 (ATA bus error)
ata1: soft resetting port
ata1.00: configured for UDMA/66
ata1: EH complete
ata1.00: limiting speed to UDMA/44
ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2
ata1.00: (BMDMA stat 0x20)
ata1.00: tag 0 cmd 0xc8 Emask 0x10 stat 0x51 err 0x84 (ATA bus error)
ata1: soft resetting port
ata1.00: configured for UDMA/44
ata1: EH complete
 sda1 sda2 sda3 sda4 < sda5 sda6 sda7 sda8 sda9 sda10 sda11 >
sd 0:0:0:0: Attached scsi disk sda



Best regards,

Krzysztof Olędzki

Re: Fw: [Bugme-new] [Bug 5194] New: IPSec related OOps in 2.6.13

2005-09-06 Thread Krzysztof Oledzki



On Tue, 6 Sep 2005, Herbert Xu wrote:


On Tue, Sep 06, 2005 at 04:08:56AM -0700, Andrew Morton wrote:


Problem Description:

Oops:  [#1]
PREEMPT
Modules linked in:
CPU:    0
EIP:    0060:[<c01f562c>]    Not tainted VLI
EFLAGS: 00010216   (2.6.13)
EIP is at sha1_update+0x7c/0x160


Thanks for the report.  Matt LaPlante had exactly the same problem
a couple of days ago.  I've now tracked it down to my broken crypto
cipher wrapper functions, which will step over a page boundary if
it's not aligned correctly.
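
The kind of condition involved can be sketched as follows. This is not the actual patch, only a Python illustration of checking whether a block of a given length, starting at a given byte offset, spans a page boundary; the function name is hypothetical and the 4096-byte page size is the usual i386 value:

```python
PAGE_SIZE = 4096  # assumed; the usual i386 page size

def crosses_page(offset, length, page_size=PAGE_SIZE):
    """True if bytes [offset, offset + length) span more than one page."""
    if length <= 0:
        return False
    return offset // page_size != (offset + length - 1) // page_size

print(crosses_page(4088, 8))  # block ends exactly at the page boundary
print(crosses_page(4089, 8))  # last byte falls in the next page
```

A block that starts a single byte later than an aligned one already touches the next page, which may not be mapped.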


[CRYPTO] Fix boundary check in standard multi-block cipher processors


Thanks. Patched my kernel, recompiled and waiting. So far it is OK.

Should this patch be merged into 2.6.13.1?

Best regards,

Krzysztof Olędzki

