Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected

2008-09-30 Thread Ingo Molnar
* Mike Travis [EMAIL PROTECTED] wrote: could you please send whatever .c changes you have already, so that we can have a look at how the end result will look like? Doesn't have to build, i'm just curious about how it looks like in practice, semantically. I will, and the full

Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected

2008-09-30 Thread Mike Travis
Ingo Molnar wrote: * Mike Travis [EMAIL PROTECTED] wrote: could you please send whatever .c changes you have already, so that we can have a look at how the end result will look like? Doesn't have to build, i'm just curious about how it looks like in practice, semantically. I will, and

Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected

2008-09-30 Thread Linus Torvalds
On Tue, 30 Sep 2008, Mike Travis wrote: One pain is: typedef struct __cpumask_s *cpumask_t; const cpumask_t xxx; is not the same as: typedef const struct __cpumask_s *const_cpumask_t; const_cpumask_t xxx; and I'm not exactly sure why. Umm. The const has
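
The difference Mike Travis describes follows from how C applies `const` to a typedef name: the qualifier attaches to the typedef'd type as a whole (here, the pointer), not to what it points at. A minimal sketch, assuming a stand-in `struct cpumask_s` and hypothetical helper names that are not from the patch under discussion:

```c
/* Hypothetical stand-in for the struct __cpumask_s named in the thread. */
struct cpumask_s { unsigned long bits[4]; };

typedef struct cpumask_s *cpumask_ptr_t;             /* pointer typedef */
typedef const struct cpumask_s *const_cpumask_ptr_t; /* pointer-to-const typedef */

/*
 * "const cpumask_ptr_t" means "struct cpumask_s *const":
 * the pointer is const, the pointee is still writable.
 */
static void set_first_word(const cpumask_ptr_t mask, unsigned long v)
{
    mask->bits[0] = v;       /* compiles: the pointee is not const */
    /* mask = 0; */          /* would not compile: the pointer is const */
}

/* Here the pointee is const, so the mask is read-only through this name. */
static unsigned long first_word(const_cpumask_ptr_t mask)
{
    /* mask->bits[0] = 0; */ /* would not compile: the pointee is const */
    return mask->bits[0];
}
```

Calling `set_first_word(&m, 5)` on a local `struct cpumask_s m` modifies `m` even though the parameter is declared `const`, which is the surprise a pointer-hiding typedef invites.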

Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected

2008-09-30 Thread Mike Travis
Linus Torvalds wrote: On Tue, 30 Sep 2008, Mike Travis wrote: One pain is: typedef struct __cpumask_s *cpumask_t; const cpumask_t xxx; is not the same as: typedef const struct __cpumask_s *const_cpumask_t; const_cpumask_t xxx; and I'm not exactly sure why. Umm.

Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected

2008-09-30 Thread Rusty Russell
On Wednesday 01 October 2008 02:46:59 Linus Torvalds wrote: Quite frankly, I personally do hate typedefs that end up being pointers, and used as pointers, without showing that in the source code. ... I'm now a bit more leery about this whole thing just because the typedef ends up hiding so

Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected

2008-09-29 Thread Mike Travis
Ingo Molnar wrote: * Mike Travis [EMAIL PROTECTED] wrote: Hi Rusty, I've gotten some good traction on the changes in the following patch. About 30% of the kernel is compiling right now and I'm picking up errors and warnings as I'm going along. I think it's doing most of what we need.

Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected

2008-09-27 Thread Ingo Molnar
* Mike Travis [EMAIL PROTECTED] wrote: Hi Rusty, I've gotten some good traction on the changes in the following patch. About 30% of the kernel is compiling right now and I'm picking up errors and warnings as I'm going along. I think it's doing most of what we need. Attempting to

Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected

2008-09-25 Thread Mike Travis
Rusty Russell wrote: On Friday 26 September 2008 01:42:13 Linus Torvalds wrote: On Thu, 25 Sep 2008, Rusty Russell wrote: This turns out to be awful in practice, mainly due to const. Consider: #ifdef CONFIG_CPUMASK_OFFSTACK typedef unsigned long *cpumask_t; #else

Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected

2008-09-24 Thread Rusty Russell
On Wednesday 27 August 2008 02:51:46 Linus Torvalds wrote: On Tue, 26 Aug 2008, Yinghai Lu wrote: wonder if could use unsigned long * directly. I would actually suggest something like this: - we continue to have a magic cpumask_t. - we do different cases for big and small NR_CPUS:

Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected

2008-09-01 Thread Jes Sorensen
Linus Torvalds wrote: On Fri, 29 Aug 2008, Jes Sorensen wrote: I have only tested this on ia64, but it boots, so it's obviously perfect(tm) :-) Well, it probably boots because it doesn't really seem to _change_ much of anything. Hi Linus, I realize that, but as I have been doing this work

Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected

2008-08-29 Thread Jes Sorensen
David Miller wrote: Otherwise you have to modify cpumask_t objects and thus pluck them onto the stack where they take up silly amounts of space. Yes, I had proposed either modifying, or supplementing a new smp_call function to pass the cpumask_t as a pointer (similar to set_cpus_allowed_ptr.)

Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected

2008-08-29 Thread Linus Torvalds
On Fri, 29 Aug 2008, Jes Sorensen wrote: I have only tested this on ia64, but it boots, so it's obviously perfect(tm) :-) Well, it probably boots because it doesn't really seem to _change_ much of anything. Things like this: -static inline void arch_send_call_function_ipi(cpumask_t

Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected

2008-08-29 Thread David Miller
From: Linus Torvalds [EMAIL PROTECTED] Date: Fri, 29 Aug 2008 09:14:44 -0700 (PDT) Well, it probably boots because it doesn't really seem to _change_ much of anything. Things like this: -static inline void arch_send_call_function_ipi(cpumask_t mask) +static inline void

Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected

2008-08-28 Thread Adrian Bunk
On Thu, Aug 28, 2008 at 09:32:13AM +0900, Paul Mundt wrote: On Wed, Aug 27, 2008 at 08:35:44PM +0300, Adrian Bunk wrote: On Thu, Aug 28, 2008 at 01:00:52AM +0900, Paul Mundt wrote: On Wed, Aug 27, 2008 at 02:58:30PM +0300, Adrian Bunk wrote: In addition to that, debugging the runaway

Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected

2008-08-27 Thread Paul Mackerras
Linus Torvalds writes: 4kB used to be the _only_ choice. And no, there weren't even irq stacks. So that 4kB was not just the whole kernel call-chain, it was also all the irq nesting above it. I think your memory is failing you. In 2.4 and earlier, the kernel stack was 8kB minus the size of

Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected

2008-08-27 Thread Nick Piggin
On Wednesday 27 August 2008 06:01, Mike Travis wrote: Dave Jones wrote: ... But yes, for this to be even remotely feasible, there has to be a negligible performance cost associated with it, which right now, we clearly don't have. Given that the number of people running 4096 CPU boxes

Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected

2008-08-27 Thread Nick Piggin
On Wednesday 27 August 2008 17:05, David Miller wrote: From: Nick Piggin [EMAIL PROTECTED] Date: Wed, 27 Aug 2008 16:54:32 +1000 5% is a pretty nasty performance hit... what sort of benchmarks are we talking about here? I just made some pretty crazy changes to the VM to get only around

Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected

2008-08-27 Thread Bernd Petrovitsch
On Tue, 2008-08-26 at 18:54 -0400, Parag Warudkar wrote: On Tue, Aug 26, 2008 at 5:04 PM, Linus Torvalds [EMAIL PROTECTED] wrote: And embedded people (the ones that might care about 1% code size) are the ones that would also want smaller stacks even more! This is something I never

Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected

2008-08-27 Thread Alan Cox
You have a good point that aiming at 4kB makes 8kB a very safe choice. Not really no - we use separate IRQ stacks in 4K but not 8K mode on x86-32. That means you've actually got no more space if you are unlucky with the timing of events. The 8K mode is merely harder to debug. If 4K stacks

Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected

2008-08-27 Thread Bernd Petrovitsch
On Tue, 2008-08-26 at 22:16 -0400, Parag Warudkar wrote: [...] Well, sure - but the industry as a whole seems to have gone the other The industry as a whole doesn't exist on that low level. You can't compare the laptop and/or desktop computer market (where one may buy today hardware that runs

Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected

2008-08-27 Thread David Miller
From: Nick Piggin [EMAIL PROTECTED] Date: Wed, 27 Aug 2008 17:47:14 +1000 Yeah, I see. That's stupid isn't it? (Well, I guess it was completely sane when cpumasks were word sized ;)) Hopefully that accounts for a significant chunk... There is a lot of indirect costs that are hard to see as

Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected

2008-08-27 Thread Alan Cox
What about deep call chains? The problem with the uptake of 4K stacks seems to be that is not reliably provable that it will work under all circumstances. On x86-32 with 8K stacks your IRQ paths share them so that is even harder to prove (not that you can prove any of them) and the bugs are

Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected

2008-08-27 Thread Adrian Bunk
On Tue, Aug 26, 2008 at 05:28:37PM -0700, Linus Torvalds wrote: On Wed, 27 Aug 2008, Adrian Bunk wrote: When did we get callpaths like nfs+xfs+md+scsi reliably working with 4kB stacks on x86-32? XFS may never have been usable, but the rest, sure. And you seem to be making

Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected

2008-08-27 Thread Parag Warudkar
On Wed, Aug 27, 2008 at 4:25 AM, Alan Cox [EMAIL PROTECTED] wrote: You have a good point that aiming at 4kB makes 8kB a very safe choice. Not really no - we use separate IRQ stacks in 4K but not 8K mode on x86-32. That means you've actually got no more space if you are unlucky with the timing

Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected

2008-08-27 Thread Parag Warudkar
On Wed, Aug 27, 2008 at 5:00 AM, Bernd Petrovitsch [EMAIL PROTECTED] wrote: They probably gave the idea pretty soon because you need to rework/improve large parts of the kernel + drivers (and that has two major problems - it consumes a lot of man power for no new features and everything must

Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected

2008-08-27 Thread Bernd Petrovitsch
On Wed, 2008-08-27 at 08:56 -0400, Parag Warudkar wrote: On Wed, Aug 27, 2008 at 5:00 AM, Bernd Petrovitsch [EMAIL PROTECTED] wrote: They probably gave the idea pretty soon because you need to rework/improve large parts of the kernel + drivers (and that has two major problems - it consumes

Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected

2008-08-27 Thread Alan Cox
By your logic though, XFS on x86 should work fine with 4K stacks - many will attest that it does not and blows up due to stack issues. I have first hand experiences of things blowing up with deep call chains when using 4K stacks where 8K worked just fine on same workload. So there is

Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected

2008-08-27 Thread Mike Travis
David Miller wrote: From: Nick Piggin [EMAIL PROTECTED] Date: Wed, 27 Aug 2008 16:54:32 +1000 5% is a pretty nasty performance hit... what sort of benchmarks are we talking about here? I just made some pretty crazy changes to the VM to get only around 5 or so % performance improvement in

Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected

2008-08-27 Thread Mike Travis
David Miller wrote: From: Nick Piggin [EMAIL PROTECTED] Date: Wed, 27 Aug 2008 17:47:14 +1000 Yeah, I see. That's stupid isn't it? (Well, I guess it was completely sane when cpumasks were word sized ;)) Hopefully that accounts for a significant chunk... There is a lot of indirect costs

Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected

2008-08-27 Thread Linus Torvalds
On Wed, 27 Aug 2008, Paul Mackerras wrote: I think your memory is failing you. In 2.4 and earlier, the kernel stack was 8kB minus the size of the task_struct, which sat at the start of the 8kB. Yup, you're right. Linus

Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected

2008-08-27 Thread Jamie Lokier
Bernd Petrovitsch wrote: If you develop an embedded system (which is partly system integration of existing apps) to be installed in the field, you don't have that many conceivable work loads compared to a desktop/server system. And you have a fixed list of drivers and applications. Hah! Not

Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected

2008-08-27 Thread Paul Mundt
On Wed, Aug 27, 2008 at 02:58:30PM +0300, Adrian Bunk wrote: On Tue, Aug 26, 2008 at 05:28:37PM -0700, Linus Torvalds wrote: On Wed, 27 Aug 2008, Adrian Bunk wrote: When did we get callpaths like nfs+xfs+md+scsi reliably working with 4kB stacks on x86-32? XFS may never have

Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected

2008-08-27 Thread Jamie Lokier
Linus Torvalds wrote: Most LOCs of the kernel are not written by people like you or Al Viro or David Miller, and the average kernel developer is unlikely to do it as good as gcc. Sure. But we do have tools. We do have checkstack.pl, it's just that it hasn't been an issue in a long

Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected

2008-08-27 Thread Parag Warudkar
On Wed, Aug 27, 2008 at 9:21 AM, Alan Cox [EMAIL PROTECTED] wrote: By your logic though, XFS on x86 should work fine with 4K stacks - many will attest that it does not and blows up due to stack issues. I have first hand experiences of things blowing up with deep call chains when using 4K

Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected

2008-08-27 Thread Bernd Petrovitsch
On Wed, 2008-08-27 at 16:48 +0100, Jamie Lokier wrote: Bernd Petrovitsch wrote: If you develop an embedded system (which is partly system integration of existing apps) to be installed in the field, you don't have that many conceivable work loads compared to a desktop/server system. And you

Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected

2008-08-27 Thread Jamie Lokier
Bernd Petrovitsch wrote: 32MB no-MMU ARM boards which people run new things and attach new devices to rather often - without making new hardware. Volume's too low per individual application to get new hardware designed and made. Yes, you may have several products on the same hardware

Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected

2008-08-27 Thread Bernd Petrovitsch
On Wed, 2008-08-27 at 18:51 +0100, Jamie Lokier wrote: Bernd Petrovitsch wrote: [...] It is, but the idea that small embedded systems go through a 'all components are known, drivers are known, test and if it passes it's shippable' does not always apply. Not always but often enough. And yes,

Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected

2008-08-27 Thread Paul Mundt
On Wed, Aug 27, 2008 at 05:46:05PM -0700, David Miller wrote: From: Paul Mundt [EMAIL PROTECTED] Date: Thu, 28 Aug 2008 09:32:13 +0900 On Wed, Aug 27, 2008 at 08:35:44PM +0300, Adrian Bunk wrote: CONFIG_DEBUG_STACKOVERFLOW should give you the same information, and if wanted with an

Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected

2008-08-27 Thread Greg Ungerer
Paul Mundt wrote: On Wed, Aug 27, 2008 at 02:58:30PM +0300, Adrian Bunk wrote: On Tue, Aug 26, 2008 at 05:28:37PM -0700, Linus Torvalds wrote: On Wed, 27 Aug 2008, Adrian Bunk wrote: When did we get callpaths like nfs+xfs+md+scsi reliably working with 4kB stacks on x86-32? XFS may

Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected

2008-08-26 Thread Ingo Molnar
* Linus Torvalds [EMAIL PROTECTED] wrote: On Mon, 25 Aug 2008, Linus Torvalds wrote: checkstack.pl shows these things as the top problems: 0x80266234 smp_call_function_mask [vmlinux]:2736 0x80234747 __build_sched_domains [vmlinux]: 2232

Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected

2008-08-26 Thread David Miller
From: Ingo Molnar [EMAIL PROTECTED] Date: Tue, 26 Aug 2008 09:22:20 +0200 And i guess the next generation of 4K CPUs support should just get away from cpumask_t-on-kernel-stack model altogether, as the current model is not maintainable. We tried the on-kernel-stack variant, and it really

Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected

2008-08-26 Thread Yinghai Lu
On Tue, Aug 26, 2008 at 12:53 AM, Ingo Molnar [EMAIL PROTECTED] wrote: * David Miller [EMAIL PROTECTED] wrote: From: Ingo Molnar [EMAIL PROTECTED] Date: Tue, 26 Aug 2008 09:22:20 +0200 And i guess the next generation of 4K CPUs support should just get away from cpumask_t-on-kernel-stack

Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected

2008-08-26 Thread Linus Torvalds
On Tue, 26 Aug 2008, Yinghai Lu wrote: wonder if could use unsigned long * directly. I would actually suggest something like this: - we continue to have a magic cpumask_t. - we do different cases for big and small NR_CPUS: #if NR_CPUS <= BITS_PER_LONG /* * Make

Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected

2008-08-26 Thread Yinghai Lu
On Tue, Aug 26, 2008 at 9:51 AM, Linus Torvalds [EMAIL PROTECTED] wrote: On Tue, 26 Aug 2008, Yinghai Lu wrote: wonder if could use unsigned long * directly. I would actually suggest something like this: - we continue to have a magic cpumask_t. - we do different cases for big and

Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected

2008-08-26 Thread Linus Torvalds
On Tue, 26 Aug 2008, Rusty Russell wrote: Your workaround is very random, and that scares me. I think a huge number of CPUs needs a real solution (an actual cpumask allocator, then do something clever if we come across an actual fastpath). The thing is, the inlining thing is a separate

Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected

2008-08-26 Thread Adrian Bunk
On Tue, Aug 26, 2008 at 10:35:05AM -0700, Linus Torvalds wrote: On Tue, 26 Aug 2008, Rusty Russell wrote: Your workaround is very random, and that scares me. I think a huge number of CPUs needs a real solution (an actual cpumask allocator, then do something clever if we come

Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected

2008-08-26 Thread Linus Torvalds
On Tue, 26 Aug 2008, Adrian Bunk wrote: A debugging option (for better traces) to disallow gcc some inlining might make sense (and might even make sense for distributions to enable in their kernels), but when you go to use cases that require really small kernels the cost is too high.

Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected

2008-08-26 Thread Linus Torvalds
On Tue, 26 Aug 2008, Adrian Bunk wrote: I added -fno-inline-functions-called-once -fno-early-inlining to KBUILD_CFLAGS, and (with gcc 4.3) that increased the size of my kernel image by 2%. Btw, did you check with just -fno-inline-functions-called-once? The -fearly-inlining decisions

Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected

2008-08-26 Thread Mike Travis
Linus Torvalds wrote: On Mon, 25 Aug 2008, Linus Torvalds wrote: checkstack.pl shows these things as the top problems: 0x80266234 smp_call_function_mask [vmlinux]:2736 0x80234747 __build_sched_domains [vmlinux]: 2232 0x8023523f

Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected

2008-08-26 Thread Jamie Lokier
Linus Torvalds wrote: The inline-functions-called-once thing is what causes even big functions to be inlined, and that's where you find the big downsides too (eg the stack usage). That's a bit bizarre, though, isn't it? A function which is only called from one place should, if everything

Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected

2008-08-26 Thread Mike Travis
Ingo Molnar wrote: * Linus Torvalds [EMAIL PROTECTED] wrote: On Mon, 25 Aug 2008, Linus Torvalds wrote: checkstack.pl shows these things as the top problems: 0x80266234 smp_call_function_mask [vmlinux]:2736 0x80234747 __build_sched_domains [vmlinux]: 2232

Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected

2008-08-26 Thread Linus Torvalds
On Tue, 26 Aug 2008, Mike Travis wrote: The need to allow distros to set NR_CPUS=4096 (and NODES_SHIFT=9) is critical to our upcoming SGI systems using what we have been calling UV. That's fine. You can do it. The default kernel will not, because it's clearly not safe. I really don't

Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected

2008-08-26 Thread Linus Torvalds
On Tue, 26 Aug 2008, Jamie Lokier wrote: A function which is only called from one place should, if everything made sense, _never_ use more stack through being inlined. But that's simply not true. See the whole discussion. The problem is that if you inline that function, the stack usage of
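
The point about inlining and stack usage can be made concrete with a small arithmetic model. This is only a sketch: the struct, the helper names, and any frame sizes fed into it are hypothetical, and it assumes the compiler neither overlaps nor shrinks the merged locals after inlining.

```c
#include <stddef.h>

/* Hypothetical frame sizes, in bytes, for one caller and two things it calls. */
struct frames {
    size_t caller;      /* the caller's own locals */
    size_t once_callee; /* a big function called from exactly one place */
    size_t deep_path;   /* deepest other call chain reachable from the caller */
};

static size_t max_sz(size_t a, size_t b)
{
    return a > b ? a : b;
}

/* Out of line: the big callee's frame exists only while it is running. */
static size_t peak_out_of_line(struct frames f)
{
    return f.caller + max_sz(f.once_callee, f.deep_path);
}

/*
 * Inlined: the callee's locals become part of the caller's frame,
 * so they are still allocated while the deep path runs.
 */
static size_t peak_inlined(struct frames f)
{
    return f.caller + f.once_callee + f.deep_path;
}
```

With a 100-byte caller, a 2000-byte single-use callee and a 3000-byte deep path, the model gives a peak of 3100 bytes out of line and 5100 bytes inlined, even though the callee is still called from only one place.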

Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected

2008-08-26 Thread Mike Travis
Linus Torvalds wrote: On Tue, 26 Aug 2008, Mike Travis wrote: The need to allow distros to set NR_CPUS=4096 (and NODES_SHIFT=9) is critical to our upcoming SGI systems using what we have been calling UV. That's fine. You can do it. The default kernel will not, because it's clearly not

Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected

2008-08-26 Thread Dave Jones
On Tue, Aug 26, 2008 at 12:09:46PM -0700, Linus Torvalds wrote: If you want the default kernel to support 4k cores, we'll need to fix the stack usage. I don't think that is impossible, but IT IS NOT GOING TO HAPPEN for 2.6.27. And quite frankly, if some vendor like RedHat enables

Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected

2008-08-26 Thread Linus Torvalds
On Tue, 26 Aug 2008, Mike Travis wrote: I would be most interested in any tools to analyze call-trees and accumulated stack usages. My current method of using kdb is really time consuming. Well, even just scripts/checkstack.pl is quite relevant. The fact is, anything with a stack

Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected

2008-08-26 Thread Jeff Garzik
Linus Torvalds wrote: The downsides of inlining are big enough from both a debugging and a real code generation angle (eg stack usage like this), that the upsides (_sometimes_ smaller kernel, possibly slightly faster code) simply aren't relevant. So the noinline was random, yes, but this is

Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected

2008-08-26 Thread Mike Travis
Dave Jones wrote: ... But yes, for this to be even remotely feasible, there has to be a negligible performance cost associated with it, which right now, we clearly don't have. Given that the number of people running 4096 CPU boxes even in a few years time will still be tiny, punishing the

Re: e1000 horridness (was Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected)

2008-08-26 Thread Kok, Auke
Linus Torvalds wrote: On Tue, 26 Aug 2008, Jeff Garzik wrote: e1000_check_options builds a struct (singular) on the stack, really... struct e1000_option is reasonably small. No it doesn't. Look a bit more closely. It builds a struct (singular) MANY MANY times. It also then builds up

Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected

2008-08-26 Thread Adrian Bunk
On Tue, Aug 26, 2008 at 11:40:10AM -0700, Linus Torvalds wrote: On Tue, 26 Aug 2008, Adrian Bunk wrote: A debugging option (for better traces) to disallow gcc some inlining might make sense (and might even make sense for distributions to enable in their kernels), but when you go to

Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected

2008-08-26 Thread Linus Torvalds
On Tue, 26 Aug 2008, Adrian Bunk wrote: I had in mind that we anyway have to support it for tiny kernels. I actually don't think that is true. If we really were to decide to be stricter about it, and it makes a big size difference, we can probably also add a tool to warn about functions

Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected

2008-08-26 Thread David Miller
From: Mike Travis [EMAIL PROTECTED] Date: Tue, 26 Aug 2008 12:06:18 -0700 David Miller wrote: The only case that didn't work was due to a limitation in arch interfaces for the new generic smp_call_function() code. It passes a cpumask_t instead of a pointer to one via

Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected

2008-08-26 Thread Adrian Bunk
On Tue, Aug 26, 2008 at 11:47:01AM -0700, Linus Torvalds wrote: On Tue, 26 Aug 2008, Adrian Bunk wrote: I added -fno-inline-functions-called-once -fno-early-inlining to KBUILD_CFLAGS, and (with gcc 4.3) that increased the size of my kernel image by 2%. Btw, did you check with

Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected

2008-08-26 Thread Linus Torvalds
On Tue, 26 Aug 2008, Adrian Bunk wrote: If you think we have too many stacksize problems I'd suggest to consider removing the choice of 4k stacks on i386, sh and m68knommu instead of using -fno-inline-functions-called-once: Don't be silly. That makes the problem _worse_. We're much

Re: e1000 horridness (was Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected)

2008-08-26 Thread Jeff Kirsher
On Tue, Aug 26, 2008 at 1:14 PM, Kok, Auke [EMAIL PROTECTED] wrote: Linus Torvalds wrote: On Tue, 26 Aug 2008, Jeff Garzik wrote: e1000_check_options builds a struct (singular) on the stack, really... struct e1000_option is reasonably small. No it doesn't. Look a bit more closely. It

Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected

2008-08-26 Thread Parag Warudkar
On Tue, Aug 26, 2008 at 5:04 PM, Linus Torvalds [EMAIL PROTECTED] wrote: And embedded people (the ones that might care about 1% code size) are the ones that would also want smaller stacks even more! This is something I never understood - embedded devices are not going to run more than a few

Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected

2008-08-26 Thread David VomLehn
Parag Warudkar wrote: On Tue, Aug 26, 2008 at 5:04 PM, Linus Torvalds [EMAIL PROTECTED] wrote: And embedded people (the ones that might care about 1% code size) are the ones that would also want smaller stacks even more! This is something I never understood - embedded devices are not going

Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected

2008-08-26 Thread Adrian Bunk
On Tue, Aug 26, 2008 at 04:00:33PM -0700, David VomLehn wrote: Parag Warudkar wrote: On Tue, Aug 26, 2008 at 5:04 PM, Linus Torvalds [EMAIL PROTECTED] wrote: And embedded people (the ones that might care about 1% code size) are the ones that would also want smaller stacks even more! This

Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected

2008-08-26 Thread Linus Torvalds
On Wed, 27 Aug 2008, Adrian Bunk wrote: We're much better off with a 1% code-size reduction than forcing big stacks on people. The 4kB stack option is also a good way of saying if it works with this, then 8kB is certainly safe. You implicitly assume both would solve the same

Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected

2008-08-26 Thread Adrian Bunk
On Tue, Aug 26, 2008 at 04:51:52PM -0700, Linus Torvalds wrote: On Wed, 27 Aug 2008, Adrian Bunk wrote: We're much better off with a 1% code-size reduction than forcing big stacks on people. The 4kB stack option is also a good way of saying if it works with this, then 8kB

Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected

2008-08-26 Thread Linus Torvalds
On Wed, 27 Aug 2008, Adrian Bunk wrote: When did we get callpaths like nfs+xfs+md+scsi reliably working with 4kB stacks on x86-32? XFS may never have been usable, but the rest, sure. And you seem to be making this whole argument an excuse to SUCK, and an excuse to let gcc crap even

Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected

2008-08-26 Thread Parag Warudkar
On Tue, Aug 26, 2008 at 8:53 PM, Greg Ungerer [EMAIL PROTECTED] wrote: I have some simple devices (network access/routers) with 8MB of RAM, at power up not really being configured to do anything running 25 processes. (Heck there is over 10 kernel processes running!). Configure some interfaces

Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected

2008-08-26 Thread Linus Torvalds
On Tue, 26 Aug 2008, Parag Warudkar wrote: And although you said in your later reply that Linux x86 with 4K stacks should be more than usable - my experiences running a untainted desktop/file server with 4K stack have been always disastrous XFS or not. It _might_ work for some well defined

Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected

2008-08-26 Thread Linus Torvalds
On Tue, 26 Aug 2008, Parag Warudkar wrote: What about deep call chains? The problem with the uptake of 4K stacks seems to be that is not reliably provable that it will work under all circumstances. Umm. Neither is 8k stacks. Nobody proved anything. But yes, some subsystems have insanely

Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected

2008-08-25 Thread Alan D. Brunelle
Linus Torvalds wrote: On Sat, 23 Aug 2008, Linus Torvalds wrote: This one makes no sense. It's triggering a BUG_ON(in_interrupt()), but then the call chain shows that there is no interrupt going on. Ahh, later in that thread there's another totally unrelated oops in

Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected

2008-08-25 Thread Alan D. Brunelle
Arjan van de Ven wrote: Wonder what gcc is in use? (newer ones tend to be a ton better... but maybe Alex is using a really old one) I'm running Ubuntu 8.04 w/ gcc: gcc (GCC) 4.2.3 (Ubuntu 4.2.3-2ubuntu7) Alan

Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected

2008-08-25 Thread Alan D. Brunelle
Before adding any more debugging, this is the status of my kernel boots: 3 times in a row w/ this same error. (Primary problem is the same, secondary stacks differ of course.) Alan Loading, please [6.482953] busybox used greatest stack depth: 4840 bytes left wait... [6.521876]

Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected

2008-08-25 Thread Alan D. Brunelle
Linus Torvalds wrote: On Sat, 23 Aug 2008, Linus Torvalds wrote: This one makes no sense. It's triggering a BUG_ON(in_interrupt()), but then the call chain shows that there is no interrupt going on. Ahh, later in that thread there's another totally unrelated oops in

Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected

2008-08-25 Thread Alan D. Brunelle
Adding in SLUB debugging doesn't show anything new (I think). Example boot log (w/ initcall_debug enabled) is at: http://free.linux.hp.com/~adb/bug.11342/prob5.txt This has happened 3 times in a row as well. Whilst this is being looked at, I'm going to fast-forward ahead to the latest in Linus'

Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected

2008-08-25 Thread Alan D. Brunelle
I built a kernel @ commit 83097aca8567a0bd593534853b71fe0fa9a75d69 Author: Arjan van de Ven [EMAIL PROTECTED] Date: Sat Aug 23 21:45:21 2008 -0700 And it fails like the others do o http://free.linux.hp.com/~adb/bug.11342/prob6.txt SMP_DEBUG_PAGEALLOC o

Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected

2008-08-25 Thread Linus Torvalds
On Mon, 25 Aug 2008, Alan D. Brunelle wrote: Before adding any more debugging, this is the status of my kernel boots: 3 times in a row w/ this same error. (Primary problem is the same, secondary stacks differ of course.) Ok, so I took a closer look, and the oops really is suggestive.. [

Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected

2008-08-25 Thread Linus Torvalds
On Mon, 25 Aug 2008, Alan D. Brunelle wrote: With /just/ DEBUG_PAGE_ALLOC defined, I have seen two general panic types: o A new double fault w/ SMP_DEBUG_PAGEALLOC problem (prob4.txt) Yeah, that's a stack overflow. Confirmed. Linus

Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected

2008-08-25 Thread Linus Torvalds
On Mon, 25 Aug 2008, Linus Torvalds wrote: Could you make your kernel image available somewhere, and we can take a look at it? Some versions of gcc are total pigs when it comes to stack usage, and your exact configuration matters too. But yes, module loading is a bad case, for me

Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected

2008-08-25 Thread Alan D. Brunelle
Linus Torvalds wrote: On Mon, 25 Aug 2008, Linus Torvalds wrote: Could you make your kernel image available somewhere, and we can take a look at it? Some versions of gcc are total pigs when it comes to stack usage, and your exact configuration matters too. But yes, module loading is a

Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected

2008-08-25 Thread Linus Torvalds
On Mon, 25 Aug 2008, Alan D. Brunelle wrote: Mine has: Dump of assembler code for function sys_init_module: 0x802688c4 sys_init_module+4: sub $0x1c0,%rsp so 448 bytes. Yeah, your build seems to have consistently bigger stack usage, and that may be due to some config
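
The 448-byte figure is simply the immediate of the stack-pointer adjustment, 0x1c0, read as decimal. A minimal sketch of pulling that number out of an AT&T-syntax disassembly line; the helper name `frame_bytes` is hypothetical and not part of any kernel script:

```c
#include <stdlib.h>
#include <string.h>

/*
 * Return the immediate operand of a line such as "sub $0x1c0,%rsp",
 * i.e. the number of bytes the function reserves on the stack.
 * Returns 0 when the line has no '$' immediate.
 */
static unsigned long frame_bytes(const char *insn)
{
    const char *dollar = strchr(insn, '$');

    if (dollar == NULL)
        return 0;
    /* Base 0 lets strtoul accept the 0x prefix; parsing stops at the comma. */
    return strtoul(dollar + 1, NULL, 0);
}
```

For the line quoted above, `frame_bytes("sub $0x1c0,%rsp")` yields 448, matching the figure in the message. This counts only the explicit `sub`; pushes of saved registers add to the real frame.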

Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected

2008-08-25 Thread Arjan van de Ven
Linus Torvalds wrote: On Mon, 25 Aug 2008, Alan D. Brunelle wrote: Mine has: Dump of assembler code for function sys_init_module: 0x802688c4 sys_init_module+4: sub $0x1c0,%rsp so 448 bytes. Yeah, your build seems to have consistently bigger stack usage, and that may be due to

Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected

2008-08-25 Thread Linus Torvalds
On Mon, 25 Aug 2008, Linus Torvalds wrote: But I'll look at your vmlinux, see what stands out. Oops. I already see the problem. Your .config has some _huge_ CPU count, doesn't it? checkstack.pl shows these things as the top problems: 0x80266234 smp_call_function_mask
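
Why a huge CPU count inflates stack frames comes down to bitmap arithmetic: a CPU mask needs one bit per possible CPU, rounded up to whole `unsigned long` words. A rough sketch of that calculation; the helper names are hypothetical, and `BITS_PER_LONG` is defined locally here rather than taken from kernel headers:

```c
#include <limits.h>
#include <stddef.h>

/* Local definition for this sketch; the kernel provides its own BITS_PER_LONG. */
#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Number of unsigned long words needed for a bitmap of nr_cpus bits. */
static size_t cpumask_longs(size_t nr_cpus)
{
    return (nr_cpus + BITS_PER_LONG - 1) / BITS_PER_LONG;
}

/* Bytes one such bitmap occupies when declared as a local variable. */
static size_t cpumask_bytes(size_t nr_cpus)
{
    return cpumask_longs(nr_cpus) * sizeof(unsigned long);
}
```

With 4096 possible CPUs each mask is 512 bytes, so a function holding a handful of them as locals, or receiving one by value, reaches frame sizes in the kilobyte range like those checkstack.pl reports.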

Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected

2008-08-25 Thread Linus Torvalds
On Mon, 25 Aug 2008, Linus Torvalds wrote: checkstack.pl shows these things as the top problems: 0x80266234 smp_call_function_mask [vmlinux]:2736 0x80234747 __build_sched_domains [vmlinux]: 2232 0x8023523f __build_sched_domains [vmlinux]:

Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected

2008-08-25 Thread Christoph Lameter
Alan D. Brunelle wrote: I think you're right: the kernel as a whole may not be ready for 4,096 CPUs apparently... Mike has been working diligently on getting all these cpumasks off the stack for the last months and has created an infrastructure to do this. So I think we are close. It might

Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected

2008-08-23 Thread Linus Torvalds
On Sat, 23 Aug 2008, Rafael J. Wysocki wrote: The following bug entry is on the current list of known regressions from 2.6.26. Please verify if it still should be listed and let me know (either way). Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=11342 Subject

Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected

2008-08-23 Thread Linus Torvalds
On Sat, 23 Aug 2008, Linus Torvalds wrote: This one makes no sense. It's triggering a BUG_ON(in_interrupt()), but then the call chain shows that there is no interrupt going on. Ahh, later in that thread there's another totally unrelated oops in debug_mutex_add_waiter(). I'd guess that it