date:20070320

[PATCH] SLAB : Use num_possible_cpus() in enable_cpucache()

2007-03-20 Thread Eric Dumazet


The existing comment in mm/slab.c is *perfect*, so I reproduce it :

/*
 * CPU bound tasks (e.g. network routing) can exhibit cpu bound
 * allocation behaviour: Most allocs on one cpu, most free operations
 * on another cpu. For these cases, an efficient object passing between
 * cpus is necessary. This is provided by a shared array. The array
 * replaces Bonwick's magazine layer.
 * On uniprocessor, it's functionally equivalent (but less efficient)
 * to a larger limit. Thus disabled by default.
 */

As most shipped Linux kernels are now compiled with CONFIG_SMP, a
preprocessor #if cannot detect whether the machine is UP or SMP. Better to use
num_possible_cpus().


This means on UP we allocate a 'size=0 shared array', to be more efficient.

Another patch can later avoid the allocations of 'empty shared arrays', to 
save some memory.


Signed-off-by: Eric Dumazet <[EMAIL PROTECTED]>
diff --git a/mm/slab.c b/mm/slab.c
index 57f7aa4..a69d0a5 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -3975,10 +3975,8 @@ static int enable_cpucache(struct kmem_c
 * to a larger limit. Thus disabled by default.
 */
shared = 0;
-#ifdef CONFIG_SMP
-   if (cachep->buffer_size <= PAGE_SIZE)
+   if (cachep->buffer_size <= PAGE_SIZE && num_possible_cpus() > 1)
shared = 8;
-#endif
 
 #if DEBUG
/*
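
The decision the patch above changes can be modelled in userspace. This is
only a sketch: model_shared_entries() is a made-up name, and its page_size and
possible_cpus parameters stand in for PAGE_SIZE and num_possible_cpus(), which
exist only inside the kernel.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Userspace model of the 'shared' decision in enable_cpucache().
 * possible_cpus stands in for num_possible_cpus(), page_size for PAGE_SIZE.
 * Both parameters are assumptions of this sketch, not kernel API.
 */
static unsigned int model_shared_entries(size_t buffer_size, size_t page_size,
                                         unsigned int possible_cpus)
{
    assert(page_size > 0);

    /* A shared array only helps when objects can move between cpus. */
    if (buffer_size <= page_size && possible_cpus > 1)
        return 8;

    return 0;   /* UP machine or large objects: no shared array */
}
```

The point is that the cpu count is a run-time input, so one CONFIG_SMP binary
can make the UP choice when it boots on a single-cpu machine.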

Re: AIO, FIO and Threads ...

2007-03-20 Thread Davide Libenzi

On Tue, 20 Mar 2007, Davide Libenzi wrote:

> 
> I was looking at Jens' FIO stuff, and I decided to cook a quick patch for
> FIO to support GUASI (Generic Userspace Asyncronous Syscall Interface):
> 
> http://www.xmailserver.org/guasi-lib.html
> 
> I then ran a few tests on my Dual Opteron 252 with SATA drives (sata_nv) 
> and 8GB of RAM.
> Mind that I'm not a FIO expert, at all, but I got some interesting
> results when comparing GUASI with libaio at 8/1000/1 depths.
> If I read those results correctly (Jens may help), GUASI output is more
> than double the libaio one.
> Lots of context switches, yes. But the throughput looks like 2+ times.
> Can someone try to repeat the measurements and/or spot the error?
> Or tell me which other tests to run?
> This is kinda a surprise for me ...

Tests with block sizes bigger than 4KB bring libaio performance close to 
GUASI, but not quite:

http://www.xmailserver.org/guasi-libaio-fio-results-1.txt

I dropped the last FIO+GUASI patch here:

http://www.xmailserver.org/fio-guasi-0.5.diff

And Jens FIO is here:

http://brick.kernel.dk/snaps/



- Davide


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 1/7] Introduce the pagetable_operations and associated helper macros.

2007-03-20 Thread Nick Piggin


William Lee Irwin III wrote:


ISTR potential ppc64 users coming out of the woodwork for something I
didn't recognize the name of, but I may be confusing that with your
patch. I can implement additional users (and useful ones at that)
needing this in particular if desired.



On Wed, Mar 21, 2007 at 04:07:43PM +1100, Nick Piggin wrote:


Yes I would be interested in seeing useful additional users of this
that cannot use our regular virtual memory, before making it a general
thing.
I just don't want to see proliferation of these things, if possible.



I'm tied up elsewhere so I won't get to it in a timely fashion. Maybe
in a few weeks I can start up on the first two of the bunch.


Care to give us a hint? :)



William Lee Irwin III wrote:


Two fault handling methods callbacks raise an eyebrow over here at least.
I was vaguely hoping for unification of the fault handling callbacks.



On Wed, Mar 21, 2007 at 04:07:43PM +1100, Nick Piggin wrote:

I don't know if it would be so clean to do that as they are at different 
levels.
Adam's fault is before the VM translation (and bypasses it), and mine is 
after.



Not much of a VM translation; it's just a lookup through the software
mocked-up structures on everything save i386, x86_64, and some m68k where
they're the same thing only with hardware walkers (ISTR ia64's being
firmware a la Alpha despite the "HPW" name, though I could be wrong)


Well the vma+pagetables *are* our VM translation data structure. It is
a good data structure. The Gelato/UNSW guys experimenting with changing
this have basically said they haven't yet got anything that beats it.

I would be opposed to anything that bypasses that unless a) it is not
applicable to the VM as a whole, and b) it is really worth it
(hugepages was a reasonable exception).



reliant on them. The drivers/etc. could just as easily use helper
functions to carry out the lookup, thereby accomplishing the
unification. There's nothing particularly fundamental about a pte
lookup.


Yeah you could, but it looks back to front to me.

The VM tells the filesystem that the machine took a fault at virtual
address X, then the filesystem asks the VM what pgoff that is, then
tells the VM to install the corresponding page to vaddr X.

With my ->fault, the VM asks the filesystem to give the page that
corresponds to vaddr X, then installs it into that vaddr.
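
A rough userspace sketch of that second call direction follows. Every name in
it (model_vma, model_handle_fault, toy_fs_fault) is hypothetical, and it
assumes the VM turns the faulting vaddr into a file page offset before calling
the filesystem; it is not the actual ->fault interface.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_PAGE_SHIFT 12   /* assumes 4KB pages */

struct model_vma {
    unsigned long vm_start;   /* first virtual address covered */
    unsigned long vm_pgoff;   /* file offset of vm_start, in pages */
    /* Hypothetical ->fault: the "filesystem" returns a page id for pgoff. */
    int (*fault)(unsigned long pgoff);
};

/* The "VM" side: it owns the vaddr -> pgoff translation and the install. */
static int model_handle_fault(const struct model_vma *vma, unsigned long vaddr)
{
    unsigned long pgoff;

    assert(vma != NULL && vaddr >= vma->vm_start);
    pgoff = ((vaddr - vma->vm_start) >> MODEL_PAGE_SHIFT) + vma->vm_pgoff;

    /* One call down into the filesystem; no call back up into the VM. */
    return vma->fault(pgoff);   /* a real kernel would now install the page */
}

/* Toy "filesystem": the page id is just the file offset plus 100. */
static int toy_fs_fault(unsigned long pgoff)
{
    return (int)pgoff + 100;
}
```

The filesystem never sees the virtual address here, which is the "front to
back" ordering argued for above.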



Normal arches that do software TLB refill could just as easily
consult the radix trees dangled off struct address_space or any old
data structure floating around the kernel with enough information to
translate user virtual addresses to the physical addresses they need to
fill the TLB with, and there are other kernels that literally do things
like that.


Sure it *could* be done, but it may not be very nice, given Linux's
design. And you definitely need _something_ other than just the
pagecache radix-tree, because the VM needs to know who maps the page.

So if, for your backing store, you use a small hash table and evict old
entries like powerpc, you'll constantly be faulting in and out pages
from the VM's high level view of the address space. That isn't a really
cheap operation. It takes at least:

read_lock_irq(mapping->tree_lock);
radix_tree_lookup()
read_unlock_irq(mapping->tree_lock);
lock_page()
atomic_add(page->_count)
atomic_add(page->_mapcount)
unlock_page()

atomic_add_negative(page->_mapcount)
atomic_dec_and_test(page->_count)

Compared to our current page table walk which is just a single locked
op + barrier for the spinlock + radix tree walk.
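
To make the counter traffic concrete, here is a userspace model of just the
four atomic operations in that list, using C11 atomics. struct page_model and
the model_* helpers are invented for this sketch; the radix tree lookup, the
tree_lock and the page lock are left out entirely.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct page; only the two counters matter. */
struct page_model {
    atomic_int count;      /* reference count, like page->_count */
    atomic_int mapcount;   /* -1 when unmapped, like page->_mapcount */
};

static void page_model_init(struct page_model *p)
{
    atomic_init(&p->count, 1);       /* one reference held by the "pagecache" */
    atomic_init(&p->mapcount, -1);   /* not mapped anywhere yet */
}

/* Fault-in side: two atomic increments per mapping. */
static void model_map(struct page_model *p)
{
    atomic_fetch_add(&p->count, 1);
    atomic_fetch_add(&p->mapcount, 1);
}

/* Eviction side: two more atomics; returns true when the last mapping goes. */
static bool model_unmap(struct page_model *p)
{
    bool last_mapping = (atomic_fetch_sub(&p->mapcount, 1) - 1) < 0;
    int refs_left = atomic_fetch_sub(&p->count, 1) - 1;

    assert(refs_left >= 1);   /* the pagecache reference must survive */
    return last_mapping;
}
```

Each fault-in/evict cycle therefore costs four atomic read-modify-write
operations on the page, before counting the locks and the lookup.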


If you had a very large hash table (ia64 long mode, maybe?), then you
may have slightly fewer high level faults, but range based operations
are going to take a whole lot of cache misses, aren't they? Especially
for small processes.

Not that I wouldn't be happy to be proven wrong, but I don't think it
should be something that sneaks in under these pagetable operations.
IMO.

--
SUSE Labs, Novell Inc.

[PATCH 4/4] i386 GDT cleanups: cleanup GDT Access

2007-03-20 Thread Rusty Russell

Now we have an explicit per-cpu GDT variable, we don't need to keep
the descriptors around to use them to find the GDT: expose cpu_gdt
directly.

We could go further and make load_gdt() pack the descriptor for us, or
even assume it means "load the current cpu's GDT" which is what it
always does.

Signed-off-by: Rusty Russell <[EMAIL PROTECTED]>
---
 arch/i386/kernel/cpu/common.c |4 +---
 arch/i386/kernel/efi.c|   18 +-
 arch/i386/kernel/entry.S  |3 +--
 arch/i386/kernel/smpboot.c|   12 ++--
 arch/i386/kernel/traps.c  |4 +---
 include/asm-i386/desc.h   |   15 ++-
 6 files changed, 24 insertions(+), 32 deletions(-)

diff -r 0714eeaace72 arch/i386/kernel/cpu/common.c
--- a/arch/i386/kernel/cpu/common.c Wed Mar 21 15:55:51 2007 +1100
+++ b/arch/i386/kernel/cpu/common.c Wed Mar 21 16:02:02 2007 +1100
@@ -22,9 +22,6 @@
 
 #include "cpu.h"
 
-DEFINE_PER_CPU(struct Xgt_desc_struct, cpu_gdt_descr);
-EXPORT_PER_CPU_SYMBOL(cpu_gdt_descr);
-
 DEFINE_PER_CPU(struct desc_struct, cpu_gdt[GDT_ENTRIES]) = {
[GDT_ENTRY_KERNEL_CS] = { 0x, 0x00cf9a00 },
[GDT_ENTRY_KERNEL_DS] = { 0x, 0x00cf9200 },
@@ -52,6 +49,7 @@ DEFINE_PER_CPU(struct desc_struct, cpu_g
[GDT_ENTRY_ESPFIX_SS] = { 0x, 0x00c09200 },
[GDT_ENTRY_PDA] = { 0x, 0x00c09200 }, /* set in setup_pda */
 };
+EXPORT_PER_CPU_SYMBOL_GPL(cpu_gdt);
 
 DEFINE_PER_CPU(struct i386_pda, _cpu_pda);
 EXPORT_PER_CPU_SYMBOL(_cpu_pda);
diff -r 0714eeaace72 arch/i386/kernel/efi.c
--- a/arch/i386/kernel/efi.cWed Mar 21 15:55:51 2007 +1100
+++ b/arch/i386/kernel/efi.cWed Mar 21 15:56:22 2007 +1100
@@ -69,12 +69,10 @@ static void efi_call_phys_prelog(void) _
 {
unsigned long cr4;
unsigned long temp;
-   struct Xgt_desc_struct *cpu_gdt_descr;
+   struct Xgt_desc_struct gdt_descr;
 
spin_lock(&efi_rt_lock);
local_irq_save(efi_rt_eflags);
-
-   cpu_gdt_descr = &per_cpu(cpu_gdt_descr, 0);
 
/*
 * If I don't have PSE, I should just duplicate two entries in page
@@ -105,17 +103,19 @@ static void efi_call_phys_prelog(void) _
 */
local_flush_tlb();
 
-   cpu_gdt_descr->address = __pa(cpu_gdt_descr->address);
-   load_gdt(cpu_gdt_descr);
+   gdt_descr.address = __pa(get_cpu_gdt_table(0));
+   gdt_descr.size = GDT_SIZE - 1;
+   load_gdt(&gdt_descr);
 }
 
 static void efi_call_phys_epilog(void) __releases(efi_rt_lock)
 {
unsigned long cr4;
-   struct Xgt_desc_struct *cpu_gdt_descr = &per_cpu(cpu_gdt_descr, 0);
-
-   cpu_gdt_descr->address = (unsigned long)__va(cpu_gdt_descr->address);
-   load_gdt(cpu_gdt_descr);
+   struct Xgt_desc_struct gdt_descr;
+
+   gdt_descr.address = (unsigned long)get_cpu_gdt_table(0);
+   gdt_descr.size = GDT_SIZE - 1;
+   load_gdt(&gdt_descr);
 
cr4 = read_cr4();
 
diff -r 0714eeaace72 arch/i386/kernel/entry.S
--- a/arch/i386/kernel/entry.S  Wed Mar 21 15:55:51 2007 +1100
+++ b/arch/i386/kernel/entry.S  Wed Mar 21 16:02:02 2007 +1100
@@ -558,8 +558,7 @@ END(syscall_badsys)
 #define FIXUP_ESPFIX_STACK \
/* since we are on a wrong stack, we cant make it a C code :( */ \
movl %fs:PDA_cpu, %ebx; \
-   PER_CPU(cpu_gdt_descr, %ebx); \
-   movl GDS_address(%ebx), %ebx; \
+   PER_CPU(cpu_gdt, %ebx); \
GET_DESC_BASE(GDT_ENTRY_ESPFIX_SS, %ebx, %eax, %ax, %al, %ah); \
addl %esp, %eax; \
pushl $__KERNEL_DS; \
diff -r 0714eeaace72 arch/i386/kernel/smpboot.c
--- a/arch/i386/kernel/smpboot.cWed Mar 21 15:55:51 2007 +1100
+++ b/arch/i386/kernel/smpboot.cWed Mar 21 16:02:02 2007 +1100
@@ -786,12 +786,8 @@ static inline struct task_struct * alloc
secondary which will soon come up. */
 static __cpuinit void init_gdt(int cpu, struct task_struct *idle)
 {
-   struct Xgt_desc_struct *cpu_gdt_descr = &per_cpu(cpu_gdt_descr, cpu);
-   struct desc_struct *gdt = per_cpu(cpu_gdt, cpu);
+   struct desc_struct *gdt = get_cpu_gdt_table(cpu);
struct i386_pda *pda = &per_cpu(_cpu_pda, cpu);
-
-   cpu_gdt_descr->address = (unsigned long)gdt;
-   cpu_gdt_descr->size = GDT_SIZE - 1;
 
pack_descriptor((u32 *)&gdt[GDT_ENTRY_PDA].a,
(u32 *)&gdt[GDT_ENTRY_PDA].b,
@@ -1187,7 +1183,11 @@ void __init smp_prepare_cpus(unsigned in
  * it's on the real one. */
 static inline void switch_to_new_gdt(void)
 {
-   load_gdt(&per_cpu(cpu_gdt_descr, smp_processor_id()));
+   struct Xgt_desc_struct gdt_descr;
+
+   gdt_descr.address = (long)get_cpu_gdt_table(smp_processor_id());
+   gdt_descr.size = GDT_SIZE - 1;
+   load_gdt(&gdt_descr);
asm volatile ("mov %0, %%fs" : : "r" (__KERNEL_PDA) : "memory");
 }
 
diff -r 0714eeaace72 arch/i386/kernel/traps.c
--- a/arch/i386/kernel/traps.c  Wed Mar 21 15:55:51 2007 +1100
+++ b/arch/i386/kernel/traps.c  Wed Mar 21 16:02:02 2007 +1100
@@ -1037,9 +1037,7 @@ fastcall unsigned long
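
The changelog above suggests load_gdt() could pack the descriptor itself. As a
userspace sketch of what such packing amounts to: model_pack_gdt_descr() and
the model_* types are hypothetical, the entry count of 32 is an assumption,
and the struct is not laid out the way the lgdt instruction needs, so this
only illustrates the address and limit computation repeated in the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_GDT_ENTRIES 32                 /* assumed entry count */

struct model_desc { uint32_t a, b; };        /* one 8-byte descriptor */

#define MODEL_GDT_SIZE (MODEL_GDT_ENTRIES * sizeof(struct model_desc))

/* Models struct Xgt_desc_struct: a limit plus a base address. */
struct model_gdt_descr {
    uint16_t size;       /* limit: table size in bytes, minus one */
    uintptr_t address;   /* base of the table */
};

/* Hypothetical helper: pack a descriptor for a given table. */
static struct model_gdt_descr model_pack_gdt_descr(const struct model_desc *table)
{
    struct model_gdt_descr d;

    assert(table != NULL);
    d.size = (uint16_t)(MODEL_GDT_SIZE - 1);
    d.address = (uintptr_t)table;
    return d;
}
```

With a helper of this shape, the three open-coded "address, size, load"
sequences in efi.c and smpboot.c would collapse to a single call each.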

[PATCH 7/7] Add trec_snapshot and trec_print_snapshot in panic()

2007-03-20 Thread Wink Saville


Signed-off-by: Wink Saville <[EMAIL PROTECTED]>
---
kernel/panic.c |   12 
1 files changed, 12 insertions(+), 0 deletions(-)

diff --git a/kernel/panic.c b/kernel/panic.c
index 623d182..64a047e 100644
--- a/kernel/panic.c
+++ b/kernel/panic.c
@@ -20,6 +20,10 @@
#include 
#include 

+#ifdef CONFIG_TREC
+#include 
+#endif
+
int panic_on_oops;
int tainted;
static int pause_on_oops;
@@ -66,6 +70,10 @@ NORET_TYPE void panic(const char * fmt, ...)
unsigned long caller = (unsigned long) __builtin_return_address(0);
#endif

+#ifdef CONFIG_TREC
+   trec_snapshot();
+#endif
+
/*
 * It's possible to come here directly from a panic-assertion and not
 * have preempt disabled. Some functions called from here want
@@ -96,6 +104,10 @@ NORET_TYPE void panic(const char * fmt, ...)
smp_send_stop();
#endif

+#ifdef CONFIG_TREC
+   trec_print_snapshot();
+#endif
+
atomic_notifier_call_chain(&panic_notifier_list, 0, buf);

if (!panic_blink)
--
1.5.0.rc2

RE: MPT Fusion LSI22320 , Domain validation loops .

2007-03-20 Thread Mr. James W. Laferriere

Hello Eric , Fyi , linux-2.6.21-rc4 + mpt-fusion(*) patches from
Andrew Morton's patch tree . Still gives me the ever looping reset . But I
have just found something of interest : one of the power supplies in the cabinet
'may be' failing . I have to test that to be satisfied that is the case .
I'll report back soon on the PS & please look into this . There is no
reason for the driver to keep a system in a loop over a failing drive set .

Tia , JimL

(*)
ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.21-rc4/2.6.21-rc4-mm1/broken-out/mpt-fusion-handle-pci-layer-error-on-resume.patch
ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.21-rc4/2.6.21-rc4-mm1/broken-out/mpt-fusion-handle-mpt_resume-failure-while-resuming.patch

On Mon, 19 Mar 2007, Moore, Eric wrote:

On Saturday, March 17, 2007 2:33 PM, James W. Laferriere wrote:

Hello All , I have been having this problem since I purchased the
controller , and after changing out the disks I thought were the problem
I am still getting the continuous :

mptscsih: ioc1: attempting task abort! (sc=f7a64500)
scsi 3:0:4:0:
command: Inquiry: 12 00 00 00 60 00
mptbase: Initiating ioc1 recovery
mptscsih: ioc1: task abort: SUCCESS (sc=f7a64500)
target3:0:4: Domain Validation detected failure, dropping back
target3:0:4: Domain Validation skipping write tests
target3:0:4: Ending Domain Validation
target3:0:4: asynchronous
target3:0:5: Beginning Domain Validation
mptscsih: ioc0: attempting target reset! (sc=f7a64380)

The actual device id's change and the driver continuously resets the
busses & starts all over .

The disks are in a HP DS-SL13R-BA 4354R 14drive ultra3 rackmount cabinet
w/ dualbus & dualps , Which seems to present a ID6 , That does not show up in
any of the bus scans .

Now I have previously had the same cabinet with 18gb disks which had the
same problem with this controller . BUT I also have a LSI Logic / Symbios
Logic 53c1010 66MHz Ultra3 dual SCSI bus Adapter which works flawlessly with the
18gb disks in this very same cabinet .
The cables for connecting the adapter(s) to the cabinet are less than 24
inches in length .

Would anyone please shed some light on what it is I am doing wrong or
need to do , to have this controller recognise these disk drives in
this cabinet ?

There is a separate mailing list for scsi related issues, e.g.
[EMAIL PROTECTED] I've posted a patch to address your issue several times,
however it seems it's not been picked up by the scsi subsystem
maintainer. The last time it was posted was here:
http://marc.info/?l=linux-scsi&m=117089244809072&w=2 An alternative is
that you could obtain our latest drivers from the LSI download site, where
the drivers should have this patch:
http://www.lsilogic.com/cm/DownloadSearch.do

Eric
