[PATCH] SLAB : Use num_possible_cpus() in enable_cpucache()

2007-03-20 Thread Eric Dumazet

The existing comment in mm/slab.c is *perfect*, so I reproduce it :

/*
 * CPU bound tasks (e.g. network routing) can exhibit cpu bound
 * allocation behaviour: Most allocs on one cpu, most free operations
 * on another cpu. For these cases, an efficient object passing between
 * cpus is necessary. This is provided by a shared array. The array
 * replaces Bonwick's magazine layer.
 * On uniprocessor, it's functionally equivalent (but less efficient)
 * to a larger limit. Thus disabled by default.
 */

As most shipped Linux kernels are now compiled with CONFIG_SMP, a
preprocessor #ifdef cannot tell whether the machine is actually UP or SMP.
It is better to test num_possible_cpus() at run time.


This means on UP we allocate a 'size=0 shared array', to be more efficient.

Another patch can later avoid the allocations of 'empty shared arrays', to 
save some memory.


Signed-off-by: Eric Dumazet <[EMAIL PROTECTED]>
diff --git a/mm/slab.c b/mm/slab.c
index 57f7aa4..a69d0a5 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -3975,10 +3975,8 @@ static int enable_cpucache(struct kmem_c
 * to a larger limit. Thus disabled by default.
 */
shared = 0;
-#ifdef CONFIG_SMP
-   if (cachep->buffer_size <= PAGE_SIZE)
+   if (cachep->buffer_size <= PAGE_SIZE && num_possible_cpus() > 1)
shared = 8;
-#endif
 
 #if DEBUG
/*


Re: AIO, FIO and Threads ...

2007-03-20 Thread Davide Libenzi
On Tue, 20 Mar 2007, Davide Libenzi wrote:

> 
> I was looking at Jens FIO stuff, and I decided to cook a quick patch for 
> FIO to support GUASI (Generic Userspace Asynchronous Syscall Interface):
> 
> http://www.xmailserver.org/guasi-lib.html
> 
> I then ran a few tests on my Dual Opteron 252 with SATA drives (sata_nv) 
> and 8GB of RAM.
> Mind that I'm not FIO expert, like at all, but I got some interesting 
> results when comparing GUASI with libaio at 8/1000/1 depths.
> If I read those results correctly (Jens may help), GUASI output is more 
> than double the libaio one.
> Lots of context switches, yes. But the throughput looks like 2+ times.
> Can someone try to repeat the measures and/or spot the error?
> Or tell me which other tests to run?
> This is kinda a surprise for me ...

Tests with block sizes bigger than 4KB bring libaio performance close to 
GUASI, but not quite:

http://www.xmailserver.org/guasi-libaio-fio-results-1.txt

I dropped the last FIO+GUASI patch here:

http://www.xmailserver.org/fio-guasi-0.5.diff

And Jens FIO is here:

http://brick.kernel.dk/snaps/



- Davide


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 1/7] Introduce the pagetable_operations and associated helper macros.

2007-03-20 Thread Nick Piggin

William Lee Irwin III wrote:

William Lee Irwin III wrote:


ISTR potential ppc64 users coming out of the woodwork for something I
didn't recognize the name of, but I may be confusing that with your
patch. I can implement additional users (and useful ones at that)
needing this in particular if desired.



On Wed, Mar 21, 2007 at 04:07:43PM +1100, Nick Piggin wrote:


Yes I would be interested in seeing useful additional users of this
that cannot use our regular virtual memory, before making it a general
thing.
I just don't want to see proliferation of these things, if possible.



I'm tied up elsewhere so I won't get to it in a timely fashion. Maybe
in a few weeks I can start up on the first two of the bunch.


Care to give us a hint? :)



William Lee Irwin III wrote:


Two fault handling methods callbacks raise an eyebrow over here at least.
I was vaguely hoping for unification of the fault handling callbacks.



On Wed, Mar 21, 2007 at 04:07:43PM +1100, Nick Piggin wrote:

I don't know if it would be so clean to do that as they are at different 
levels.
Adam's fault is before the VM translation (and bypasses it), and mine is 
after.



Not much of a VM translation; it's just a lookup through the software
mocked-up structures on everything save i386, x86_64, and some m68k where
they're the same thing only with hardware walkers (ISTR ia64's being
firmware a la Alpha despite the "HPW" name, though I could be wrong)


Well the vma+pagetables *are* our VM translation data structure. It is
a good data structure. The Gelato/UNSW guys experimenting with changing
this have basically said they haven't yet got anything that beats it.

I would be opposed to anything that bypasses that unless a) it is not
applicable to the VM as a whole, and b) it is really worth it
(hugepages was a reasonable exception).



reliant on them. The drivers/etc. could just as easily use helper
functions to carry out the lookup, thereby accomplishing the
unification. There's nothing particularly fundamental about a pte
lookup.


Yeah you could, but it looks back to front to me.

The VM tells the filesystem that the machine took a fault at virtual
address X, then the filesystem asks the VM what pgoff that is, then
tells the VM to install the corresponding page to vaddr X.

With my ->fault, the VM asks the filesystem to give the page that
corresponds to vaddr X, then installs it into that vaddr.



Normal arches that do software TLB refill could just as easily
consult the radix trees dangled off struct address_space or any old
data structure floating around the kernel with enough information to
translate user virtual addresses to the physical addresses they need to
fill the TLB with, and there are other kernels that literally do things
like that.


Sure it *could* be done, but it may not be very nice, given Linux's
design. And you definitely need _something_ other than just the
pagecache radix-tree, because the VM needs to know who maps the page.

So if, for your backing store, you use a small hash table and evict old
entries like powerpc, you'll constantly be faulting in and out pages
from the VM's high level view of the address space. That isn't a really
cheap operation. It takes at least:

read_lock_irq(mapping->tree_lock);
radix_tree_lookup()
read_unlock_irq(mapping->tree_lock);
lock_page()
atomic_add(page->_count)
atomic_add(page->_mapcount)
unlock_page()

atomic_add_negative(page->_mapcount)
atomic_dec_and_test(page->_count)

Compared to our current page table walk which is just a single locked
op + barrier for the spinlock + radix tree walk.


If you had a very large hash table (ia64 long mode, maybe?), then you
may have slightly fewer high level faults, but range based operations
are going to take a whole lot of cache misses, aren't they? Especially
for small processes.

Not that I wouldn't be happy to be proven wrong, but I don't think it
should be something that sneaks in under these pagetable operations.
IMO.

--
SUSE Labs, Novell Inc.




[PATCH 4/4] i386 GDT cleanups: cleanup GDT Access

2007-03-20 Thread Rusty Russell
Now we have an explicit per-cpu GDT variable, we don't need to keep
the descriptors around to use them to find the GDT: expose cpu_gdt
directly.

We could go further and make load_gdt() pack the descriptor for us, or
even assume it means "load the current cpu's GDT" which is what it
always does.

Signed-off-by: Rusty Russell <[EMAIL PROTECTED]>
---
 arch/i386/kernel/cpu/common.c |4 +---
 arch/i386/kernel/efi.c|   18 +-
 arch/i386/kernel/entry.S  |3 +--
 arch/i386/kernel/smpboot.c|   12 ++--
 arch/i386/kernel/traps.c  |4 +---
 include/asm-i386/desc.h   |   15 ++-
 6 files changed, 24 insertions(+), 32 deletions(-)

diff -r 0714eeaace72 arch/i386/kernel/cpu/common.c
--- a/arch/i386/kernel/cpu/common.c Wed Mar 21 15:55:51 2007 +1100
+++ b/arch/i386/kernel/cpu/common.c Wed Mar 21 16:02:02 2007 +1100
@@ -22,9 +22,6 @@
 
 #include "cpu.h"
 
-DEFINE_PER_CPU(struct Xgt_desc_struct, cpu_gdt_descr);
-EXPORT_PER_CPU_SYMBOL(cpu_gdt_descr);
-
 DEFINE_PER_CPU(struct desc_struct, cpu_gdt[GDT_ENTRIES]) = {
[GDT_ENTRY_KERNEL_CS] = { 0x, 0x00cf9a00 },
[GDT_ENTRY_KERNEL_DS] = { 0x, 0x00cf9200 },
@@ -52,6 +49,7 @@ DEFINE_PER_CPU(struct desc_struct, cpu_g
[GDT_ENTRY_ESPFIX_SS] = { 0x, 0x00c09200 },
[GDT_ENTRY_PDA] = { 0x, 0x00c09200 }, /* set in setup_pda */
 };
+EXPORT_PER_CPU_SYMBOL_GPL(cpu_gdt);
 
 DEFINE_PER_CPU(struct i386_pda, _cpu_pda);
 EXPORT_PER_CPU_SYMBOL(_cpu_pda);
diff -r 0714eeaace72 arch/i386/kernel/efi.c
--- a/arch/i386/kernel/efi.cWed Mar 21 15:55:51 2007 +1100
+++ b/arch/i386/kernel/efi.cWed Mar 21 15:56:22 2007 +1100
@@ -69,12 +69,10 @@ static void efi_call_phys_prelog(void) _
 {
unsigned long cr4;
unsigned long temp;
-   struct Xgt_desc_struct *cpu_gdt_descr;
+   struct Xgt_desc_struct gdt_descr;
 
spin_lock(&efi_rt_lock);
local_irq_save(efi_rt_eflags);
-
-   cpu_gdt_descr = &per_cpu(cpu_gdt_descr, 0);
 
/*
 * If I don't have PSE, I should just duplicate two entries in page
@@ -105,17 +103,19 @@ static void efi_call_phys_prelog(void) _
 */
local_flush_tlb();
 
-   cpu_gdt_descr->address = __pa(cpu_gdt_descr->address);
-   load_gdt(cpu_gdt_descr);
+   gdt_descr.address = __pa(get_cpu_gdt_table(0));
+   gdt_descr.size = GDT_SIZE - 1;
+   load_gdt(&gdt_descr);
 }
 
 static void efi_call_phys_epilog(void) __releases(efi_rt_lock)
 {
unsigned long cr4;
-   struct Xgt_desc_struct *cpu_gdt_descr = &per_cpu(cpu_gdt_descr, 0);
-
-   cpu_gdt_descr->address = (unsigned long)__va(cpu_gdt_descr->address);
-   load_gdt(cpu_gdt_descr);
+   struct Xgt_desc_struct gdt_descr;
+
+   gdt_descr.address = (unsigned long)get_cpu_gdt_table(0);
+   gdt_descr.size = GDT_SIZE - 1;
+   load_gdt(&gdt_descr);
 
cr4 = read_cr4();
 
diff -r 0714eeaace72 arch/i386/kernel/entry.S
--- a/arch/i386/kernel/entry.S  Wed Mar 21 15:55:51 2007 +1100
+++ b/arch/i386/kernel/entry.S  Wed Mar 21 16:02:02 2007 +1100
@@ -558,8 +558,7 @@ END(syscall_badsys)
 #define FIXUP_ESPFIX_STACK \
/* since we are on a wrong stack, we cant make it a C code :( */ \
movl %fs:PDA_cpu, %ebx; \
-   PER_CPU(cpu_gdt_descr, %ebx); \
-   movl GDS_address(%ebx), %ebx; \
+   PER_CPU(cpu_gdt, %ebx); \
GET_DESC_BASE(GDT_ENTRY_ESPFIX_SS, %ebx, %eax, %ax, %al, %ah); \
addl %esp, %eax; \
pushl $__KERNEL_DS; \
diff -r 0714eeaace72 arch/i386/kernel/smpboot.c
--- a/arch/i386/kernel/smpboot.cWed Mar 21 15:55:51 2007 +1100
+++ b/arch/i386/kernel/smpboot.cWed Mar 21 16:02:02 2007 +1100
@@ -786,12 +786,8 @@ static inline struct task_struct * alloc
secondary which will soon come up. */
 static __cpuinit void init_gdt(int cpu, struct task_struct *idle)
 {
-   struct Xgt_desc_struct *cpu_gdt_descr = &per_cpu(cpu_gdt_descr, cpu);
-   struct desc_struct *gdt = per_cpu(cpu_gdt, cpu);
+   struct desc_struct *gdt = get_cpu_gdt_table(cpu);
struct i386_pda *pda = &per_cpu(_cpu_pda, cpu);
-
-   cpu_gdt_descr->address = (unsigned long)gdt;
-   cpu_gdt_descr->size = GDT_SIZE - 1;
 
pack_descriptor((u32 *)&gdt[GDT_ENTRY_PDA].a,
(u32 *)&gdt[GDT_ENTRY_PDA].b,
@@ -1187,7 +1183,11 @@ void __init smp_prepare_cpus(unsigned in
  * it's on the real one. */
 static inline void switch_to_new_gdt(void)
 {
-   load_gdt(&per_cpu(cpu_gdt_descr, smp_processor_id()));
+   struct Xgt_desc_struct gdt_descr;
+
+   gdt_descr.address = (long)get_cpu_gdt_table(smp_processor_id());
+   gdt_descr.size = GDT_SIZE - 1;
+   load_gdt(&gdt_descr);
asm volatile ("mov %0, %%fs" : : "r" (__KERNEL_PDA) : "memory");
 }
 
diff -r 0714eeaace72 arch/i386/kernel/traps.c
--- a/arch/i386/kernel/traps.c  Wed Mar 21 15:55:51 2007 +1100
+++ b/arch/i386/kernel/traps.c  Wed Mar 21 16:02:02 2007 +1100
@@ -1037,9 +1037,7 @@ fastcall unsigned long 

[PATCH 7/7] Add trec_snapshot and trec_print_snapshot in panic()

2007-03-20 Thread Wink Saville

Signed-off-by: Wink Saville <[EMAIL PROTECTED]>
---
kernel/panic.c |   12 
1 files changed, 12 insertions(+), 0 deletions(-)

diff --git a/kernel/panic.c b/kernel/panic.c
index 623d182..64a047e 100644
--- a/kernel/panic.c
+++ b/kernel/panic.c
@@ -20,6 +20,10 @@
#include 
#include 

+#ifdef CONFIG_TREC
+#include 
+#endif
+
int panic_on_oops;
int tainted;
static int pause_on_oops;
@@ -66,6 +70,10 @@ NORET_TYPE void panic(const char * fmt, ...)
unsigned long caller = (unsigned long) __builtin_return_address(0);
#endif

+#ifdef CONFIG_TREC
+   trec_snapshot();
+#endif
+
/*
 * It's possible to come here directly from a panic-assertion and not
 * have preempt disabled. Some functions called from here want
@@ -96,6 +104,10 @@ NORET_TYPE void panic(const char * fmt, ...)
smp_send_stop();
#endif

+#ifdef CONFIG_TREC
+   trec_print_snapshot();
+#endif
+
atomic_notifier_call_chain(&panic_notifier_list, 0, buf);

if (!panic_blink)
--
1.5.0.rc2


RE: MPT Fusion LSI22320 , Domain validation loops .

2007-03-20 Thread Mr. James W. Laferriere
	Hello Eric ,  Fyi ,  linux-2.6.21-rc4 + mpt-fusion(*) patches from 
Andrew Morton's patch tree still gives me the ever-looping reset .  But I 
have just found something of interest :  one of the power supplies in the cabinet 
'may be' failing .  I have to test that to be satisfied that is the case .
	I'll report back soon on the PS & please look into this .  There is no 
reason for the driver to keep a system in a loop over a failing drive set .

Tia ,  JimL

(*)
ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.21-rc4/2.6.21-rc4-mm1/broken-out/mpt-fusion-handle-pci-layer-error-on-resume.patch
ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.21-rc4/2.6.21-rc4-mm1/broken-out/mpt-fusion-handle-mpt_resume-failure-while-resuming.patch

On Mon, 19 Mar 2007, Moore, Eric wrote:

On Saturday, March 17, 2007 2:33 PM,  James W. Laferriere wrote:

Hello All ,  I have been having this problem since I purchased the
controller ,  and after changing out the disks I thought were the problem
I am still getting the continuous :

mptscsih: ioc1: attempting task abort! (sc=f7a64500)
scsi 3:0:4:0:
 command: Inquiry: 12 00 00 00 60 00
mptbase: Initiating ioc1 recovery
mptscsih: ioc1: task abort: SUCCESS (sc=f7a64500)
  target3:0:4: Domain Validation detected failure, dropping back
  target3:0:4: Domain Validation skipping write tests
  target3:0:4: Ending Domain Validation
  target3:0:4: asynchronous
  target3:0:5: Beginning Domain Validation
mptscsih: ioc0: attempting target reset! (sc=f7a64380)

The actual device IDs change and the driver continuously resets the
busses & starts all over .

The disks are in a HP DS-SL13R-BA 4354R 14-drive ultra3 rackmount cabinet
w/ dual bus & dual PS ,  which seems to present an ID6 that does not show up in
any of the bus scans .

Now I have previously had the same cabinet with 18gb disks which had the
same problem with this controller .  BUT I also have a LSI Logic / Symbios
Logic 53c1010 66MHz Ultra3 dual SCSI bus Adapter which works flawlessly with
the 18gb disks in this very same cabinet .
The cables for connecting the adapter(s) to the cabinet are less than 24
inches in length .

Would anyone please shed some light on what it is I am doing wrong or
need to do ,  to have this controller recognise these disk drives in
this cabinet ?


There is a separate mailing list for SCSI-related issues, e.g.
[EMAIL PROTECTED]   I've posted a patch to address your issue several times,
however it seems it has not been picked up by the scsi subsystem
maintainer.   The last time it was posted was here:
http://marc.info/?l=linux-scsi&m=117089244809072&w=2   An alternative is
to obtain our latest drivers from the LSI download site, where
these drivers should have this patch:
http://www.lsilogic.com/cm/DownloadSearch.do

Eric




--
+-+
| James   W.   Laferriere | System   Techniques | Give me VMS |
| NetworkEngineer | 663  Beaumont  Blvd |  Give me Linux  |
| [EMAIL PROTECTED] | Pacifica, CA. 94044 |   only  on  AXP |
+-+


[PATCH 3/7] Modifications to drivers/Kconfig and Makefile to configure

2007-03-20 Thread Wink Saville

Signed-off-by: Wink Saville <[EMAIL PROTECTED]>
---
drivers/Kconfig  |2 ++
drivers/Makefile |1 +
2 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/drivers/Kconfig b/drivers/Kconfig
index 050323f..f05a2bf 100644
--- a/drivers/Kconfig
+++ b/drivers/Kconfig
@@ -84,4 +84,6 @@ source "drivers/auxdisplay/Kconfig"

source "drivers/kvm/Kconfig"

+source "drivers/trec/Kconfig"
+
endmenu
diff --git a/drivers/Makefile b/drivers/Makefile
index 3a718f5..01724c0 100644
--- a/drivers/Makefile
+++ b/drivers/Makefile
@@ -81,3 +81,4 @@ obj-$(CONFIG_GENERIC_TIME)+= clocksource/
obj-$(CONFIG_DMA_ENGINE)+= dma/
obj-$(CONFIG_HID)   += hid/
obj-$(CONFIG_PPC_PS3)   += ps3/
+obj-$(CONFIG_TREC) += trec/
--
1.5.0.rc2


[PATCH 4/7] Initialize trec early so it may be used early

2007-03-20 Thread Wink Saville

Signed-off-by: Wink Saville <[EMAIL PROTECTED]>
---
init/main.c |4 
1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/init/main.c b/init/main.c
index a92989e..46bc440 100644
--- a/init/main.c
+++ b/init/main.c
@@ -54,6 +54,7 @@
#include 
#include 
#include 
+#include 

#include 
#include 
@@ -517,6 +518,9 @@ asmlinkage void __init start_kernel(void)
early_boot_irqs_off();
early_init_irq_lock_class();

+   trec_init();
+   TREC0();
+
/*
 * Interrupts are still disabled. Do necessary setups, then
 * enable them
--
1.5.0.rc2


[PATCH 5/7] Add trec_snapshot and trec_print_snapshot to die()

2007-03-20 Thread Wink Saville

Signed-off-by: Wink Saville <[EMAIL PROTECTED]>
---
arch/x86_64/kernel/traps.c |   12 
1 files changed, 12 insertions(+), 0 deletions(-)

diff --git a/arch/x86_64/kernel/traps.c b/arch/x86_64/kernel/traps.c
index 09d2e8a..c730176 100644
--- a/arch/x86_64/kernel/traps.c
+++ b/arch/x86_64/kernel/traps.c
@@ -33,6 +33,10 @@
#include 
#include 

+#ifdef CONFIG_TREC
+#include 
+#endif
+
#include 
#include 
#include 
@@ -547,9 +551,17 @@ void die(const char * str, struct pt_regs * regs, long err)
{
unsigned long flags = oops_begin();

+#ifdef CONFIG_TREC
+   trec_snapshot();
+#endif
+
if (!user_mode(regs))
report_bug(regs->rip);

+#ifdef CONFIG_TREC
+   trec_print_snapshot();
+#endif
+
__die(str, regs, err);
oops_end(flags);
do_exit(SIGSEGV);
--
1.5.0.rc2


[PATCH 6/7] Added trec_snapshot and trec_print_snapshot to do_page_fault() when the kernel itself faults

2007-03-20 Thread Wink Saville

Signed-off-by: Wink Saville <[EMAIL PROTECTED]>
---
arch/x86_64/mm/fault.c |   10 ++
1 files changed, 10 insertions(+), 0 deletions(-)

diff --git a/arch/x86_64/mm/fault.c b/arch/x86_64/mm/fault.c
index 6ada723..9857ade 100644
--- a/arch/x86_64/mm/fault.c
+++ b/arch/x86_64/mm/fault.c
@@ -25,6 +25,10 @@
#include 
#include 

+#ifdef CONFIG_TREC
+#include 
+#endif
+
#include 
#include 
#include 
@@ -534,6 +538,9 @@ no_context:
 */

flags = oops_begin();
+#ifdef CONFIG_TREC
+   trec_snapshot();
+#endif

if (address < PAGE_SIZE)
printk(KERN_ALERT "Unable to handle kernel NULL pointer dereference");
@@ -548,6 +555,9 @@ no_context:
__die("Oops", regs, error_code);
/* Executive summary in case the body of the oops scrolled away */
printk(KERN_EMERG "CR2: %016lx\n", address);
+#ifdef CONFIG_TREC
+   trec_print_snapshot();
+#endif
oops_end(flags);
do_exit(SIGKILL);

--
1.5.0.rc2


[PATCH 2/7] Initial implementation of the trec driver and include files

2007-03-20 Thread Wink Saville

Signed-off-by: Wink Saville <[EMAIL PROTECTED]>
---
drivers/trec/Kconfig   |   14 ++
drivers/trec/Makefile  |5 +
drivers/trec/trec.c|  328 
include/asm-generic/trec.h |   17 +++
include/asm-i386/trec.h|   33 +
include/asm-x86_64/trec.h  |   13 ++
include/linux/trec.h   |   34 +
7 files changed, 444 insertions(+), 0 deletions(-)
create mode 100644 drivers/trec/Kconfig
create mode 100644 drivers/trec/Makefile
create mode 100644 drivers/trec/trec.c
create mode 100644 include/asm-generic/trec.h
create mode 100644 include/asm-i386/trec.h
create mode 100644 include/asm-x86_64/trec.h
create mode 100644 include/linux/trec.h

diff --git a/drivers/trec/Kconfig b/drivers/trec/Kconfig
new file mode 100644
index 000..ef43f1f
--- /dev/null
+++ b/drivers/trec/Kconfig
@@ -0,0 +1,14 @@
+#
+# Trace record configuration
+#
+
+menu "Trace record support"
+
+config TREC
+   def_bool n
+   bool "Trace record support"
+   ---help---
+ Trace records are a light weight tracing facility
+
+endmenu
+
diff --git a/drivers/trec/Makefile b/drivers/trec/Makefile
new file mode 100644
index 000..d930b4d
--- /dev/null
+++ b/drivers/trec/Makefile
@@ -0,0 +1,5 @@
+#
+# Makefile for Trace records.
+#
+
+obj-$(CONFIG_TREC) += trec.o
diff --git a/drivers/trec/trec.c b/drivers/trec/trec.c
new file mode 100644
index 000..8d954ca
--- /dev/null
+++ b/drivers/trec/trec.c
@@ -0,0 +1,328 @@
+/*
+ * Copyright (C) 2007 Saville Software, Inc.
+ *
+ * This code may be used for any purpose whatsoever, but
+ * no warranty of any kind is provided.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include 
+
+#define TREC_DEBUG
+#ifdef TREC_DEBUG
+#define DPK(fmt, args...) printk(KERN_ERR "trec " fmt, ## args)
+#else
+#define DPK(fmt, args...)
+#endif
+
+struct trec_dev_struct
+{
+   struct cdev cdev;   /* Character device structure */
+};
+
+MODULE_AUTHOR("Wink Saville");
+MODULE_LICENSE("Dual BSD/GPL");
+
+/*
+ * Module parameters
+ */
+int major = 240;   /* 240 is a "local/experimental" device number for the moment */
+int minor = 1;
+
+module_param(major, int, S_IRUGO);
+module_param(minor, int, S_IRUGO);
+
+/*
+ * Forward declarations
+ */
+static int trec_open(struct inode *inode, struct file *pFile);
+static int trec_release(struct inode *inode, struct file *pFile);
+
+/*
+ * File operations
+ */
+struct file_operations trec_f_ops = {
+   .owner  =   THIS_MODULE,
+   .open   =   trec_open,
+   .release=   trec_release,
+};
+
+struct trec_struct {
+   uint64_t        tsc;
+   unsigned long   pc;
+   unsigned long   tsk;
+   unsigned int    pid;
+   unsigned long   v1;
+   unsigned long   v2;
+};
+
+/*
+ * Change trec_buffer_struct.data to be a pointer to a PAGE in the future
+ */
+#define TREC_DATA_SIZE 0x200
+struct trec_buffer_struct {
+   struct trec_buffer_struct * pNext;
+   struct trec_struct *pCur;
+   struct trec_struct *pEnd;
+   struct trec_struct  data[TREC_DATA_SIZE];
+};
+
+/*
+ * Number of buffers must be a multiple of two so we can
+ * snapshot the buffers and the minimum should be 4.
+ */
+#define TREC_COUNT 2
+struct trec_buffer_struct  gTrec_buffers[2][TREC_COUNT];
+int gTrec_idx = 0;
+spinlock_t gTrec_lock = SPIN_LOCK_UNLOCKED;
+
+struct trec_buffer_struct *pTrec_cur = NULL;
+struct trec_buffer_struct *pTrec_snapshot = NULL;
+
+struct trec_dev_struct trec_dev;
+
+/**
+ * Print an address symbol if available to the buffer
+ * this is from traps.c
+ */
+static int snprint_address(char *b, int bsize, unsigned long address)
+{
+#ifdef CONFIG_KALLSYMS
+   unsigned long offset = 0, symsize;
+   const char *symname;
+   char *modname;
+   char *delim = ":";
+   int n;
+   char namebuf[128];
+
+   symname = kallsyms_lookup(address, &symsize, &offset, &modname, namebuf);
+   if (!symname) {
+   n = 0;
+   } else {
+   if (!modname)
+   modname = delim = ""; 
+   n = snprintf(b, bsize, "0x%016lx %s%s%s%s+0x%lx/0x%lx",
+   address, delim, modname, delim, symname, offset, symsize);
+   }
+   return n;
+#else
+   return snprintf(b, bsize, "0x%016lx", address);
+   return 0;
+#endif
+}
+
+/*
+ * Initialize the trec buffers
+ */
+void trec_init(void)
+{
+   int i;
+   int j;
+
+   //DPK("trec: trec_init E\n");
+
+   for (i = 0; i < 2; i++) {
+   for (j = 0; j < TREC_COUNT; j++) {
+   struct trec_buffer_struct *pTrec = &gTrec_buffers[i][j];
+
+   pTrec->pNext = &gTrec_buffers[i][(j+1) % TREC_COUNT];
+   pTrec->pCur = &pTrec->data[0];
+   pTrec->pEnd = 

[PATCH 1/7] Documentation for trace record (trec), a light weight tracing mechanism

2007-03-20 Thread Wink Saville

Signed-off-by: Wink Saville <[EMAIL PROTECTED]>
---
Documentation/trec.txt |   87 
1 files changed, 87 insertions(+), 0 deletions(-)
create mode 100644 Documentation/trec.txt

diff --git a/Documentation/trec.txt b/Documentation/trec.txt
new file mode 100644
index 000..e12a552
--- /dev/null
+++ b/Documentation/trec.txt
@@ -0,0 +1,87 @@
+Title  : Trace Records
+Authors: Wink Saville <[EMAIL PROTECTED]>
+
+CONTENTS
+
+1. Concepts
+2. Architectures Supported
+3. Configuring
+4. API Reference
+5. Overhead
+6. TODO
+
+
+1. Concepts
+
+Trace records are a light weight tracing technique that time stamps
+small amounts of information and stores them in a buffer. TREC's are
+light enough that they may be sprinkled most anywhere in the kernel
+and have very little performance impact.
+
+For instance they can be placed in the scheduler and ISR's to watch
+the interaction between ISR's and the scheduler. They can be placed
+in memory handling routines to determine how and when memory is
+allocated and freed.
+
+In the current default configuration the trec's are dumped by calling
+trec_print_snapshot when die() or panic() are called as well as when
+the kernel itself page faults in do_page_fault.
+
+
+2. Architectures Supported
+
+Should support all architectures, but has been tested only on:
+
+- X86_64
+
+
+3. Configuring
+
+Since trec's are implemented as a device driver they are configured
+by enabling support in the "Device Drivers" section of the kernel
+configuration. As they may be used early, building as a module is not supported.
+
+
+4. API Reference
+
+Trec supports the following API:
+
+void trec_init(void);
+
+  Initialize the module. This may be called before the driver is loaded
+  if it is desired to use trec's early.
+
+void trec_write(unsigned long pc, int pid, unsigned long v1, unsigned long v2);
+
+  This is the routine used to write into the buffer. pc is the program counter,
+  pid is the process id, and v1 and v2 are two parameters.
+
+void trec_snapshot(void);
+
+  Calling this function takes a snapshot of the current trec buffer so that it
+  will not be modified. This is called prior to printing the snapshot via
+  trec_print_snapshot.
+
+void trec_print_snapshot(void);
+
+  Print the snapshot.
+
+In addition a set of macros are defined for convenience. They come in
+two flavors, TRECxxx and ZRECxxx. The TRECxxx macros invoke trec_write
+and the ZRECxxx macros do nothing, allowing the macros to be quickly
+disabled. Look at include/linux/trec.h for the current set of macros.
+
+
+5. Overhead
+
+Measured on a 2.4GHz Core 2 Duo, the interval between two TREC's is
+270 ticks of the rdtsc, or about 0.1us. No attempt has been made to
+optimize, and less information can be collected if the overhead
+is still too high.
+
+
+6. TODO
+
+a. Add code to dump trec to user space
+b. Enhance to allow runtime registration and runtime enable/disable.
+
--
1.5.0.rc2


[PATCH 3/4] i386 GDT cleanups: clean up cpu_init()

2007-03-20 Thread Rusty Russell
We now have cpu_init() and secondary_cpu_init() doing nothing but
calling _cpu_init() with the same arguments.  Rename _cpu_init() to
cpu_init() and use it as a replacement for secondary_cpu_init().

Signed-off-by: Rusty Russell <[EMAIL PROTECTED]>

---
 arch/i386/kernel/cpu/common.c |   36 ++--
 arch/i386/kernel/smpboot.c|   10 +-
 include/asm-i386/processor.h  |3 ++-
 3 files changed, 17 insertions(+), 32 deletions(-)

diff -r 18694148c020 arch/i386/kernel/cpu/common.c
--- a/arch/i386/kernel/cpu/common.c Wed Mar 21 17:15:19 2007 +1100
+++ b/arch/i386/kernel/cpu/common.c Wed Mar 21 17:18:37 2007 +1100
@@ -644,9 +644,16 @@ struct i386_pda boot_pda = {
.pcurrent = &init_task,
 };
 
-/* Common CPU init for both boot and secondary CPUs */
-static void __cpuinit _cpu_init(int cpu, struct task_struct *curr)
-{
+/*
+ * cpu_init() initializes state that is per-CPU. Some data is already
+ * initialized (naturally) in the bootstrap process, such as the GDT
+ * and IDT. We reload them nevertheless, this function acts as a
+ * 'CPU state barrier', nothing should get across.
+ */
+void __cpuinit cpu_init(void)
+{
+   int cpu = smp_processor_id();
+   struct task_struct *curr = current;
struct tss_struct * t = &per_cpu(init_tss, cpu);
struct thread_struct *thread = &curr->thread;
 
@@ -706,29 +713,6 @@ static void __cpuinit _cpu_init(int cpu,
mxcsr_feature_mask_init();
 }
 
-/* Entrypoint to initialize secondary CPU */
-void __cpuinit secondary_cpu_init(void)
-{
-   int cpu = smp_processor_id();
-   struct task_struct *curr = current;
-
-   _cpu_init(cpu, curr);
-}
-
-/*
- * cpu_init() initializes state that is per-CPU. Some data is already
- * initialized (naturally) in the bootstrap process, such as the GDT
- * and IDT. We reload them nevertheless, this function acts as a
- * 'CPU state barrier', nothing should get across.
- */
-void __cpuinit cpu_init(void)
-{
-   int cpu = smp_processor_id();
-   struct task_struct *curr = current;
-
-   _cpu_init(cpu, curr);
-}
-
 #ifdef CONFIG_HOTPLUG_CPU
 void __cpuinit cpu_uninit(void)
 {
diff -r 18694148c020 arch/i386/kernel/smpboot.c
--- a/arch/i386/kernel/smpboot.cWed Mar 21 17:15:19 2007 +1100
+++ b/arch/i386/kernel/smpboot.cWed Mar 21 17:18:37 2007 +1100
@@ -378,14 +378,14 @@ static void __cpuinit start_secondary(vo
 static void __cpuinit start_secondary(void *unused)
 {
/*
-* Don't put *anything* before secondary_cpu_init(), SMP
-* booting is too fragile that we want to limit the
-* things done here to the most necessary things.
+* Don't put *anything* before cpu_init(), SMP booting is too
+* fragile that we want to limit the things done here to the
+* most necessary things.
 */
 #ifdef CONFIG_VMI
vmi_bringup();
 #endif
-   secondary_cpu_init();
+   cpu_init();
preempt_disable();
smp_callin();
while (!cpu_isset(smp_processor_id(), smp_commenced_mask))
diff -r 18694148c020 include/asm-i386/processor.h
--- a/include/asm-i386/processor.h  Wed Mar 21 17:15:19 2007 +1100
+++ b/include/asm-i386/processor.h  Wed Mar 21 17:18:37 2007 +1100
@@ -744,6 +744,6 @@ extern int sysenter_setup(void);
 extern int sysenter_setup(void);
 
 extern void cpu_set_gdt(int);
-extern void secondary_cpu_init(void);
+extern void cpu_init(void);
 
 #endif /* __ASM_I386_PROCESSOR_H */




[PATCH 2/4] i386 GDT cleanups: Use per-cpu GDT immediately upon boot

2007-03-20 Thread Rusty Russell
Now we are no longer dynamically allocating the GDT, we don't need the
"cpu_gdt_table" at all: we can switch straight from "boot_gdt_table"
to the per-cpu GDT.  This means initializing the cpu_gdt array in C.

The boot CPU uses the per-cpu var directly, then in smp_prepare_cpus()
it switches to the per-cpu copy just allocated.  For secondary CPUs,
the early_gdt_descr is set to point directly to their per-cpu copy.

For UP the code is very simple: it keeps using the "per-cpu" GDT as
per SMP, but we never have to move.

Signed-off-by: Rusty Russell <[EMAIL PROTECTED]>
---
 arch/i386/kernel/cpu/common.c|   74 --
 arch/i386/kernel/head.S  |   65 -
 arch/i386/kernel/smpboot.c   |   59 ++-
 arch/i386/mach-voyager/voyager_smp.c |6 --
 include/asm-i386/desc.h  |2
 include/asm-i386/processor.h |1
 6 files changed, 77 insertions(+), 130 deletions(-)

diff -r a8a4e2f9da08 arch/i386/kernel/cpu/common.c
--- a/arch/i386/kernel/cpu/common.c Wed Mar 21 15:20:48 2007 +1100
+++ b/arch/i386/kernel/cpu/common.c Wed Mar 21 15:31:49 2007 +1100
@@ -25,7 +25,33 @@ DEFINE_PER_CPU(struct Xgt_desc_struct, c
 DEFINE_PER_CPU(struct Xgt_desc_struct, cpu_gdt_descr);
 EXPORT_PER_CPU_SYMBOL(cpu_gdt_descr);
 
-DEFINE_PER_CPU(struct desc_struct, cpu_gdt[GDT_ENTRIES]);
+DEFINE_PER_CPU(struct desc_struct, cpu_gdt[GDT_ENTRIES]) = {
+   [GDT_ENTRY_KERNEL_CS] = { 0x, 0x00cf9a00 },
+   [GDT_ENTRY_KERNEL_DS] = { 0x, 0x00cf9200 },
+   [GDT_ENTRY_DEFAULT_USER_CS] = { 0x, 0x00cffa00 },
+   [GDT_ENTRY_DEFAULT_USER_DS] = { 0x, 0x00cff200 },
+   /*
+* Segments used for calling PnP BIOS have byte granularity.
+* They code segments and data segments have fixed 64k limits,
+* the transfer segment sizes are set at run time.
+*/
+   [GDT_ENTRY_PNPBIOS_CS32] = { 0x, 0x00409a00 },/* 32-bit code */
+   [GDT_ENTRY_PNPBIOS_CS16] = { 0x, 0x9a00 },/* 16-bit code */
+   [GDT_ENTRY_PNPBIOS_DS] = { 0x, 0x9200 }, /* 16-bit data */
+   [GDT_ENTRY_PNPBIOS_TS1] = { 0x, 0x9200 },/* 16-bit data */
+   [GDT_ENTRY_PNPBIOS_TS2] = { 0x, 0x9200 },/* 16-bit data */
+   /*
+* The APM segments have byte granularity and their bases
+* are set at run time.  All have 64k limits.
+*/
+   [GDT_ENTRY_APMBIOS_BASE] = { 0x, 0x00409a00 },/* 32-bit code */
+   /* 16-bit code */
+   [GDT_ENTRY_APMBIOS_BASE+1] = { 0x, 0x9a00 },
+   [GDT_ENTRY_APMBIOS_BASE+2] = { 0x, 0x00409200 }, /* data */
+
+   [GDT_ENTRY_ESPFIX_SS] = { 0x, 0x00c09200 },
+   [GDT_ENTRY_PDA] = { 0x, 0x00c09200 }, /* set in setup_pda */
+};
 
 DEFINE_PER_CPU(struct i386_pda, _cpu_pda);
 EXPORT_PER_CPU_SYMBOL(_cpu_pda);
@@ -618,46 +644,6 @@ struct i386_pda boot_pda = {
.pcurrent = &init_task,
 };
 
-static inline void set_kernel_fs(void)
-{
-   /* Set %fs for this CPU's PDA.  Memory clobber is to create a
-  barrier with respect to any PDA operations, so the compiler
-  doesn't move any before here. */
-   asm volatile ("mov %0, %%fs" : : "r" (__KERNEL_PDA) : "memory");
-}
-
-/* Initialize the CPU's GDT and PDA.  This is either the boot CPU doing itself
-   (still using cpu_gdt_table), or a CPU doing it for a secondary which
-   will soon come up. */
-__cpuinit void init_gdt(int cpu, struct task_struct *idle)
-{
-   struct Xgt_desc_struct *cpu_gdt_descr = &per_cpu(cpu_gdt_descr, cpu);
-   struct desc_struct *gdt = per_cpu(cpu_gdt, cpu);
-   struct i386_pda *pda = &per_cpu(_cpu_pda, cpu);
-
-   memcpy(gdt, cpu_gdt_table, GDT_SIZE);
-   cpu_gdt_descr->address = (unsigned long)gdt;
-   cpu_gdt_descr->size = GDT_SIZE - 1;
-
-   pack_descriptor((u32 *)&gdt[GDT_ENTRY_PDA].a,
-   (u32 *)&gdt[GDT_ENTRY_PDA].b,
-   (unsigned long)pda, sizeof(*pda) - 1,
-   0x80 | DESCTYPE_S | 0x2, 0); /* present read-write data segment */
-
-   memset(pda, 0, sizeof(*pda));
-   pda->_pda = pda;
-   pda->cpu_number = cpu;
-   pda->pcurrent = idle;
-}
-
-void __cpuinit cpu_set_gdt(int cpu)
-{
-   struct Xgt_desc_struct *cpu_gdt_descr = &per_cpu(cpu_gdt_descr, cpu);
-
-   load_gdt(cpu_gdt_descr);
-   set_kernel_fs();
-}
-
 /* Common CPU init for both boot and secondary CPUs */
 static void __cpuinit _cpu_init(int cpu, struct task_struct *curr)
 {
@@ -740,10 +726,6 @@ void __cpuinit cpu_init(void)
int cpu = smp_processor_id();
struct task_struct *curr = current;
 
-   /* Set up the real GDT and PDA, so we can transition from the
-  boot_gdt_table & boot_pda. */
-   init_gdt(cpu, curr);
-   cpu_set_gdt(cpu);
_cpu_init(cpu, curr);
 }
 
diff -r a8a4e2f9da08 arch/i386/kernel/head.S
--- 

Re: [PATCH] Turn do_sync_file_range() into do_sync_mapping_range()

2007-03-20 Thread Andrew Morton
On Tue, 20 Mar 2007 14:46:26 -0700 Mark Fasheh <[EMAIL PROTECTED]> wrote:

> do_sync_file_range() accepts a file * from which it takes an address_space
> to sync. Abstract out the bulk of the function into do_sync_mapping_range()
> which takes the address_space directly. This way callers who want to sync an
> address_space directly can take advantage of the functionality provided.
> 
> do_sync_file_range() is preserved as a 3 line wrapper around
> do_sync_mapping_range().
> 
> Ocfs2 in particular would like to use this to initiate a sync of a specific
> inode range during truncate, where a file * may not be available.

I think we can remove do_sync_file_range() altogether?


[PATCH 1/4] i386 GDT cleanups: Use per-cpu variables for GDT, PDA

2007-03-20 Thread Rusty Russell
Allocating PDA and GDT at boot is a pain.  Using simple per-cpu
variables adds happiness (although we need the GDT page-aligned for
Xen, which we do in a followup patch).

Signed-off-by: Rusty Russell <[EMAIL PROTECTED]>

---
 arch/i386/kernel/cpu/common.c|   96 +--
 arch/i386/kernel/smpboot.c   |   21 ---
 arch/i386/mach-voyager/voyager_smp.c |   10 ---
 include/asm-i386/desc.h  |1 
 include/asm-i386/pda.h   |7 +-
 include/asm-i386/processor.h |2 
 6 files changed, 21 insertions(+), 116 deletions(-)

diff -r d9b1ba2049f8 arch/i386/kernel/cpu/common.c
--- a/arch/i386/kernel/cpu/common.c Wed Mar 21 10:14:02 2007 +1100
+++ b/arch/i386/kernel/cpu/common.c Wed Mar 21 14:21:43 2007 +1100
@@ -25,8 +25,10 @@ DEFINE_PER_CPU(struct Xgt_desc_struct, c
 DEFINE_PER_CPU(struct Xgt_desc_struct, cpu_gdt_descr);
 EXPORT_PER_CPU_SYMBOL(cpu_gdt_descr);
 
-struct i386_pda *_cpu_pda[NR_CPUS] __read_mostly;
-EXPORT_SYMBOL(_cpu_pda);
+DEFINE_PER_CPU(struct desc_struct, cpu_gdt[GDT_ENTRIES]);
+
+DEFINE_PER_CPU(struct i386_pda, _cpu_pda);
+EXPORT_PER_CPU_SYMBOL(_cpu_pda);
 
 static int cachesize_override __cpuinitdata = -1;
 static int disable_x86_fxsr __cpuinitdata;
@@ -609,52 +611,6 @@ struct pt_regs * __devinit idle_regs(str
return regs;
 }
 
-static __cpuinit int alloc_gdt(int cpu)
-{
-   struct Xgt_desc_struct *cpu_gdt_descr = &per_cpu(cpu_gdt_descr, cpu);
-   struct desc_struct *gdt;
-   struct i386_pda *pda;
-
-   gdt = (struct desc_struct *)cpu_gdt_descr->address;
-   pda = cpu_pda(cpu);
-
-   /*
-* This is a horrible hack to allocate the GDT.  The problem
-* is that cpu_init() is called really early for the boot CPU
-* (and hence needs bootmem) but much later for the secondary
-* CPUs, when bootmem will have gone away
-*/
-   if (NODE_DATA(0)->bdata->node_bootmem_map) {
-   BUG_ON(gdt != NULL || pda != NULL);
-
-   gdt = alloc_bootmem_pages(PAGE_SIZE);
-   pda = alloc_bootmem(sizeof(*pda));
-   /* alloc_bootmem(_pages) panics on failure, so no check */
-
-   memset(gdt, 0, PAGE_SIZE);
-   memset(pda, 0, sizeof(*pda));
-   } else {
-   /* GDT and PDA might already have been allocated if
-  this is a CPU hotplug re-insertion. */
-   if (gdt == NULL)
-   gdt = (struct desc_struct *)get_zeroed_page(GFP_KERNEL);
-
-   if (pda == NULL)
-   pda = kmalloc_node(sizeof(*pda), GFP_KERNEL, cpu_to_node(cpu));
-
-   if (unlikely(!gdt || !pda)) {
-   free_pages((unsigned long)gdt, 0);
-   kfree(pda);
-   return 0;
-   }
-   }
-
-   cpu_gdt_descr->address = (unsigned long)gdt;
-   cpu_pda(cpu) = pda;
-
-   return 1;
-}
-
 /* Initial PDA used by boot CPU */
 struct i386_pda boot_pda = {
._pda = &boot_pda,
@@ -670,31 +626,17 @@ static inline void set_kernel_fs(void)
asm volatile ("mov %0, %%fs" : : "r" (__KERNEL_PDA) : "memory");
 }
 
-/* Initialize the CPU's GDT and PDA.  The boot CPU does this for
-   itself, but secondaries find this done for them. */
-__cpuinit int init_gdt(int cpu, struct task_struct *idle)
+/* Initialize the CPU's GDT and PDA.  This is either the boot CPU doing itself
+   (still using cpu_gdt_table), or a CPU doing it for a secondary which
+   will soon come up. */
+__cpuinit void init_gdt(int cpu, struct task_struct *idle)
 {
struct Xgt_desc_struct *cpu_gdt_descr = &per_cpu(cpu_gdt_descr, cpu);
-   struct desc_struct *gdt;
-   struct i386_pda *pda;
-
-   /* For non-boot CPUs, the GDT and PDA should already have been
-  allocated. */
-   if (!alloc_gdt(cpu)) {
-   printk(KERN_CRIT "CPU%d failed to allocate GDT or PDA\n", cpu);
-   return 0;
-   }
-
-   gdt = (struct desc_struct *)cpu_gdt_descr->address;
-   pda = cpu_pda(cpu);
-
-   BUG_ON(gdt == NULL || pda == NULL);
-
-   /*
-* Initialize the per-CPU GDT with the boot GDT,
-* and set up the GDT descriptor:
-*/
+   struct desc_struct *gdt = per_cpu(cpu_gdt, cpu);
+   struct i386_pda *pda = &per_cpu(_cpu_pda, cpu);
+
memcpy(gdt, cpu_gdt_table, GDT_SIZE);
+   cpu_gdt_descr->address = (unsigned long)gdt;
cpu_gdt_descr->size = GDT_SIZE - 1;
 
pack_descriptor((u32 *)&gdt[GDT_ENTRY_PDA].a,
@@ -706,17 +648,12 @@ __cpuinit int init_gdt(int cpu, struct t
pda->_pda = pda;
pda->cpu_number = cpu;
pda->pcurrent = idle;
-
-   return 1;
 }
 
 void __cpuinit cpu_set_gdt(int cpu)
 {
struct Xgt_desc_struct *cpu_gdt_descr = &per_cpu(cpu_gdt_descr, cpu);
 
-   /* Reinit these anyway, even if they've already been done (on
-  the boot CPU, this will transition from the boot 

Re: [PATCH] swsusp: Fix SNAPSHOT_S2RAM ioctl

2007-03-20 Thread Andrew Morton
On Tue, 20 Mar 2007 22:48:08 +0100 "Rafael J. Wysocki" <[EMAIL PROTECTED]> 
wrote:

> From: Rafael J. Wysocki <[EMAIL PROTECTED]>
> 
> The SNAPSHOT_S2RAM ioctl does not disable the nonboot CPUs before entering
> the suspend, although it should do this.
> 
> Signed-off-by: Rafael J. Wysocki <[EMAIL PROTECTED]>
> ---
>  kernel/power/user.c |9 ++---
>  1 file changed, 6 insertions(+), 3 deletions(-)
> 
> Index: linux-2.6.21-rc4/kernel/power/user.c
> ===
> --- linux-2.6.21-rc4.orig/kernel/power/user.c
> +++ linux-2.6.21-rc4/kernel/power/user.c
> @@ -374,9 +374,12 @@ static int snapshot_ioctl(struct inode *
>   if (error) {
>   printk(KERN_ERR "Failed to suspend some devices.\n");
>   } else {
> - /* Enter S3, system is already frozen */
> - suspend_enter(PM_SUSPEND_MEM);
> -
> + error = disable_nonboot_cpus();
> + if (!error) {
> + /* Enter S3, system is already frozen */
> + suspend_enter(PM_SUSPEND_MEM);
> + enable_nonboot_cpus();
> + }
>   /* Wake up devices */
>   device_resume();
>   }

Do you consider this appropriate to 2.6.21?


[PATCH 0/4] i386 GDT cleanups

2007-03-20 Thread Rusty Russell
After lots of good feedback and contributions from the last series, this
set of 4 simply cleans up GDT usage in i386.  Percpu->pda is not
included: it's really a separate problem (but made much simpler by these
patches).

Patches are:
no-gdt-pda-alloc.patch 
- Simplify by using per-cpu vars for gdt & pda, not allocating.
This patch has been seen here before, and Jeremy Fitzhardinge
acked it.

direct-percpu-gdt.patch 
- Simplify boot by switching from boot_gdt_table straight to
per-cpu versions, rather than going to cpu_gdt_table then
per-cpu gdt.  This is a new approach after Ingo cautioned
about removing boot_gdt_table.

cleanup-cpuinits.patch 
- Simple patch: we can now roll two identical functions together

cleanup-gdt-accessors.patch 
- Remove a level of indirection

Cheers,
Rusty.




Re: [RFC] SLAB : NUMA cache_free_alien() very expensive because of virt_to_slab(objp); nodeid = slabp->nodeid;

2007-03-20 Thread Eric Dumazet

Christoph Lameter wrote:
> On Tue, 20 Mar 2007, Eric Dumazet wrote:
>
>> I understand we want to do special things (fallback and such tricks) at
>> allocation time, but I believe that we can just trust the real nid of memory
>> at free time.
>
> Sorry no. The node at allocation time determines which node specific
> structure tracks the slab. If we fall back then the node is allocated
> from one node but entered in the node structure of another. Thus you
> cannot free the slab without knowing the node at allocation time.

I think you don't understand my point.

When we enter kmem_cache_free(), we are not freeing a slab but an object,
knowing a pointer to this object.

The fast path is to put the pointer into the cpu array cache. This object
might be given back some cycles later because of a kmem_cache_alloc(): no
need to access the two cache lines (struct page, struct slab).

This fast path could be entered checking the node of the page, which is
faster, but possibly different from virt_to_slab(obj)->nodeid. Do we
care? Definitely not: the node is guaranteed to be correct.

Then, if we must flush the cpu array cache because it is full, we *may* access
the slabs of the objects we are flushing, and then check
virt_to_slab(obj)->nodeid to be able to do the correct thing.

Fortunately, flushing the cache array is not a frequent event, and the cost of
accessing the cache lines (struct page, struct slab) can be amortized because
several 'transferred or freed' objects might share them at this time.

Actually I had to disable NUMA on my platforms because it is just overkill and
slower. Maybe it's OK for very big machines, and not for dual-node
Opterons? Let me know so that I don't waste your time (and mine).



Thank you
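
To make the fast path / flush path split described above concrete, here is a
minimal user-space sketch in C. It is not mm/slab.c code: toy_slab, toy_obj,
toy_array_cache, AC_LIMIT and the counters are hypothetical stand-ins, and the
page-node check on the fast path is left out. It only illustrates that the
slab (and its nodeid) is touched when the per-cpu array is flushed, and not on
an ordinary free or alloc.

```c
#include <assert.h>
#include <stddef.h>

#define AC_LIMIT  4   /* hypothetical capacity of the per-cpu array */
#define MAX_NODES 2   /* hypothetical number of NUMA nodes */

/* Stand-in for struct slab: records the node that tracks the slab. */
struct toy_slab { int nodeid; };

/* Stand-in for an object; the kernel would use virt_to_slab() instead. */
struct toy_obj { struct toy_slab *slab; };

struct toy_array_cache {
    size_t avail;                           /* number of cached pointers */
    struct toy_obj *entry[AC_LIMIT];
    unsigned long slab_lookups;             /* counts the expensive accesses */
    unsigned long freed_to_node[MAX_NODES]; /* objects handed back per node */
};

/* Slow path: only here is each object's slab read to learn its node. */
static void toy_flush(struct toy_array_cache *ac)
{
    for (size_t i = 0; i < ac->avail; i++) {
        int nid = ac->entry[i]->slab->nodeid;

        ac->slab_lookups++;
        assert(nid >= 0 && nid < MAX_NODES);
        ac->freed_to_node[nid]++;
    }
    ac->avail = 0;
}

/* Fast path: just store the pointer; flush first if the array is full. */
static void toy_free(struct toy_array_cache *ac, struct toy_obj *obj)
{
    if (ac->avail == AC_LIMIT)
        toy_flush(ac);
    ac->entry[ac->avail++] = obj;
}

/* Allocation can hand back a cached pointer without any slab access. */
static struct toy_obj *toy_alloc(struct toy_array_cache *ac)
{
    return ac->avail ? ac->entry[--ac->avail] : NULL;
}
```

With AC_LIMIT objects cached, the next toy_free() triggers one flush, and the
slab lookups are paid once per flushed object rather than once per free.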


[PATCH] Allow per-cpu variables to be page-aligned

2007-03-20 Thread Rusty Russell
[This was part of the GDT cleanups and per-cpu-> pda changes, which I
have revised, but this stands on its own.  The only change is catching
the x86-64 per-cpu allocator too].
==
Let's allow page-alignment in general for per-cpu data (wanted by Xen,
and Ingo suggested KVM as well).

Because larger alignments can use more room, we increase the max
per-cpu memory to 64k rather than 32k: it's getting a little tight.
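
As a rough illustration of why a larger alignment can use more room, the
following C sketch rounds a per-cpu area size up the way an ALIGN()-style
macro does. TOY_ALIGN and toy_percpu_size are hypothetical names, and the
sizes used with them are made-up numbers, not measurements from any kernel.

```c
#include <assert.h>

/* Round x up to the next multiple of a; a must be a power of two.
 * Written in the style of the kernel's ALIGN() macro (an assumption,
 * not a copy of it). */
#define TOY_ALIGN(x, a) (((x) + ((a) - 1)) & ~((unsigned long)(a) - 1))

/* Size of one CPU's copy of the per-cpu section after alignment. */
static unsigned long toy_percpu_size(unsigned long raw, unsigned long align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return TOY_ALIGN(raw, align);
}
```

For a hypothetical 20000-byte section, 64-byte alignment gives 20032 bytes per
CPU while 4096-byte alignment gives 20480, so the padding grows from 32 to 480
bytes per copy.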

Signed-off-by: Rusty Russell <[EMAIL PROTECTED]>
Acked-by: Ingo Molnar <[EMAIL PROTECTED]>
---
 arch/alpha/kernel/vmlinux.lds.S   |2 +-
 arch/arm/kernel/vmlinux.lds.S |2 +-
 arch/cris/arch-v32/vmlinux.lds.S  |1 +
 arch/frv/kernel/vmlinux.lds.S |1 +
 arch/i386/kernel/vmlinux.lds.S|2 +-
 arch/m32r/kernel/vmlinux.lds.S|2 +-
 arch/mips/kernel/vmlinux.lds.S|2 +-
 arch/parisc/kernel/vmlinux.lds.S  |2 +-
 arch/powerpc/kernel/setup_64.c|4 ++--
 arch/powerpc/kernel/vmlinux.lds.S |6 +-
 arch/ppc/kernel/vmlinux.lds.S |2 +-
 arch/s390/kernel/vmlinux.lds.S|2 +-
 arch/sh/kernel/vmlinux.lds.S  |2 +-
 arch/sh64/kernel/vmlinux.lds.S|2 +-
 arch/sparc/kernel/vmlinux.lds.S   |2 +-
 arch/sparc64/kernel/smp.c |6 +++---
 arch/x86_64/kernel/setup64.c  |4 ++--
 arch/x86_64/kernel/vmlinux.lds.S  |2 +-
 arch/xtensa/kernel/vmlinux.lds.S  |2 +-
 include/linux/percpu.h|2 +-
 init/main.c   |4 ++--
 kernel/module.c   |   10 +-
 22 files changed, 31 insertions(+), 33 deletions(-)

===
--- a/arch/alpha/kernel/vmlinux.lds.S
+++ b/arch/alpha/kernel/vmlinux.lds.S
@@ -69,7 +69,7 @@ SECTIONS
   . = ALIGN(8);
   SECURITY_INIT
 
-  . = ALIGN(64);
+  . = ALIGN(8192);
   __per_cpu_start = .;
   .data.percpu : { *(.data.percpu) }
   __per_cpu_end = .;
===
--- a/arch/arm/kernel/vmlinux.lds.S
+++ b/arch/arm/kernel/vmlinux.lds.S
@@ -59,7 +59,7 @@ SECTIONS
usr/built-in.o(.init.ramfs)
__initramfs_end = .;
 #endif
-   . = ALIGN(64);
+   . = ALIGN(4096);
__per_cpu_start = .;
*(.data.percpu)
__per_cpu_end = .;
===
--- a/arch/cris/arch-v32/vmlinux.lds.S
+++ b/arch/cris/arch-v32/vmlinux.lds.S
@@ -91,6 +91,7 @@ SECTIONS
}
SECURITY_INIT
 
+   . =  ALIGN (8192);
__per_cpu_start = .;
.data.percpu  : { *(.data.percpu) }
__per_cpu_end = .;
===
--- a/arch/frv/kernel/vmlinux.lds.S
+++ b/arch/frv/kernel/vmlinux.lds.S
@@ -57,6 +57,7 @@ SECTIONS
   __alt_instructions_end = .;
  .altinstr_replacement : { *(.altinstr_replacement) }
 
+  . = ALIGN(4096);
   __per_cpu_start = .;
   .data.percpu  : { *(.data.percpu) }
   __per_cpu_end = .;
===
--- a/arch/i386/kernel/vmlinux.lds.S
+++ b/arch/i386/kernel/vmlinux.lds.S
@@ -195,7 +195,7 @@ SECTIONS
__initramfs_end = .;
   }
 #endif
-  . = ALIGN(L1_CACHE_BYTES);
+  . = ALIGN(4096);
   .data.percpu  : AT(ADDR(.data.percpu) - LOAD_OFFSET) {
__per_cpu_start = .;
*(.data.percpu)
===
--- a/arch/m32r/kernel/vmlinux.lds.S
+++ b/arch/m32r/kernel/vmlinux.lds.S
@@ -110,7 +110,7 @@ SECTIONS
   __initramfs_end = .;
 #endif
 
-  . = ALIGN(32);
+  . = ALIGN(4096);
   __per_cpu_start = .;
   .data.percpu  : { *(.data.percpu) }
   __per_cpu_end = .;
===
--- a/arch/mips/kernel/vmlinux.lds.S
+++ b/arch/mips/kernel/vmlinux.lds.S
@@ -119,7 +119,7 @@ SECTIONS
   .init.ramfs : { *(.init.ramfs) }
   __initramfs_end = .;
 #endif
-  . = ALIGN(32);
+  . = ALIGN(_PAGE_SIZE);
   __per_cpu_start = .;
   .data.percpu  : { *(.data.percpu) }
   __per_cpu_end = .;
===
--- a/arch/parisc/kernel/vmlinux.lds.S
+++ b/arch/parisc/kernel/vmlinux.lds.S
@@ -181,7 +181,7 @@ SECTIONS
   .init.ramfs : { *(.init.ramfs) }
   __initramfs_end = .;
 #endif
-  . = ALIGN(32);
+  . = ALIGN(ASM_PAGE_SIZE);
   __per_cpu_start = .;
   .data.percpu  : { *(.data.percpu) }
   __per_cpu_end = .;
===
--- a/arch/powerpc/kernel/setup_64.c
+++ b/arch/powerpc/kernel/setup_64.c
@@ -583,14 +583,14 @@ void __init setup_per_cpu_areas(void)
char *ptr;
 
/* Copy section for each CPU (we discard the original) */
-   size = ALIGN(__per_cpu_end - __per_cpu_start, SMP_CACHE_BYTES);
+   size = ALIGN(__per_cpu_end - __per_cpu_start, PAGE_SIZE);
 #ifdef CONFIG_MODULES
if (size < PERCPU_ENOUGH_ROOM)
size = 

Re: [patch 13/26] Xen-paravirt_ops: Consistently wrap paravirt ops callsites to make them patchable

2007-03-20 Thread Andrew Morton
On Tue, 20 Mar 2007 21:23:52 +0100 Andi Kleen <[EMAIL PROTECTED]> wrote:

> Well it causes additional problems. We had some cases where it was really
> hard to distingush garbage and the true call chain.

yes, for some reason the naive backtraces seem to have got messier and
messier over the years and some of them are really quite hard to piece
together nowadays.

An accurate backtrace would have some value, if we can get it bullet-proof.

The fault-injection driver wants it too.  And lockdep, I guess.


Re: 2.6.21-rc4-mm1

2007-03-20 Thread Andrew Morton
On Tue, 20 Mar 2007 12:20:16 -0700 Kees Cook <[EMAIL PROTECTED]> wrote:

> I can't 
> get 2.6.21-rc4-mm1 to compile (with or without this fix):
> 
>   GEN .version
> init/.missing_syscalls.h.cmd:2: *** missing separator.  Stop.
> make: *** [.tmp_vmlinux1] Error 2

How'd you manage that?

Sam, I think this is a you-thing rather than a dwmw2-thing?


Re: [PATCH 1/7] Introduce the pagetable_operations and associated helper macros.

2007-03-20 Thread William Lee Irwin III
William Lee Irwin III wrote:
>> ISTR potential ppc64 users coming out of the woodwork for something I
>> didn't recognize the name of, but I may be confusing that with your
>> patch. I can implement additional users (and useful ones at that)
>> needing this in particular if desired.

On Wed, Mar 21, 2007 at 04:07:43PM +1100, Nick Piggin wrote:
> Yes I would be interested in seeing useful additional users of this
> that cannot use our regular virtual memory, before making it a general
> thing.
> I just don't want to see proliferation of these things, if possible.

I'm tied up elsewhere so I won't get to it in a timely fashion. Maybe
in a few weeks I can start up on the first two of the bunch.


William Lee Irwin III wrote:
>> Two fault handling methods callbacks raise an eyebrow over here at least.
>> I was vaguely hoping for unification of the fault handling callbacks.

On Wed, Mar 21, 2007 at 04:07:43PM +1100, Nick Piggin wrote:
> I don't know if it would be so clean to do that as they are at different 
> levels.
> Adam's fault is before the VM translation (and bypasses it), and mine is 
> after.

Not much of a VM translation; it's just a lookup through the software
mocked-up structures on everything save i386, x86_64, and some m68k where
they're the same thing only with hardware walkers (ISTR ia64's being
firmware a la Alpha despite the "HPW" name, though I could be wrong)
reliant on them. The drivers/etc. could just as easily use helper
functions to carry out the lookup, thereby accomplishing the
unification. There's nothing particularly fundamental about a pte
lookup. Normal arches that do software TLB refill could just as easily
consult the radix trees dangled off struct address_space or any old
data structure floating around the kernel with enough information to
translate user virtual addresses to the physical addresses they need to
fill the TLB with, and there are other kernels that literally do things
like that.

Basically, drop in to the ->fault() callback with no attempt at a pte
lookup. The drivers using the standard pagetable format can call helper
functions to do all the gruntwork surrounding that for them. Then the
more sophisticated drivers can do the necessary work by hand.

But others should really be consulted on this point. My notions in/around
this area tend to be outside the mainstream. I can anticipate that the
two ->fault() functions will look strange to people, but not what
alternatives would be most idiomatic to mainstream Linux conventions.
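
A toy user-space sketch of the shape suggested above, in C: generic code calls
a single ->fault() callback without doing any pte lookup first, ordinary
drivers delegate to a helper that walks a standard-format table, and a more
sophisticated driver computes the translation by hand. Every name here
(toy_vma, toy_lookup_pte, toy_handle_fault and so on) is hypothetical, and
nothing in it is taken from the patches under discussion.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_PAGE_SHIFT 12
#define TOY_NPTES      16

struct toy_vma;

/* One callback type for every kind of mapping; it gets the raw address. */
typedef long (*toy_fault_fn)(struct toy_vma *vma, unsigned long address);

struct toy_vma {
    unsigned long start;   /* first virtual address covered */
    toy_fault_fn fault;    /* the single fault callback */
    long pte[TOY_NPTES];   /* standard-format table: frame number or -1 */
    long special_base;     /* used only by the non-standard driver */
};

/* Helper for drivers that use the standard pagetable format. */
static long toy_lookup_pte(struct toy_vma *vma, unsigned long address)
{
    unsigned long idx = (address - vma->start) >> TOY_PAGE_SHIFT;

    return idx < TOY_NPTES ? vma->pte[idx] : -1;
}

/* Ordinary driver: lets the helper do the gruntwork. */
static long toy_standard_fault(struct toy_vma *vma, unsigned long address)
{
    return toy_lookup_pte(vma, address);
}

/* Sophisticated driver: derives the frame by hand, no table involved. */
static long toy_special_fault(struct toy_vma *vma, unsigned long address)
{
    return vma->special_base +
           (long)((address - vma->start) >> TOY_PAGE_SHIFT);
}

/* Generic code: drops into the callback with no pte lookup of its own. */
static long toy_handle_fault(struct toy_vma *vma, unsigned long address)
{
    return vma->fault(vma, address);
}
```

The point of the arrangement is that toy_handle_fault() knows nothing about
table formats; only the helper and the drivers do.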


-- wli


Re: [PATCH 1/7] Introduce the pagetable_operations and associated helper macros.

2007-03-20 Thread Nick Piggin

William Lee Irwin III wrote:
> Adam Litke wrote:
>>> struct vm_operations_struct * vm_ops;
>>> +   const struct pagetable_operations_struct * pagetable_ops;
>
> On Wed, Mar 21, 2007 at 03:18:30PM +1100, Nick Piggin wrote:
>> Can you remind me why this isn't in vm_ops?
>> Also, it is going to be hugepage-only, isn't it? So should the naming be
>> changed to reflect that? And #ifdef it...
>
> ISTR potential ppc64 users coming out of the woodwork for something I
> didn't recognize the name of, but I may be confusing that with your
> patch. I can implement additional users (and useful ones at that)
> needing this in particular if desired.

Yes I would be interested in seeing useful additional users of this
that cannot use our regular virtual memory, before making it a general
thing.

I just don't want to see proliferation of these things, if possible.

> Adam Litke wrote:
>>> +struct pagetable_operations_struct {
>>> +   int (*fault)(struct mm_struct *mm,
>
> On Wed, Mar 21, 2007 at 03:18:30PM +1100, Nick Piggin wrote:
>> I got dibs on fault ;)
>> My callback is a sanitised one that basically abstracts the details of the
>> virtual memory mapping away, so it is usable by drivers and filesystems.
>> You actually want to bypass the normal fault handling because it doesn't
>> know how to deal with your virtual memory mapping. Hmm, the best suggestion
>> I can come up with is handle_mm_fault... unless you can think of a better
>> name for me to use.
>
> Two fault handling methods callbacks raise an eyebrow over here at least.
> I was vaguely hoping for unification of the fault handling callbacks.

I don't know if it would be so clean to do that as they are at different levels.
Adam's fault is before the VM translation (and bypasses it), and mine is after.

--
SUSE Labs, Novell Inc.


AIO, FIO and Threads ...

2007-03-20 Thread Davide Libenzi

I was looking at Jens' FIO stuff, and I decided to cook a quick patch for 
FIO to support GUASI (Generic Userspace Asynchronous Syscall Interface):

http://www.xmailserver.org/guasi-lib.html

I then ran a few tests on my Dual Opteron 252 with SATA drives (sata_nv) 
and 8GB of RAM.
Mind that I'm not a FIO expert, like at all, but I got some interesting 
results when comparing GUASI with libaio at 8/1000/1 depths.
If I read those results correctly (Jens may help), GUASI output is more 
than double the libaio one.
Lots of context switches, yes. But the throughput looks like 2+ times.
Can someone try to repeat the measurements and/or spot the error?
Or tell me which other tests to run?
This is kinda a surprise for me ...



PS: FIO patch to support GUASI is attached. You also need to fetch GUASI 
and (configure && make install)



- Davide



>> fio --name=global --rw=randread --size=64m --ioengine=guasi --name=job1 
>> --iodepth=8 --thread

job1: (g=0): rw=randread, bs=4K-4K/4K-4K, ioengine=guasi, iodepth=8
Starting 1 thread
Jobs: 1: [r] [100.0% done] [  3135/ 0 kb/s] [eta 00m:00s]
job1: (groupid=0, jobs=1): err= 0: pid=29298
  read : io=65,536KiB, bw=1,576KiB/s, iops=384, runt= 42557msec
slat (msec): min=0, max=0, avg= 0.00, stdev= 0.00
clat (msec): min=0, max=  212, avg=20.26, stdev=18.83
bw (KiB/s) : min= 1166, max= 3376, per=98.51%, avg=1552.50, stdev=317.42
  cpu  : usr=7.69%, sys=92.99%, ctx=97648
  IO depths: 1=0.0%, 2=0.0%, 4=0.1%, 8=99.9%, 16=0.0%, 32=0.0%, >=64=0.0%
 lat (msec): 2=1.4%, 4=3.6%, 10=25.3%, 20=34.0%, 50=28.1%, 100=6.8%
 lat (msec): 250=0.8%, 500=0.0%, 750=0.0%, 1000=0.0%, >=2000=0.0%

Run status group 0 (all jobs):
   READ: io=65,536KiB, aggrb=1,576KiB/s, minb=1,576KiB/s, maxb=1,576KiB/s, 
mint=42557msec, maxt=42557msec

Disk stats (read/write):
  sda: ios=16376/98, merge=8/135, ticks=339481/2810, in_queue=342290, 
util=99.17%


>> fio --name=global --rw=randread --size=64m --ioengine=libaio --name=job1 
>> --iodepth=8 --thread

job1: (g=0): rw=randread, bs=4K-4K/4K-4K, ioengine=libaio, iodepth=8
Starting 1 thread
Jobs: 1: [r] [95.9% done] [  2423/ 0 kb/s] [eta 00m:03s]
job1: (groupid=0, jobs=1): err= 0: pid=29332
  read : io=65,536KiB, bw=929KiB/s, iops=226, runt= 72181msec
slat (msec): min=0, max=   98, avg=31.30, stdev=15.53
clat (msec): min=0, max=0, avg= 0.00, stdev= 0.00
bw (KiB/s) : min=  592, max= 2835, per=98.56%, avg=915.58, stdev=325.29
  cpu  : usr=0.02%, sys=0.34%, ctx=23023
  IO depths: 1=22.2%, 2=22.2%, 4=44.4%, 8=11.1%, 16=0.0%, 32=0.0%, >=64=0.0%
 lat (msec): 2=100.0%, 4=0.0%, 10=0.0%, 20=0.0%, 50=0.0%, 100=0.0%
 lat (msec): 250=0.0%, 500=0.0%, 750=0.0%, 1000=0.0%, >=2000=0.0%

Run status group 0 (all jobs):
   READ: io=65,536KiB, aggrb=929KiB/s, minb=929KiB/s, maxb=929KiB/s, 
mint=72181msec, maxt=72181msec

Disk stats (read/write):
  sda: ios=16384/43, merge=0/42, ticks=71889/20573, in_queue=92461, util=99.57%


>> fio --name=global --rw=randread --size=64m --ioengine=guasi --name=job1 
>> --iodepth=1000 --thread

job1: (g=0): rw=randread, bs=4K-4K/4K-4K, ioengine=guasi, iodepth=1000
Starting 1 thread
Jobs: 1: [r] [93.9% done] [   815/ 0 kb/s] [eta 00m:02s]
job1: (groupid=0, jobs=1): err= 0: pid=29343
  read : io=65,536KiB, bw=2,130KiB/s, iops=520, runt= 31500msec
slat (msec): min=0, max=   26, avg= 1.02, stdev= 4.19
clat (msec): min=   12, max=28024, avg=1920.73, stdev=764.20
bw (KiB/s) : min= 1139, max= 3376, per=95.21%, avg=2027.87, stdev=354.38
  cpu  : usr=7.35%, sys=93.77%, ctx=104637
  IO depths: 1=0.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.1%, 32=0.2%, >=64=99.6%
 lat (msec): 2=0.0%, 4=0.0%, 10=0.0%, 20=0.0%, 50=0.1%, 100=0.4%
 lat (msec): 250=1.2%, 500=1.0%, 750=0.8%, 1000=0.7%, >=2000=45.5%

Run status group 0 (all jobs):
   READ: io=65,536KiB, aggrb=2,130KiB/s, minb=2,130KiB/s, maxb=2,130KiB/s, 
mint=31500msec, maxt=31500msec

Disk stats (read/write):
  sda: ios=16267/31, merge=115/28, ticks=4019824/313471, in_queue=4333625, 
util=98.84%


>> fio --name=global --rw=randread --size=64m --ioengine=libaio --name=job1 
>> --iodepth=1000 --thread

job1: (g=0): rw=randread, bs=4K-4K/4K-4K, ioengine=libaio, iodepth=1000
Starting 1 thread
Jobs: 1: [r] [98.6% done] [  4083/ 0 kb/s] [eta 00m:01s]]
job1: (groupid=0, jobs=1): err= 0: pid=30346
  read : io=65,536KiB, bw=920KiB/s, iops=224, runt= 72925msec
slat (msec): min=0, max= 5539, avg=4431.27, stdev=1268.03
clat (msec): min=0, max=0, avg= 0.00, stdev= 0.00
bw (KiB/s) : min=0, max= 2361, per=103.56%, avg=952.75, stdev=499.54
  cpu  : usr=0.02%, sys=0.39%, ctx=23089
  IO depths: 1=0.2%, 2=0.2%, 4=0.4%, 8=0.8%, 16=1.7%, 32=3.3%, >=64=93.4%
 lat (msec): 2=100.0%, 4=0.0%, 10=0.0%, 20=0.0%, 50=0.0%, 100=0.0%
 lat (msec): 250=0.0%, 500=0.0%, 750=0.0%, 1000=0.0%, >=2000=0.0%

Run status group 0 (all jobs):
   READ: io=65,536KiB, aggrb=920KiB/s, 
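
For reference, the "2+ times" figure can be checked directly from the bw and
runt values printed in the runs above. The small C helpers below are only an
arithmetic sketch; bw_ratio and runtime_speedup are hypothetical names, and the
inputs are the numbers reported by fio in this message.

```c
#include <assert.h>

/* Ratio of two bandwidth figures given in the same unit (KiB/s here). */
static double bw_ratio(double guasi_bw, double libaio_bw)
{
    assert(libaio_bw > 0.0);
    return guasi_bw / libaio_bw;
}

/* Both engines read the same 64m, so the speedup is also the inverse
 * ratio of the run times (msec). */
static double runtime_speedup(double guasi_msec, double libaio_msec)
{
    assert(guasi_msec > 0.0);
    return libaio_msec / guasi_msec;
}
```

Plugging in the reported values: at iodepth=8, 1576 / 929 is about 1.70 (and
72181 / 42557 msec agrees), while at iodepth=1000, 2130 / 920 is about 2.32
(72925 / 31500 msec agrees). So the "more than double" observation holds for
the depth-1000 run, and the depth-8 run shows roughly a 1.7x difference.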

Re: [PATCH 1/7] Introduce the pagetable_operations and associated helper macros.

2007-03-20 Thread William Lee Irwin III
Adam Litke wrote:
>>  struct vm_operations_struct * vm_ops;
>> +const struct pagetable_operations_struct * pagetable_ops;

On Wed, Mar 21, 2007 at 03:18:30PM +1100, Nick Piggin wrote:
> Can you remind me why this isn't in vm_ops?
> Also, it is going to be hugepage-only, isn't it? So should the naming be
> changed to reflect that? And #ifdef it...

ISTR potential ppc64 users coming out of the woodwork for something I
didn't recognize the name of, but I may be confusing that with your
patch. I can implement additional users (and useful ones at that)
needing this in particular if desired.


Adam Litke wrote:
>> +struct pagetable_operations_struct {
>> +int (*fault)(struct mm_struct *mm,

On Wed, Mar 21, 2007 at 03:18:30PM +1100, Nick Piggin wrote:
> I got dibs on fault ;)
> My callback is a sanitised one that basically abstracts the details of the
> virtual memory mapping away, so it is usable by drivers and filesystems.
> You actually want to bypass the normal fault handling because it doesn't
> know how to deal with your virtual memory mapping. Hmm, the best suggestion
> I can come up with is handle_mm_fault... unless you can think of a better
> name for me to use.

Two fault handling methods callbacks raise an eyebrow over here at least.
I was vaguely hoping for unification of the fault handling callbacks.


-- wli


Re: [RFC][PATCH] Apple SMC driver (hardware monitoring and control)

2007-03-20 Thread Bob Copeland
On Tue, Mar 20, 2007 at 03:02:14PM +0800, Nicolas Boichat wrote:
> I tried neverball on my Macbook Pro 1st generation (Core Duo, not Core 2
> Duo), and the x axis in inverted, not the y axis.
> 
> Could you confirm which axis is inverted on your Macbook?
> 
> Also, have you tried the modified hdaps-gl, available here:
> http://mactel-linux.svn.sourceforge.net/viewvc/mactel-linux/trunk/tools/hdaps-gl/
> ? Is it working correctly?

Ok I tried it out again and it is in fact the x-axis that is inverted
within neverball.  The hdaps-gl works fine (Macbook Core Duo here).  

Thanks for the driver!

-- 
Bob Copeland %% www.bobcopeland.com 



Re: [PATCH 1/7] Introduce the pagetable_operations and associated helper macros.

2007-03-20 Thread Nick Piggin

Adam Litke wrote:
> Signed-off-by: Adam Litke <[EMAIL PROTECTED]>
> ---
>
>  include/linux/mm.h |   25 +
>  1 files changed, 25 insertions(+), 0 deletions(-)
>
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 60e0e4a..7089323 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -98,6 +98,7 @@ struct vm_area_struct {
>
>  	/* Function pointers to deal with this struct. */
>  	struct vm_operations_struct * vm_ops;
> +	const struct pagetable_operations_struct * pagetable_ops;
>
>  	/* Information about our backing store: */
>  	unsigned long vm_pgoff; /* Offset (within vm_file) in PAGE_SIZE

Can you remind me why this isn't in vm_ops?

Also, it is going to be hugepage-only, isn't it? So should the naming be
changed to reflect that? And #ifdef it...

> @@ -218,6 +219,30 @@ struct vm_operations_struct {
>  };
>
>  struct mmu_gather;
> +
> +struct pagetable_operations_struct {
> +   int (*fault)(struct mm_struct *mm,
> +   struct vm_area_struct *vma,
> +   unsigned long address, int write_access);

I got dibs on fault ;)

My callback is a sanitised one that basically abstracts the details of the
virtual memory mapping away, so it is usable by drivers and filesystems.

You actually want to bypass the normal fault handling because it doesn't
know how to deal with your virtual memory mapping. Hmm, the best suggestion
I can come up with is handle_mm_fault... unless you can think of a better
name for me to use.

> +   int (*copy_vma)(struct mm_struct *dst, struct mm_struct *src,
> +   struct vm_area_struct *vma);
> +   int (*pin_pages)(struct mm_struct *mm, struct vm_area_struct *vma,
> +   struct page **pages, struct vm_area_struct **vmas,
> +   unsigned long *position, int *length, int i);
> +   void (*change_protection)(struct vm_area_struct *vma,
> +   unsigned long address, unsigned long end, pgprot_t newprot);
> +   unsigned long (*unmap_page_range)(struct vm_area_struct *vma,
> +   unsigned long address, unsigned long end, long *zap_work);
> +   void (*free_pgtable_range)(struct mmu_gather **tlb,
> +   unsigned long addr, unsigned long end,
> +   unsigned long floor, unsigned long ceiling);
> +};
> +
> +#define has_pt_op(vma, op) \
> +   ((vma)->pagetable_ops && (vma)->pagetable_ops->op)
> +#define pt_op(vma, call) \
> +   ((vma)->pagetable_ops->call)
> +
>  struct inode;
>
>  #define page_private(page)		((page)->private)


--


--
SUSE Labs, Novell Inc.
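
For readers following the has_pt_op()/pt_op() discussion, here is a rough
userspace sketch of how an optional per-VMA operations table is usually
dispatched. Only the two macros mirror the quoted patch; struct vma,
struct pagetable_ops, generic_fault(), huge_fault() and handle_fault() are
hypothetical stand-ins and are not the kernel's definitions.

```c
#include <stddef.h>

/* Hypothetical userspace stand-ins; not the kernel's definitions. */
struct vma;

struct pagetable_ops {
	int (*fault)(struct vma *vma, unsigned long address, int write_access);
	int (*copy_vma)(struct vma *dst, struct vma *src);	/* may be NULL */
};

struct vma {
	const struct pagetable_ops *pagetable_ops;	/* NULL for ordinary mappings */
};

/* Same shape as the macros in the quoted patch. */
#define has_pt_op(vma, op) \
	((vma)->pagetable_ops && (vma)->pagetable_ops->op)
#define pt_op(vma, call) \
	((vma)->pagetable_ops->call)

/* Default path, used when the VMA supplies no override. */
static int generic_fault(struct vma *vma, unsigned long address, int write_access)
{
	(void)vma; (void)address; (void)write_access;
	return 0;
}

/* Override installed by a special mapping type (a hugepage-like user, say). */
static int huge_fault(struct vma *vma, unsigned long address, int write_access)
{
	(void)vma; (void)address; (void)write_access;
	return 1;
}

static const struct pagetable_ops huge_ops = { .fault = huge_fault };

/* Call the override if the table and the slot are both present. */
static int handle_fault(struct vma *vma, unsigned long address, int write_access)
{
	if (has_pt_op(vma, fault))
		return pt_op(vma, fault)(vma, address, write_access);
	return generic_fault(vma, address, write_access);
}
```

The two-step check in has_pt_op() is what lets ordinary VMAs leave the
pointer NULL and lets a table fill in only the callbacks it needs.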




Re: RSDL v0.31

2007-03-20 Thread Al Boldi
Artur Skawina wrote:
> Al Boldi wrote:
> > --- sched.bak.c 2007-03-16 23:07:23.0 +0300
> > +++ sched.c 2007-03-19 23:49:40.0 +0300
> > @@ -938,7 +938,11 @@ static void activate_task(struct task_st
> >  (now - p->timestamp) >> 20);
> > }
> >
> > -   p->quota = rr_quota(p);
> > +   /*
> > +* boost factor hardcoded to 5; adjust to your liking
> > +* higher means more likely to DoS
> > +*/
> > +   p->quota = rr_quota(p) + (((now - p->timestamp) >> 20) * 5);
> > p->prio = effective_prio(p);
> > p->timestamp = now;
> > __activate_task(p, rq);
>
> i've tried this and it lasted only a few minutes -- i was seeing
> mouse cursor stalls lasting almost 1s. i/o bound tasks starving X?
> After reverting the patch everything is smooth again.

This patch wasn't really meant for production, as any sleeping background 
proc turned cpu-hog may DoS the system.

If you like to play with this, then you probably want to at least reset the 
quota in its expiration.
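
To make the arithmetic in the quoted hunk concrete, here is a small
standalone sketch. It assumes the timestamps are in nanoseconds (as with
sched_clock()), so the >> 20 is roughly a ns-to-ms conversion. The name
boosted_quota() and the base_quota parameter are hypothetical; base_quota
stands in for whatever rr_quota(p) returns.

```c
#include <stdint.h>

#define BOOST_FACTOR 5	/* hardcoded to 5 in the quoted patch */

/*
 * now and timestamp are assumed to be nanoseconds. Shifting right by 20
 * divides by 1048576, i.e. approximately nanoseconds to milliseconds.
 */
static unsigned int boosted_quota(unsigned int base_quota,
				  uint64_t now, uint64_t timestamp)
{
	uint64_t slept_ms = (now - timestamp) >> 20;

	return base_quota + (unsigned int)(slept_ms * BOOST_FACTOR);
}
```

Under these assumptions a task that slept for one second gets 953 * 5 = 4765
added to its quota, which shows how a long sleeper that turns into a cpu-hog
can hold the CPU for a long time.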


Thanks!

--
Al



Re: taskstats accounting info

2007-03-20 Thread Balbir Singh

Randy Dunlap wrote:
> On Thu, 15 Mar 2007 11:06:55 -0800 Andrew Morton wrote:
>
>>> It's the most portable example, since it does not depend on libnl.
>>
>> err, what is libnl?
>
> lib-netlink (as already answered, but I wrote this last week)

I was referring to the library at http://people.suug.ch/~tgr/libnl/

>> If there exists some real userspace infrastructure which utilises
>> taskstats, can we please get a reference to it into the kernel
>> Documentation?  Perhaps in the TASKSTATS Kconfig entry, thanks.
>
> Balbir, I was working with getdelays.c when I initially wrote
> these questions.  Here is a small patch for it.  Hopefully you can
> use it when you find the updated version of it.
>
> ~Randy
>
> From: Randy Dunlap <[EMAIL PROTECTED]>
>
> 1.  add usage() function
>
> 2.  add unknown character in %c format (was only in %d, not useful):
>
> ./getdelays: invalid option -- h
> Unknown option '?' (63)
>
> instead of:
>
> ./getdelays: invalid option -- h
> Unknown option 63
>
> (or just remove that message)
>
> 3.  -v does not use an optarg, so remove ':' in getopt string after 'v';



Thanks, these look good. I'll add them to my local copy.



--
Warm Regards,
Balbir Singh
Linux Technology Center
IBM, ISTL


Re: [patch 00/31] 2.6.20-stable review

2007-03-20 Thread Gene Heskett
On Tuesday 20 March 2007, Greg KH wrote:
[...]
>> It looks like, from the series file's contents, that I grabbed the
>> wrong 'queue', it's all 2.6.21 stuff.  url please.
>
>The patch queue can be found at:
>  git://git.kernel.org/pub/scm/linux/kernel/git/stable/stable-queue.git
>
>All of the patches and the series file are in the review-2.6.20/
>directory.

This is not available via ftp?  My git is git version 0.99.7d, veddy veddy 
long in the tooth by now I suspect.  And, apparently no manpages.  Humm, 
smart is still running, maybe it has a newer git that's more conversant 
with your syntax.  Yup smart is pulling it in now.  Later, thanks.

>thanks,
>
>greg k-h



-- 
Cheers, Gene
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
When I works, I works hard.
When I sits, I sits easy.
And when I thinks, I goes to sleep.


Re: [patch 00/31] 2.6.20-stable review

2007-03-20 Thread Greg KH
On Tue, Mar 20, 2007 at 11:04:43PM -0400, Gene Heskett wrote:
> On Tuesday 20 March 2007, Michael Krufky wrote:
> >Gene Heskett wrote:
> >> On Tuesday 20 March 2007, Greg KH wrote:
> >>> On Tue, Mar 20, 2007 at 01:15:02AM -0400, Gene Heskett wrote:
> >>>> In any event, something tickled the monster, and it's hungry.  This
> >>>> is a full-stop, show-stopper AFAIAC.
> >>>>
> >>>> I'll cut that patch-2.6.20.4-rc1 in halves, and build 2 more test
> >>>> kernels tomorrow to start a bisect if no one has any better idea
> >>>> before then.
> >>>
> >>> I'd recommend using the quilt tree of patches to do this, it will be
> >>> easier than trying to split the larger patch up into pieces by hand.
> >>>
> >>> thanks,
> >>>
> >>> greg k-h
> >>
> >> I don't have quilt installed, and no idea how to use it if it was,
> >> Greg. Even my git is now pushing a year old.
> >
> >Gene-
> >
> >quilt is a very easy tool to use...  Here is a quick run-down.
> >
> >#1) remove the stable -rc patch from the tree
> >
> >#2) install quilt
> >
> >#3) go to your source tree, and create a directory called "patches"
> >
> >#4) copy all of the patches from Greg's queue into that "patches"
> > directory, including the "series" file.
> >
> >#5) quilt push, to apply the first patch
> >quilt push, to apply the second patch (you get the idea)
> >quilt push -a , to apply all patches in the series
> >
> >#6) quilt pop, to back out the most recent patch pushed in
> >quilt pop -a to back out the entire series.
> >
> >If a patch doesn't apply, quilt will let you know about it, and present
> > you with the option to force the patch to apply.  This is highly
> > unlikely to occur, since Greg has already done what needs to be done to
> > make these patches apply to the source.
> >
> >For a better explanation, see "man quilt".  Using quilt, when you have a
> > list of patches and a series file containing the merge order is *much*
> > easier than a git bisection, and can help you to find the problem patch
> > much quicker.
> >
> >I hope this helps...
> >
> It looks like, from the series file's contents, that I grabbed the
> wrong 'queue', it's all 2.6.21 stuff.  url please.

The patch queue can be found at:
  git://git.kernel.org/pub/scm/linux/kernel/git/stable/stable-queue.git

All of the patches and the series file are in the review-2.6.20/
directory.

thanks,

greg k-h


Re: BLK_DEV_MD with CONFIG_NET

2007-03-20 Thread David Miller
From: Randy Dunlap <[EMAIL PROTECTED]>
Date: Tue, 20 Mar 2007 20:05:38 -0700

> Build a kernel with CONFIG_NET=n and CONFIG_BLK_DEV_MD=m.
> Unless csum_partial() is built and kept by some arch Makefile,
> the result is:
> ERROR: "csum_partial" [drivers/md/md-mod.ko] undefined!
> make[1]: *** [__modpost] Error 1
> make: *** [modules] Error 2
> 
> 
> Any suggested solutions?

Anything which is ever exported to modules, which ought to
be the situation in this case, should be obj-y not lib-y
right?


Re: [linux-pm] 2.6.21-rc4-mm1: freezing of processes broken

2007-03-20 Thread Nigel Cunningham
Hi.

On Tue, 2007-03-20 at 19:23 -0600, Eric W. Biederman wrote:
> "Rafael J. Wysocki" <[EMAIL PROTECTED]> writes:
> 
> > On Tuesday, 20 March 2007 22:06, Rafael J. Wysocki wrote:
> >> On Tuesday, 20 March 2007 21:58, Jiri Slaby wrote:
> >> > Rafael J. Wysocki wrote:
> >> > > Actually, the problem is 100% reproducible on my system too and I doubt it's
> >> > > caused by the recent freezer patches.
> >> > 
> >> > I don't know what exactly do you mean by recent, but 2.6.21-rc3-mm2 works
> >> > for me.
> >> 
> >> Thanks for the confirmation.
> >> 
> >> The patches I was talking about had already been in 2.6.21-rc3-mm2, so the
> >> reason of this failure must be different.
> >
> > Bisection shows that the freezing of processes has been broken by one of the
> > patches:
> >
> > remove-the-likelypid-check-in-copy_process.patch
> 
> Grr.  Oleg's review of remove-the-likelypid-check-in-copy-process
> showed it to be questionable (and it was just an optimization)
> so we can get rid of that one easily. 
> 
> Although all it did that was really questionable was add
> the idle process to the global process list and bump a process
> count when we forked the idle process.  Not dramatically dangerous
> things.
> 
> > use-task_pgrp-task_session-in-copy_process.patch
> 
> As I recall that patch was pretty trivial, and shouldn't have
> anything to do with the freezer.   The process freezer doesn't care
> about pids does it?

Could the freezer code be trying to freeze the idle thread as a result?

Regards,

Nigel



[PATCH] Yet another function to load cr3 register

2007-03-20 Thread Glauber de Oliveira Costa
There is (was) yet another function to load something into
the cr3 register. We don't need it.

Signed-off-by: Glauber de Oliveira Costa <[EMAIL PROTECTED]>

-- 
Glauber de Oliveira Costa
Red Hat Inc.
"Free as in Freedom"
diff --git a/arch/x86_64/kernel/smp.c b/arch/x86_64/kernel/smp.c
index af1ec4d..f338fc5 100644
--- a/arch/x86_64/kernel/smp.c
+++ b/arch/x86_64/kernel/smp.c
@@ -76,7 +76,7 @@ static inline void leave_mm(int cpu)
if (read_pda(mmu_state) == TLBSTATE_OK)
BUG();
cpu_clear(cpu, read_pda(active_mm)->cpu_vm_mask);
-   load_cr3(swapper_pg_dir);
+   write_cr3(__pa(swapper_pg_dir));
 }
 
 /*
diff --git a/include/asm-x86_64/mmu_context.h b/include/asm-x86_64/mmu_context.h
index 09add28..7781e99 100644
--- a/include/asm-x86_64/mmu_context.h
+++ b/include/asm-x86_64/mmu_context.h
@@ -22,11 +22,6 @@ static inline void enter_lazy_tlb(struct mm_struct *mm, struct task_struct *tsk)
 #endif
 }
 
-static inline void load_cr3(pgd_t *pgd)
-{
-   asm volatile("movq %0,%%cr3" :: "r" (__pa(pgd)) : "memory");
-}
-
 static inline void switch_mm(struct mm_struct *prev, struct mm_struct *next, 
 struct task_struct *tsk)
 {
@@ -39,7 +34,7 @@ static inline void switch_mm(struct mm_struct *prev, struct mm_struct *next,
write_pda(active_mm, next);
 #endif
cpu_set(cpu, next->cpu_vm_mask);
-   load_cr3(next->pgd);
+   write_cr3(__pa(next->pgd));
 
if (unlikely(next->context.ldt != prev->context.ldt)) 
load_LDT_nolock(&next->context);
@@ -54,7 +49,7 @@ static inline void switch_mm(struct mm_struct *prev, struct mm_struct *next,
 * tlb flush IPI delivery. We must reload CR3
 * to make sure to use no freed page tables.
 */
-   load_cr3(next->pgd);
+   write_cr3(__pa(next->pgd));
load_LDT_nolock(&next->context);
}
}


Re: [RFC] SLAB : NUMA cache_free_alien() very expensive because of virt_to_slab(objp); nodeid = slabp->nodeid;

2007-03-20 Thread Christoph Lameter
On Wed, 21 Mar 2007, Andi Kleen wrote:

> > We usually use page_to_nid(). Sure this will determine the node the object 
> > resides on. But this may not be the node on which the slab is tracked 
> > since there may have been a fallback at alloc time.
> 
> How about your slab rewrite?  I assume it would make more sense to fix
> such problems in that code instead of the old which is going to be replaced
> at some point.

The slab rewrite first allocates a page and then determines where it 
came from instead of requiring the page allocator to allocate from a 
certain node. Plus SLUB does not keep per cpu or per node object queues. 
So the problem does not occur in the same way. The per cpu slab in SLUB 
can contain objects from another node whereas SLAB can only put node local 
objects on its queues.




BLK_DEV_MD with CONFIG_NET

2007-03-20 Thread Randy Dunlap

drivers/md/md.c calls csum_partial().

If CONFIG_NET=n and BLK_DEV_MD=y, if arch/*/lib/Makefile
puts csum-partial.o or checksum.o into lib-y, the function
is present.  (Of course, if the function is placed in
obj-y, there is no problem.)

If CONFIG_NET=n and BLK_DEV_MD=n, if arch/*/lib/Makefile
puts csum-partial.o or checksum.o into lib-y, the function
is removed from the kernel image due to having no built-in
callers.

Build a kernel with CONFIG_NET=n and CONFIG_BLK_DEV_MD=m.
Unless csum_partial() is built and kept by some arch Makefile,
the result is:
ERROR: "csum_partial" [drivers/md/md-mod.ko] undefined!
make[1]: *** [__modpost] Error 1
make: *** [modules] Error 2


Any suggested solutions?


Recorded for posterity here:
  http://bugzilla.kernel.org/show_bug.cgi?id=8242

---
~Randy
*** Remember to use Documentation/SubmitChecklist when testing your code ***


Re: [patch 00/31] 2.6.20-stable review

2007-03-20 Thread Gene Heskett
On Tuesday 20 March 2007, Michael Krufky wrote:
>Gene Heskett wrote:
>> On Tuesday 20 March 2007, Greg KH wrote:
>>> On Tue, Mar 20, 2007 at 01:15:02AM -0400, Gene Heskett wrote:
>>>> In any event, something tickled the monster, and it's hungry.  This
>>>> is a full-stop, show-stopper AFAIAC.
>>>>
>>>> I'll cut that patch-2.6.20.4-rc1 in halves, and build 2 more test
>>>> kernels tomorrow to start a bisect if no one has any better idea
>>>> before then.
>>>
>>> I'd recommend using the quilt tree of patches to do this, it will be
>>> easier than trying to split the larger patch up into pieces by hand.
>>>
>>> thanks,
>>>
>>> greg k-h
>>
>> I don't have quilt installed, and no idea how to use it if it was,
>> Greg. Even my git is now pushing a year old.
>
>Gene-
>
>quilt is a very easy tool to use...  Here is a quick run-down.
>
>#1) remove the stable -rc patch from the tree
>
>#2) install quilt
>
>#3) go to your source tree, and create a directory called "patches"
>
>#4) copy all of the patches from Greg's queue into that "patches"
> directory, including the "series" file.
>
>#5) quilt push, to apply the first patch
>quilt push, to apply the second patch (you get the idea)
>quilt push -a , to apply all patches in the series
>
>#6) quilt pop, to back out the most recent patch pushed in
>quilt pop -a to back out the entire series.
>
>If a patch doesn't apply, quilt will let you know about it, and present
> you with the option to force the patch to apply.  This is highly
> unlikely to occur, since Greg has already done what needs to be done to
> make these patches apply to the source.
>
>For a better explanation, see "man quilt".  Using quilt, when you have a
> list of patches and a series file containing the merge order is *much*
> easier than a git bisection, and can help you to find the problem patch
> much quicker.
>
>I hope this helps...
>
It looks like, from the series file's contents, that I grabbed the
wrong 'queue', it's all 2.6.21 stuff.  url please.

Thanks.

>Good Luck,
>
>Michael Krufky



-- 
Cheers, Gene
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
"You show me an American who can keep his mouth shut and I'll eat him."
-- Newspaperman from Frank Capra's _Meet_John_Doe_


Re: [patch 00/31] 2.6.20-stable review

2007-03-20 Thread Gene Heskett
On Tuesday 20 March 2007, Michael Krufky wrote:
>Gene Heskett wrote:
>> On Tuesday 20 March 2007, Greg KH wrote:
>>> On Tue, Mar 20, 2007 at 01:15:02AM -0400, Gene Heskett wrote:
>>>> In any event, something tickled the monster, and it's hungry.  This
>>>> is a full-stop, show-stopper AFAIAC.
>>>>
>>>> I'll cut that patch-2.6.20.4-rc1 in halves, and build 2 more test
>>>> kernels tomorrow to start a bisect if no one has any better idea
>>>> before then.
>>>
>>> I'd recommend using the quilt tree of patches to do this, it will be
>>> easier than trying to split the larger patch up into pieces by hand.
>>>
>>> thanks,
>>>
>>> greg k-h
>>
>> I don't have quilt installed, and no idea how to use it if it was,
>> Greg. Even my git is now pushing a year old.
>
>Gene-
>
>quilt is a very easy tool to use...  Here is a quick run-down.
>
>#1) remove the stable -rc patch from the tree
>
>#2) install quilt
>
>#3) go to your source tree, and create a directory called "patches"
>
>#4) copy all of the patches from Greg's queue into that "patches"
> directory, including the "series" file.

I hope by Greg's queue you meant:


As that's the tree I just pulled in and put in /usr/src/patches with gftp.

>#5) quilt push, to apply the first patch
>quilt push, to apply the second patch (you get the idea)
>quilt push -a , to apply all patches in the series

>#6) quilt pop, to back out the most recent patch pushed in
>quilt pop -a to back out the entire series.
>
>If a patch doesn't apply, quilt will let you know about it, and present
> you with the option to force the patch to apply.  This is highly
> unlikely to occur, since Greg has already done what needs to be done to
> make these patches apply to the source.
>
>For a better explanation, see "man quilt".  Using quilt, when you have a
> list of patches and a series file containing the merge order is *much*
> easier than a git bisection, and can help you to find the problem patch
> much quicker.
>
>I hope this helps...

I do too, but I'm now less than 2 hours from the backup run, which with 
this currently running kernel, should be a sane one.

?  Since this will start with a patch level of a 2.6.20.3 kernel (or is 
that patch level a mistaken assumption on my part, but I haven't reversed 
the 2.6.20.4-rc1 patch yet, that's next.), and I have to edit both the 
Makefile and my 'makeit' script so the names all match, what makes a good 
naming convention while I'm doing this?  This might all be clear once I 
read the quilt manpage, which I haven't had a chance to do yet.

That is next, after I reverse that patch.

>Good Luck,
>
>Michael Krufky

-- 
Cheers, Gene
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
The chief cause of problems is solutions.
-- Eric Sevareid


Re: PAGE_SIZE Availability Inconsistency

2007-03-20 Thread H. Peter Anvin

Anton Blanchard wrote:
> Hi,
>
>> The advantage would be that it wouldn't require a v3 for platforms for
>> which MIN_PAGE_SIZE == PAGE_SIZE, which accounts for a very large
>> percentage of systems.
>>
>> You still have to look for the darn magic in two places, so there is no
>> reason for it to be different.
>
> The problem is if you can hit in two places then what PAGE_SIZE should
> you use to size the contents of the swap header while remaining backward
> compatible.
>
> I'm leaning towards Dave's suggestion of creating a clean v3 swap header.


Changing the header format doesn't make *ANY* difference whatsoever.

You have to write two copies of the swap header, and the kernel should 
check for a header at MIN_PAGE_SIZE first and then at PAGE_SIZE.


If there are fields (other than position) in the v2 swap header that are 
dependent on PAGE_SIZE, then the copy at MIN_PAGE_SIZE should be sized 
using MIN_PAGE_SIZE, and the copy at PAGE_SIZE should be sized at 
PAGE_SIZE.  It's that simple.
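
The probe order described above can be sketched in userspace roughly as
follows. It relies on the v2 layout, where the "SWAPSPACE2" magic occupies
the last 10 bytes of the first page. The helpers magic_at() and
probe_swap_header() are hypothetical names, and MIN_PAGE_SIZE is the
constant proposed in this thread, passed in here as a plain parameter.

```c
#include <stddef.h>
#include <string.h>

#define SWAP_MAGIC     "SWAPSPACE2"
#define SWAP_MAGIC_LEN 10

/* Nonzero if the v2 magic sits in the last 10 bytes of the first page_size bytes. */
static int magic_at(const unsigned char *dev, size_t dev_len, size_t page_size)
{
	if (page_size < SWAP_MAGIC_LEN || dev_len < page_size)
		return 0;
	return memcmp(dev + page_size - SWAP_MAGIC_LEN,
		      SWAP_MAGIC, SWAP_MAGIC_LEN) == 0;
}

/* Probe MIN_PAGE_SIZE first, then PAGE_SIZE; return the size that matched, or 0. */
static size_t probe_swap_header(const unsigned char *dev, size_t dev_len,
				size_t min_page_size, size_t page_size)
{
	if (magic_at(dev, dev_len, min_page_size))
		return min_page_size;
	if (magic_at(dev, dev_len, page_size))
		return page_size;
	return 0;
}
```

The returned size tells the caller which page size to use when interpreting
the page-size-dependent fields of that copy of the header.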


Creating a new format will not help that one iota, and will create
gratuitous incompatibility for the very common case of PAGE_SIZE ==
MIN_PAGE_SIZE.


-hpa



Re: [linux-pm] 2.6.21-rc4-mm1: freezing of processes broken

2007-03-20 Thread sukadev
Eric W. Biederman [EMAIL PROTECTED] wrote:
| "Rafael J. Wysocki" <[EMAIL PROTECTED]> writes:
| 
| > On Tuesday, 20 March 2007 22:06, Rafael J. Wysocki wrote:
| >> On Tuesday, 20 March 2007 21:58, Jiri Slaby wrote:
| >> > Rafael J. Wysocki wrote:
| >> > > Actually, the problem is 100% reproducible on my system too and I doubt it's
| >> > > caused by the recent freezer patches.
| >> > 
| >> > I don't know what exactly do you mean by recent, but 2.6.21-rc3-mm2 works
| >> > for me.
| >> 
| >> Thanks for the confirmation.
| >> 
| >> The patches I was talking about had already been in 2.6.21-rc3-mm2, so the
| >> reason of this failure must be different.
| >
| > Bisection shows that the freezing of processes has been broken by one of the
| > patches:
| >
| > remove-the-likelypid-check-in-copy_process.patch
| 
| Grr.  Oleg's review of remove-the-likelypid-check-in-copy-process
| showed it to be questionable (and it was just an optimization)
| so we can get rid of that one easily. 
| 
| Although all it did that was really questionable was add
| the idle process to the global process list and bump a process
| count when we forked the idle process.  Not dramatically dangerous
| things.
| 
| > use-task_pgrp-task_session-in-copy_process.patch
| 
| As I recall that patch was pretty trivial, and shouldn't have
| anything to do with the freezer.   The process freezer doesn't care
| about pids does it?

Yes. I think this one is trivial too. Here is the effective change in
copy_process():

-   attach_pid(p, PIDTYPE_PGID, find_pid(pgid));
-   attach_pid(p, PIDTYPE_SID, find_pid(sid));
+   attach_pid(p, PIDTYPE_PGID, task_pgrp(current));
+   attach_pid(p, PIDTYPE_SID, task_session(current));


| 
| Eric


Re: [RFC] SLAB : NUMA cache_free_alien() very expensive because of virt_to_slab(objp); nodeid = slabp->nodeid;

2007-03-20 Thread Andi Kleen
> We usually use page_to_nid(). Sure this will determine the node the object 
> resides on. But this may not be the node on which the slab is tracked 
> since there may have been a fallback at alloc time.

How about your slab rewrite?  I assume it would make more sense to fix
such problems in that code instead of the old which is going to be replaced
at some point.

-Andi
> 


Re: [patch 13/26] Xen-paravirt_ops: Consistently wrap paravirt ops callsites to make them patchable

2007-03-20 Thread Zachary Amsden

Linus Torvalds wrote:
> On Tue, 20 Mar 2007, Zachary Amsden wrote:
>> Actually, I was thinking the irq handlers would just not mess around with
>> eflags on the stack, just call the chip to ack the interrupt and re-enable
>> hardware interrupts when they left, since that is free anyway with the iret.
>
> No can do. Think level-triggered. You *need* to disable the interrupt, and
> disabling it at the CPU is the easiest approach. Even so, you need to
> worry about SMP and screaming interrupts at all CPU's, but if you don't
> ack it to the IO-APIC until later, that should be ok (alternatively, you
> need to just mask-and-ack the irq controller).

Well, you can keep it masked, but a more important point is that I've
entirely neglected local interrupts.  This might work for IRQs, but for
local timer or thermal or IPIs, using the tasklet based replay simply
will not work.

> One of the advantages of doing that is that you only ever have a queue of
> one single entry, which then makes it easier to do the replay.

Yes.  Unfortunately now both do_IRQ and all the smp_foo interrupt
handlers need to detect and queue for replay, but fortunately they all
have the interrupt number conveniently on the stack.


Zach


Re: scsi: Devices offlined

2007-03-20 Thread Wakko Warner
Philippe Troin wrote:
> Wakko Warner <[EMAIL PROTECTED]> writes:
> 
> > [84797.683873] sr 1:0:13:0: scsi: Device offlined - not ready after error 
> > recovery
> > 
> > Is there anyway to make the kernel "online" a device that has done this? 
> > I've had this happen on various devices (mostly on usb where I can
> > unplug/replug), but this time, it's on a scsi controller and the driver is
> > not a module.
> > 
> > If it's possible to do this w/o rebooting, I'd like to know for when I have
> > this happen in the future.
> 
> Have you tried:
> 
>   echo remove-single-device BUS ID LUN > /proc/scsi
>   echo add-single-device BUS ID LUN > /proc/scsi

I tried:
echo "scsi remove-scingle-device 1 0 13 0" > /proc/scsi/scsi

echo "scsi add-scingle-device 1 0 13 0" > /proc/scsi/scsi


I also tried your method, which just gives invalid argument (using
/proc/scsi/scsi, since /proc/scsi is a directory)

According to scsi_proc.c:
/*
 * Usage: echo "scsi add-single-device 0 1 2 3" >/proc/scsi/scsi
 * with  "0 1 2 3" replaced by your "Host Channel Id Lun".
 */

/*
 * Usage: echo "scsi remove-single-device 0 1 2 3" >/proc/scsi/scsi
 * with  "0 1 2 3" replaced by your "Host Channel Id Lun".
 */

Either way, nothing.

Er, on second thought, I actually unplugged the power from the scsi device
and replugged it and it shows up now.  Oh well..  Thanks anyway =)
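
For the record, the command format from the scsi_proc.c comments quoted
above can be built and written from C roughly like this. The keyword in
those comments is spelled "single". The helper names format_scsi_proc_cmd()
and send_scsi_proc_cmd() are hypothetical, and whether a rescan actually
brings an offlined device back is not something this sketch can promise.

```c
#include <stdio.h>

/*
 * Build "scsi <verb> <host> <channel> <id> <lun>", where verb is
 * "add-single-device" or "remove-single-device" as in the quoted comments.
 */
static int format_scsi_proc_cmd(char *buf, size_t len, const char *verb,
				int host, int channel, int id, int lun)
{
	return snprintf(buf, len, "scsi %s %d %d %d %d",
			verb, host, channel, id, lun);
}

/*
 * Write one command line to path (normally /proc/scsi/scsi, which needs
 * root). Returns 0 on success, -1 on failure.
 */
static int send_scsi_proc_cmd(const char *path, const char *cmd)
{
	FILE *f = fopen(path, "w");
	int rc;

	if (!f)
		return -1;
	rc = (fprintf(f, "%s\n", cmd) < 0) ? -1 : 0;
	if (fclose(f) != 0)
		rc = -1;
	return rc;
}
```

For the device in this thread the arguments would be host 1, channel 0,
id 13, lun 0, giving "scsi remove-single-device 1 0 13 0" followed by the
matching add-single-device line.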

-- 
 Lab tests show that use of micro$oft causes cancer in lab animals
 Got Gas???


Re: [patch 13/26] Xen-paravirt_ops: Consistently wrap paravirt ops callsites to make them patchable

2007-03-20 Thread Linus Torvalds


On Tue, 20 Mar 2007, Zachary Amsden wrote:
> 
> Actually, I was thinking the irq handlers would just not mess around with
> eflags on the stack, just call the chip to ack the interrupt and re-enable
> hardware interrupts when they left, since that is free anyway with the iret.

No can do. Think level-triggered. You *need* to disable the interrupt, and 
disabling it at the CPU is the easiest approach. Even so, you need to 
worry about SMP and screaming interrupts at all CPU's, but if you don't 
ack it to the IO-APIC until later, that should be ok (alternatively, you 
need to just mask-and-ack the irq controller).

> Maybe leaving irqs disabled is better.

One of the advantages of doing that is that you only ever have a queue of 
one single entry, which then makes it easier to do the replay.

Linus


Re: PAGE_SIZE Availability Inconsistency

2007-03-20 Thread Anton Blanchard

Hi,

> The advantage would be that it wouldn't require a v3 for platforms for 
> which MIN_PAGE_SIZE == PAGE_SIZE, which accounts for a very large 
> percentage of systems.
> 
> You still have to look for the darn magic in two places, so there is no 
> reason for it to be different.

The problem is if you can hit in two places then what PAGE_SIZE should
you use to size the contents of the swap header while remaining backward
compatible.

I'm leaning towards Dave's suggestion of creating a clean v3 swap header.

Anton


Re: [PATCH]: pcmcia - spot slave decode flaws (for testing)

2007-03-20 Thread Komuro

Hi,

"[PATCH]: pcmcia - spot slave decode flaws (for testing)" works properly.
(kernel 2.6.21-rc4-mm1)

pccard: PCMCIA card inserted into slot 1
pcmcia: registering new device pcmcia1.0
ata3: PATA max PIO0 cmd 0x00010100 ctl 0x0001010e bmdma 0x irq 3
scsi2 : pata_pcmcia
ata3.00: CFA: SunDisk SDP5-10, Rev 3.70, max PIO0
ata3.00: 20480 sectors, multi 0: LBA 
ata3.01: CFA: SunDisk SDP5-10, Rev 3.70, max PIO0
ata3.01: 20480 sectors, multi 0: LBA 
ata3.01: is a ghost device, ignoring.
ata3.01: disabled
ata3.00: configured for PIO0
scsi 2:0:0:0: Direct-Access ATA  SunDisk SDP5-10  Rev  PQ: 0 ANSI: 5
sd 2:0:0:0: [sda] 20480 512-byte hardware sectors (10 MB)
sd 2:0:0:0: [sda] Write Protect is off
sd 2:0:0:0: [sda] Mode Sense: 00 3a 00 00
sd 2:0:0:0: [sda] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
sd 2:0:0:0: [sda] 20480 512-byte hardware sectors (10 MB)
sd 2:0:0:0: [sda] Write Protect is off
sd 2:0:0:0: [sda] Mode Sense: 00 3a 00 00
sd 2:0:0:0: [sda] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
 sda: sda1
sd 2:0:0:0: [sda] Attached SCSI removable disk
sd 2:0:0:0: Attached scsi generic sg0 type 0

Best Regards
Komuro


>If you've got a CF adapter or PCMCIA disc which shows up twice in libata
>pata_pcmcia can you try this patch on top of the updates posted. It tries
>to spot when the slave is a mirror of the master and to fix up problems
>that causes.
>
>Signed-off-by: Alan Cox <[EMAIL PROTECTED]>
>
>diff -u --new-file --recursive --exclude-from /usr/src/exclude \
>linux.vanilla-2.6.20-mm2/drivers/ata/pata_pcmcia.c \
>linux-2.6.20-mm2/drivers/ata/pata_pcmcia.c
>--- linux.vanilla-2.6.20-mm2/drivers/ata/pata_pcmcia.c 2007-02-20 
>13:37:58.0 \
>+
>+++ linux-2.6.20-mm2/drivers/ata/pata_pcmcia.c 2007-02-20 14:28:13.0 
>+
>@@ -54,6 +54,39 @@
>   dev_node_t  node;
> };
> 
>+/**
>+ *pcmcia_set_mode -   PCMCIA specific mode setup
>+ *@ap: Port
>+ *@r_failed_dev: Return pointer for failed device
>+ *
>+ *Perform the tuning and setup of the devices and timings, which
>+ *for PCMCIA is the same as any other controller. We wrap it however
>+ *as we need to spot hardware with incorrect or missing master/slave
>+ *decode, which alas is embarrassingly common in the PC world
>+ */
>+ 
>+static int pcmcia_set_mode(struct ata_port *ap, struct ata_device **r_failed_dev)
>+{
>+  struct ata_device *master = &ap->device[0];
>+  struct ata_device *slave = &ap->device[1];
>+
>+  if (!ata_dev_enabled(master) || !ata_dev_enabled(slave))
>+          return ata_do_set_mode(ap, r_failed_dev);
>+
>+  if (memcmp(master->id + ATA_ID_FW_REV,  slave->id + ATA_ID_FW_REV,
>+             ATA_ID_FW_REV_LEN + ATA_ID_PROD_LEN) == 0)
>+  {
>+          /* Suspicious match, but could be two cards from
>+             the same vendor - check serial */
>+          if (memcmp(master->id + ATA_ID_SERNO, slave->id + ATA_ID_SERNO,
>+                     ATA_ID_SERNO_LEN) == 0 && master->id[ATA_ID_SERNO] >> 8) {
>+                  ata_dev_printk(slave, KERN_WARNING, "is a ghost device, ignoring.\n");
>+                  ata_dev_disable(slave);
>+          }
>+  }
>+  return ata_do_set_mode(ap, r_failed_dev);
>+}
>+
> static struct scsi_host_template pcmcia_sht = {
>   .module = THIS_MODULE,
>   .name   = DRV_NAME,
>@@ -73,6 +106,7 @@
> };
> 
> static struct ata_port_operations pcmcia_port_ops = {
>+  .set_mode   = pcmcia_set_mode,
>   .port_disable   = ata_port_disable,
>   .tf_load= ata_tf_load,
>   .tf_read= ata_tf_read,


Re: scsi: Devices offlined

2007-03-20 Thread Philippe Troin
Wakko Warner <[EMAIL PROTECTED]> writes:

> [84797.683873] sr 1:0:13:0: scsi: Device offlined - not ready after error recovery
> 
> Is there any way to make the kernel "online" a device that has done this? 
> I've had this happen on various devices (mostly on usb where I can
> unplug/replug), but this time, it's on a scsi controller and the driver is
> not a module.
> 
> If it's possible to do this w/o rebooting, I'd like to know for when I have
> this happen in the future.

Have you tried:

  echo "scsi remove-single-device HOST CHANNEL ID LUN" > /proc/scsi/scsi
  echo "scsi add-single-device HOST CHANNEL ID LUN" > /proc/scsi/scsi

?

Phil.


Re: [patch 13/26] Xen-paravirt_ops: Consistently wrap paravirt ops callsites to make them patchable

2007-03-20 Thread Zachary Amsden

Linus Torvalds wrote:
> On Tue, 20 Mar 2007, Zachary Amsden wrote:
>> void local_irq_restore(int enabled)
>> {
>>    pda.intr_mask = enabled;
>>    /*
>>     * note there is a window here where softirqs are not processed by
>>     * the interrupt handler, but that is not a problem, since it will
>>     * get done here in the outer enable of any nested pair.
>>     */
>>    if (enabled)
>>        local_bh_enable();
>> }
>
> Actually, this one is more complicated. You also need to actually enable 
> hardware interrupts again if they got disabled by an interrupt actually 
> occurring while the "soft-interrupt" was disabled.

Actually, I was thinking the irq handlers would just not mess around 
with eflags on the stack, just call the chip to ack the interrupt and 
re-enable hardware interrupts when they left, since that is free anyway 
with the iret.  Maybe leaving irqs disabled is better.

> Anyway, it really *should* be pretty damn simple. No need to disable 
> preemption, there should be no events that can *cause* it, since all 
> interrupts get headed off at the pass.. (the return-from-interrupt thing 
> should already notice that it's returning to an interrupts-disabled 
> section and not try to do any preemption).
>
> What did I miss?

I wasn't disabling preemption to actually disable preemption.  I was 
just using bh_disable as a global hammer to stop softirqs (thus the irq 
replay tasklet) from running during the normal irq_exit path.  Then, we 
can just use the existing software IRQ replay code, and I think barely 
any new code (queue_irq(), etc) has to be written.


Zach


Re: [PATCH] utrace: make an inline void

2007-03-20 Thread Roland McGrath
> Avoid multiple/repeated warnings:
> include/linux/utrace.h:594: warning: return type defaults to 'int'

Oops!  Thanks for catching this.


Thanks,
Roland


Re: 2.6.21-rc4-mm1

2007-03-20 Thread Randy Dunlap
On Mon, 19 Mar 2007 20:56:23 -0800 Andrew Morton wrote:

> 
> Temporarily at
> 
>   http://userweb.kernel.org/~akpm/2.6.21-rc4-mm1/
> 
> Will appear later at
> 
>   
> ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.21-rc4/2.6.21-rc4-mm1/


UIO_CIF should depend on PCI ??

With CONFIG_PCI=n, I get:

ERROR: "pci_module_init" [drivers/uio/uio_cif.ko] undefined!
ERROR: "pci_release_regions" [drivers/uio/uio_cif.ko] undefined!
make[1]: *** [__modpost] Error 1
make: *** [modules] Error 2

---
~Randy
*** Remember to use Documentation/SubmitChecklist when testing your code ***


Re: [linux-pm] 2.6.21-rc4-mm1: freezing of processes broken

2007-03-20 Thread Eric W. Biederman
"Rafael J. Wysocki" <[EMAIL PROTECTED]> writes:

> On Tuesday, 20 March 2007 22:06, Rafael J. Wysocki wrote:
>> On Tuesday, 20 March 2007 21:58, Jiri Slaby wrote:
>> > Rafael J. Wysocki napsal(a):
>> > > Actually, the problem is 100% reproducible on my system too and I doubt it's
>> > > caused by the recent freezer patches.
>> > 
>> > I don't know what exactly do you mean by recent, but 2.6.21-rc3-mm2 works
>> > for me.
>> 
>> Thanks for the confirmation.
>> 
>> The patches I was talking about had already been in 2.6.21-rc3-mm2, so the
>> reason of this failure must be different.
>
> Bisection shows that the freezing of processes has been broken by one of the
> patches:
>
> remove-the-likelypid-check-in-copy_process.patch

Grr.  Oleg's review of remove-the-likelypid-check-in-copy-process
showed it to be questionable (and it was just an optimization)
so we can get rid of that one easily. 

Although all it did that was really questionable was add
the idle process to the global process list and bump a process
count when we forked the idle process.  Not dramatically dangerous
things.

> use-task_pgrp-task_session-in-copy_process.patch

As I recall that patch was pretty trivial, and shouldn't have
anything to do with the freezer.   The process freezer doesn't care
about pids does it?

Eric


Re: [RFC][PATCH] split file and anonymous page queues #3

2007-03-20 Thread Matt Mackall
On Tue, Mar 20, 2007 at 06:08:10PM -0400, Rik van Riel wrote:
> - "Active:   %8lu kB\n"
> - "Inactive: %8lu kB\n"
...
> + "Active(anon):   %8lu kB\n"
> + "Inactive(anon): %8lu kB\n"
> + "Active(file):   %8lu kB\n"
> + "Inactive(file): %8lu kB\n"

Potentially incompatible change. How about preserving the original
fields (by totalling), then adding the other fields in a second patch.
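
A minimal user-space sketch of what "preserving by totalling" could look like. The struct and function names here are hypothetical and not taken from the patch; only the field labels come from the quoted hunk.

```c
#include <stdio.h>

/* Hypothetical per-list page counts, in kB; not the kernel's structures. */
struct lru_counts {
	unsigned long active_anon, inactive_anon;
	unsigned long active_file, inactive_file;
};

/*
 * Emit the legacy "Active:" and "Inactive:" totals first, so existing
 * parsers keep working, then the new split fields.
 * Returns whatever snprintf() returns.
 */
static int format_meminfo(char *buf, size_t len, const struct lru_counts *c)
{
	return snprintf(buf, len,
		"Active:         %8lu kB\n"
		"Inactive:       %8lu kB\n"
		"Active(anon):   %8lu kB\n"
		"Inactive(anon): %8lu kB\n"
		"Active(file):   %8lu kB\n"
		"Inactive(file): %8lu kB\n",
		c->active_anon + c->active_file,
		c->inactive_anon + c->inactive_file,
		c->active_anon, c->inactive_anon,
		c->active_file, c->inactive_file);
}
```

A tool that only knows the old two fields would still find them, while newer tools can read the four split lines.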

>   if (!pagevec_add(_pvec, page))
> - __pagevec_lru_add(_pvec);
> + __pagevec_lru_add_file(_pvec);

Wouldn't lru_file_add or file_lru_add be a better name? If the object
is a "file lru" then sticking "add" in the middle is a little ugly.

>   spin_lock_irq(&zone->lru_lock);
>   if (PageLRU(page) && !PageActive(page)) {
> - del_page_from_inactive_list(zone, page);
> + if (page_anon(page)) {
> + del_page_from_inactive_anon_list(zone,page);
>   SetPageActive(page);
> - add_page_to_active_list(zone, page);
> + add_page_to_active_anon_list(zone, page);
> + } else {
> + del_page_from_inactive_file_list(zone, page);
> + SetPageActive(page);
> + add_page_to_active_file_list(zone, page);
> + }
>   __count_vm_event(PGACTIVATE);
>   }

Missing a level of indentation.

-- 
Mathematics is the supreme nostalgia of our time.


Re: [PATCH 0/7] [RFC] hugetlb: pagetable_operations API (V2)

2007-03-20 Thread William Lee Irwin III
On Mon, Mar 19, 2007 at 01:05:02PM -0700, Adam Litke wrote:
> Andrew, given the favorable review of these patches the last time
> around, would you consider them for the -mm tree?  Does anyone else
> have any objections?

We need a new round of commentary for how it should integrate with
Nick Piggin's fault handling patches given that both introduce very
similar ->fault() methods, albeit at different places and for different
purposes.

I think things weren't entirely wrapped up last time but there was
general approval in concept and code-level issues had been gotten past.
I've forgotten the conclusion of hch and arjan's commentary on making
the pagetable operations mandatory. ISTR they were all cosmetic affairs
like that or whether they should be part of ->vm_ops as opposed to
fundamental issues.

The last thing I'd want to do is hold things back, so by no means
delay merging etc. on account of this, but I am curious on several
points. First, is there any demonstrable overhead to mandatory indirect
calls for the pagetable operations? Second, can case analysis for e.g.
file-backed vs. anon and/or COW vs. shared be avoided by the use of
the indirect function call, or more specifically, to any beneficial
effect? Well, I rearranged the code in such a manner ca. 2.6.6 so I
know the rearrangement is possible, but not the performance impact vs.
modern kernels, if any, never mind how the code ends up looking in
modern kernels. Third, could you use lmbench or some such to get direct
fork() and fault handling microbenchmarks? Kernel compiles are too
close to macrobenchmarks to say anything concrete there apart from that
other issues (e.g. SMP load balancing, NUMA, lock contention, etc.)
dominate indirect calls. If you have the time or interest to explore
any of these areas, I'd be very interested in hearing the results.

One thing I would like to see for sure is dropping the has_pt_op()
and pt_op() macros. The Linux-native convention is to open-code the
function pointer fetches, and the non-native convention is to wrap
things like defaulting (though they actually do something more involved)
in the analogue of pt_op() for the purposes of things like extensible
sets of operations bordering on OOP-ish method tables. So this ends up
as some sort of hybrid convention without the functionality of the
non-native call wrappers and without the clarity of open-coding. My
personal preference is that the function pointer table be mandatory and
the call to the function pointer be unconditional and the type
dispatch accomplished entirely through the function pointers, but I'm
not particularly insistent about that.
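
A rough sketch of the "mandatory table, unconditional call" convention described above, as a user-space model. All type and function names are made up for illustration and do not come from the patch series.

```c
struct vma_model;

/* Hypothetical operations table; the single method is illustrative only. */
struct pagetable_ops_model {
	int (*fault)(struct vma_model *vma, unsigned long addr);
};

struct vma_model {
	const struct pagetable_ops_model *pt_ops;	/* never NULL in this model */
};

static int default_fault(struct vma_model *vma, unsigned long addr)
{
	(void)vma;
	(void)addr;
	return 0;	/* stands in for the ordinary fault path */
}

static int huge_fault(struct vma_model *vma, unsigned long addr)
{
	(void)vma;
	(void)addr;
	return 1;	/* stands in for the hugetlb fault path */
}

static const struct pagetable_ops_model default_pt_ops = { .fault = default_fault };
static const struct pagetable_ops_model huge_pt_ops = { .fault = huge_fault };

/*
 * Open-coded, unconditional dispatch: the type decision lives entirely in
 * which table the vma points at, with no has_pt_op()/pt_op() style wrappers.
 */
static int handle_fault(struct vma_model *vma, unsigned long addr)
{
	return vma->pt_ops->fault(vma, addr);
}
```

The cost of this form is one indirect call on every fault, which is the overhead the microbenchmark questions above are asking about.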


-- wli


scsi: Devices offlined

2007-03-20 Thread Wakko Warner
[84797.683873] sr 1:0:13:0: scsi: Device offlined - not ready after error recovery

Is there any way to make the kernel "online" a device that has done this? 
I've had this happen on various devices (mostly on usb where I can
unplug/replug), but this time, it's on a scsi controller and the driver is
not a module.

If it's possible to do this w/o rebooting, I'd like to know for when I have
this happen in the future.

If it matters, the driver is aic79xx, kernel 2.6.20.

-- 
 Lab tests show that use of micro$oft causes cancer in lab animals
 Got Gas???


Re: [patch 13/26] Xen-paravirt_ops: Consistently wrap paravirt ops callsites to make them patchable

2007-03-20 Thread Linus Torvalds


On Tue, 20 Mar 2007, Zachary Amsden wrote:
> 
> void local_irq_restore(int enabled)
> {
>pda.intr_mask = enabled;
>/*
> * note there is a window here where softirqs are not processed by
> * the interrupt handler, but that is not a problem, since it will
> * get done here in the outer enable of any nested pair.
> */
>if (enabled)
>local_bh_enable();
> }

Actually, this one is more complicated. You also need to actually enable 
hardware interrupts again if they got disabled by an interrupt actually 
occurring while the "soft-interrupt" was disabled.

But since it's all a local-cpu issue, you can do things like test 
cpu-local memory flags for whether that has happened or not.

So it *should* be something as simple as

	local_irq_disable()
	{
		pda.irq_enable = 0;
	}

	handle_interrupt()
	{
		if (!pda.irq_enable) {
			pda.irq_queued = 1;
			queue_interrupt();
			.. make sure we return with hardirq's now
			   disabled: just clear IF in the pt_regs ..
			return;
		}
		.. normal ..
	}

	local_irq_enable()
	{
		pda.irq_enable = 1;
		barrier();
		/* Common case - nothing happened while we were fake-disabled.. */
		if (!pda.irq_queued)
			return;

		/* Ok, actually handle the things! */
		handle_queued_irqs();

		/*
		 * And enable the hw interrupts again, they got disabled
		 * when we were queueing stuff..
		 */
		hardware_sti();
	}

but I haven't really gone over it in any detail, I may have missed 
something really obvious.
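
A minimal user-space model of the scheme sketched above, assuming a single CPU and a single-entry queue. The names `hw_if`, `pending_irq`, `handled` and `deliver_irq()` are invented for this illustration, and clearing `irq_queued` on replay is an addition the pseudocode leaves implicit; none of this is kernel code.

```c
#include <stdbool.h>

/* Per-CPU state, modelled on the pda fields in the pseudocode. */
struct pda_model {
	bool irq_enable;	/* software "interrupts enabled" flag */
	bool irq_queued;	/* an interrupt arrived while soft-disabled */
};

static struct pda_model pda = { .irq_enable = true, .irq_queued = false };
static bool hw_if = true;	/* stands in for EFLAGS.IF (hypothetical) */
static int pending_irq = -1;	/* single-entry replay queue (hypothetical) */
static int handled;		/* number of interrupts actually serviced */

static void local_irq_disable(void)
{
	pda.irq_enable = false;	/* a plain store, no cli */
}

/*
 * Simulated hardware delivery. Returns false when the hardware flag is
 * clear, i.e. the interrupt stays pending in the controller.
 */
static bool deliver_irq(int irq)
{
	if (!hw_if)
		return false;
	if (!pda.irq_enable) {
		pda.irq_queued = true;
		pending_irq = irq;	/* queue_interrupt() */
		hw_if = false;		/* return with hardirqs really disabled */
		return true;
	}
	handled++;			/* normal handling */
	return true;
}

static void local_irq_enable(void)
{
	pda.irq_enable = true;
	/* Common case: nothing happened while we were fake-disabled. */
	if (!pda.irq_queued)
		return;
	pda.irq_queued = false;
	if (pending_irq >= 0) {		/* handle_queued_irqs() */
		handled++;
		pending_irq = -1;
	}
	hw_if = true;			/* hardware_sti() */
}
```

Because the first queued interrupt clears the hardware flag, at most one entry is ever queued before the replay, which is why a single-slot queue is enough in this model.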

Anyway, it really *should* be pretty damn simple. No need to disable 
preemption, there should be no events that can *cause* it, since all 
interrupts get headed off at the pass.. (the return-from-interrupt thing 
should already notice that it's returning to an interrupts-disabled 
section and not try to do any preemption).

What did I miss?

Linus


[ANNOUNCE] Guilt v0.23

2007-03-20 Thread Josef Sipek
Guilt v0.23 is available for download (once it mirrors out on kernel.org).

Guilt (Git Quilt) is a series of bash scripts which add a Mercurial
queues-like functionality and interface to git.

Tarballs:
http://www.kernel.org/pub/linux/kernel/people/jsipek/guilt/

Git repo:
git://git.kernel.org/pub/scm/linux/kernel/git/jsipek/guilt.git


There are two brand new commands: fork and fold. A few minor bugs related to
really unusual patch names (containing commas) were fixed. And finally, each
command has a manual page!

As always, patches and other feedback are welcome.

Josef "Jeff" Sipek.


Changes since v0.22:

Brandon Philips (2):
  Get usage information from the USAGE variable
  Remaining guilt documentation

Josef 'Jeff' Sipek (14):
  Docs: Ignore generated usage-*.txt files
  Docs: Consistified the man pages
  Docs: Reimplemented cmd-list generation script in sh
  Docs: Added import-commit manpage
  Docs: Fixed up patchbomb documentation
  Docs: Include version Guilt version number in page footer
  Remove pop_patch as there are no real users
  fold: fold a patch into the topmost patch
  delete: use series_remove_patch instead of opencoding the logic
  fold: fixed patch backup creation
  Docs: guilt-fold manpage
  new: escape new patch name properly when changing the series file
  fork: fork the topmost applied patch
  Guilt v0.23


Re: [patch 13/26] Xen-paravirt_ops: Consistently wrap paravirt ops callsites to make them patchable

2007-03-20 Thread Jeremy Fitzhardinge
Matt Mackall wrote:
> On Tue, Mar 20, 2007 at 09:31:58AM -0700, Jeremy Fitzhardinge wrote:
>> Linus Torvalds wrote:
>>> On Tue, 20 Mar 2007, Eric W. Biederman wrote:
>>>> If that is the case.  In the normal kernel what would
>>>> the "the oops, we got an interrupt code do?"
>>>> I assume it would leave interrupts disabled when it returns?
>>>> Like we currently do with the delayed disable of normal interrupts?
>>>
>>> Yeah, disable interrupts, and set a flag that the fake "sti" can test, and 
>>> just return without doing anything.
>>>
>>> (You may or may not also need to do extra work to Ack the hardware 
>>> interrupt etc, which may be irq-controller specific. Once the CPU has 
>>> accepted the interrupt, you may not be able to just leave it dangling)
>>
>> So it would be something like:
>>
>> pda.intr_mask = 1;   /* disable interrupts */
>> ...
>> pda.intr_mask = 0;   /* enable interrupts */
>> if (xchg(&pda.intr_pending, 0))  /* check pending */
>>  asm("sti"); /* was pending; isr left cpu interrupts masked */
>
> I don't know that you need an xchg there. If you're still on the same
> CPU, it should all be nice and causal even across an interrupt handler.
> So it could be:
>
>pda.intr_mask = 0; /* intr_pending can't get set after this */
>if (unlikely(pda.intr_pending)) {
>   pda.intr_pending = 0;
>   asm("sti");
>}
>
> (This would actually need a C barrier, but I'll ignore that as this'd
> end up being asm...)
>
> But other interesting things could happen. If we never did a real CLI
> and we get preempted and switched to another CPU between clearing
> intr_mask and checking intr_pending, we get a little confused. 
>   

We could prevent preemption while pda.intr_mask is set.  preemptible() is
defined as:

# define preemptible()	(preempt_count() == 0 && !irqs_disabled())

anyway, so that would be changed to look at the intr_mask rather than
eflags.
(I'm not sure if preemptible() is actually used to determine whether
to preempt or not.)

Alternatively, the intr_mask could be encoded in a bit of preempt_count...
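
A small sketch of that alternative, as a user-space model. The bit position and all names are assumptions made for illustration; the real preempt_count layout is not described in this thread.

```c
/* Hypothetical bit reserved in the preempt count for the soft interrupt mask. */
#define SOFT_IRQ_MASK_BIT	(1u << 31)

static unsigned int preempt_count_model;	/* stands in for preempt_count() */

static void soft_irq_disable(void)
{
	preempt_count_model |= SOFT_IRQ_MASK_BIT;
}

static void soft_irq_enable(void)
{
	preempt_count_model &= ~SOFT_IRQ_MASK_BIT;
}

/*
 * With the mask folded into the count, one comparison covers both
 * "no preempt_disable() outstanding" and "interrupts not soft-masked".
 */
static int preemptible_model(void)
{
	return preempt_count_model == 0;
}
```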

J


Re: [RFC] SLAB : NUMA cache_free_alien() very expensive because of virt_to_slab(objp); nodeid = slabp->nodeid;

2007-03-20 Thread Christoph Lameter
On Tue, 20 Mar 2007, Eric Dumazet wrote:

> I understand we want to do special things (fallback and such tricks) at
> allocation time, but I believe that we can just trust the real nid of memory
> at free time.

Sorry no. The node at allocation time determines which node specific 
structure tracks the slab. If we fall back then the slab is allocated 
from one node but entered in the node structure of another. Thus you 
cannot free the slab without knowing the node at allocation time.
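
A toy model of the distinction, assuming two nodes and an allocator that falls back when the requested node has no memory. The struct and helpers are hypothetical and only illustrate why the page's node and the tracking node can differ.

```c
/* Toy slab descriptor: records both nodes so the difference is visible. */
struct slab_model {
	int nodeid;	/* node requested at allocation time; its lists track the slab */
	int page_node;	/* node the backing page actually came from */
};

/* Hypothetical allocator: uses `fallback` when `want` has no free memory. */
static struct slab_model alloc_slab(int want, int want_has_memory, int fallback)
{
	struct slab_model s;

	s.nodeid = want;
	s.page_node = want_has_memory ? want : fallback;
	return s;
}

/* Freeing has to go through the tracking node, not the page's node. */
static int node_for_free(const struct slab_model *s)
{
	return s->nodeid;
}
```

After a fallback, looking up the node from the page alone would pick `page_node`, whose lists never saw this slab.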



Re: [PATCH 2.6.21-rc4] ieee1394: fix oops on "modprobe -r ohci1394" after network class_device conversion

2007-03-20 Thread Stefan Richter
On 20 Mar, Greg KH wrote:
> On Tue, Mar 20, 2007 at 10:43:22PM +0100, Stefan Richter wrote:
>> @@ -586,7 +586,10 @@ static void ether1394_add_host (struct h
>>  }
>>  
>>  SET_MODULE_OWNER(dev);
>> +#if 0
>> +/* FIXME - Is this the correct parent device anyway? */
>>  SET_NETDEV_DEV(dev, &host->device);
>> +#endif
> 
> That's interesting.  What does 'tree /sys/class/net/' look like with
> this patch applied?  Does the eth1394 device now live off in
> /sys/device/virtual?

Yes.

lrwxrwxrwx  1 root root 0 Mär 21 01:02 eth0 -> ../../devices/pci:00/:00:0b.0/eth0/
lrwxrwxrwx  1 root root 0 Mär 21 01:02 eth1 -> ../../devices/virtual/net/eth1/
lrwxrwxrwx  1 root root 0 Mär 21 01:02 lo -> ../../devices/virtual/net/lo/

(eth1 is IP over 1394 alias eth1394. eth0 is an actual ethernet
interface.)

And eth1/device (ex -> ../../../devices/pci*___*/fw-host*) is now gone.
Would anybody miss it?

> If so, I guess this is ok for now as we can wait for the rewrite of the
> ieee1394 subsystem to get the linking done correctly :)

That's my hope too.
-- 
Stefan Richter
-=-=-=== --== =-=-=
http://arcgraph.de/sr/




Re: [RFC] SLAB : NUMA cache_free_alien() very expensive because of virt_to_slab(objp); nodeid = slabp->nodeid;

2007-03-20 Thread Christoph Lameter
On Tue, 20 Mar 2007, Andi Kleen wrote:

> > > Is it possible virt_to_slab(objp)->nodeid being different from 
> > > pfn_to_nid(objp) ?
> > 
> > It is possible the page allocator falls back to another node than 
> > requested. We would need to check that this never occurs.
> 
> The only way to ensure that would be to set a strict mempolicy.
> But I'm not sure that's a good idea -- after all you don't want
> to fail an allocation in this case.
> 
> But pfn_to_nid on the object like proposed by Eric should work anyways.
> But I'm not sure the tables used for that will be more often cache hot
> than the slab.

We usually use page_to_nid(). Sure this will determine the node the object 
resides on. But this may not be the node on which the slab is tracked 
since there may have been a fallback at alloc time.



Re: [PATCH] fbdev sysfs imrovements

2007-03-20 Thread Greg KH
On Tue, Mar 20, 2007 at 02:25:49PM +, James Simmons wrote:
> 
> This patch does several things to allow the underlying hardware to be 
> shared among many devices. The most important thing is the use of
> the created device via device_create instead of the hardware device. No 
> longer should fbdev drivers use the xxx_set_drvdata with the parent
> bus device. The second change is having a bus independent power management
> for the framebuffer driver. The final change is using the release method 
> to cleanup the device. The reason again is to make the fbdev driver 
> independent of the bus parent device. Feedback is welcomed.

Looks good to me.

thanks,

greg k-h


Re: 2.6.21-rc4-mm1

2007-03-20 Thread J.A. Magallón
On Tue, 20 Mar 2007 17:36:57 +0100, "J.A. Magallón" <[EMAIL PROTECTED]> wrote:

> On Mon, 19 Mar 2007 20:56:23 -0800, Andrew Morton <[EMAIL PROTECTED]> wrote:
> 
> > 
> > Temporarily at
> > 
> >   http://userweb.kernel.org/~akpm/2.6.21-rc4-mm1/
> > 
> > Will appear later at
> > 
> >   
> > ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.21-rc4/2.6.21-rc4-mm1/
> > 
> 
> (oops, I forgot LKML)
> 
> I have no udev events for my dvd-rw...
> When I insert a disc in the dvd reader:
> 
> werewolf:~# udevmonitor
> udevmonitor prints the received event from the kernel [UEVENT]
> and the event which udev sends out after rule processing [UDEV]
> 
> UEVENT[1174385162.607021] mount/block/sr1 (block)
> UDEV  [1174385162.610056] mount/block/sr1 (block)
> 
> If I insert it in the dvd-rw drive, nothing happens.
> 

I realized that my scsi devices were like this:

werewolf:~# lsscsi
[0:0:0:0]    cd/dvd  HL-DT-ST DVDRAM GSA-H10N  JL10  /dev/.tmp-11-0
[0:0:1:0]    disk    IOMEGA   ZIP 250          51.G  /dev/sda
[1:0:0:0]    disk    ATA      ST3120022A       3.06  /dev/sdb
[1:0:1:0]    cd/dvd  TOSHIBA  DVD-ROM SD-M1712 1004  /dev/.tmp-11-1
[2:0:0:0]    disk    ATA      ST3200822AS      3.01  /dev/sdc
[7:0:0:0]    disk    LG       USBDrive         1100  /dev/sdd

After a service udev force-reload:

werewolf:~# lsscsi
[0:0:0:0]    cd/dvd  HL-DT-ST DVDRAM GSA-H10N  JL10  /dev/sr0
[0:0:1:0]    disk    IOMEGA   ZIP 250          51.G  /dev/sda
[1:0:0:0]    disk    ATA      ST3120022A       3.06  /dev/sdb
[1:0:1:0]    cd/dvd  TOSHIBA  DVD-ROM SD-M1712 1004  /dev/sr1
[2:0:0:0]    disk    ATA      ST3200822AS      3.01  /dev/sdc
[7:0:0:0]    disk    LG       USBDrive         1100  /dev/sdd

If I insert a disc in /dev/sr1 and eject it:

werewolf:~# lsscsi
[0:0:0:0]    cd/dvd  HL-DT-ST DVDRAM GSA-H10N  JL10  /dev/sr0
[0:0:1:0]    disk    IOMEGA   ZIP 250          51.G  /dev/sda
[1:0:0:0]    disk    ATA      ST3120022A       3.06  /dev/sdb
[1:0:1:0]    cd/dvd  TOSHIBA  DVD-ROM SD-M1712 1004  /dev/.tmp-11-1
[2:0:0:0]    disk    ATA      ST3200822AS      3.01  /dev/sdc
[7:0:0:0]    disk    LG       USBDrive         1100  /dev/sdd

If I reload the disc in the TOSHIBA, it is automounted but the strange
device is still there.

Trying with /dev/sr0 still gives no events. What is happening here?
Is it the kernel, or is it the udev setup?

TIA

--
J.A. Magallon  \   Software is like sex:
 \ It's better when it's free
Mandriva Linux release 2007.1 (Cooker) for i586
Linux 2.6.20-jam05 (gcc 4.1.2 20070302 (prerelease) (4.1.2-1mdv2007.1)) #2 SMP 
PREEMPT


Re: [patch 13/26] Xen-paravirt_ops: Consistently wrap paravirt ops callsites to make them patchable

2007-03-20 Thread Zachary Amsden

Jeremy Fitzhardinge wrote:
> Zachary Amsden wrote:
>> I think Jeremy's idea was to have interrupt handlers leave interrupts
>> disabled on exit if pda.intr_mask was set.  In which case, they would
>> bypass all work and we could never get preempted.
>
> Yes, I was worried that if we left the isr without actually handling the
> interrupt, it would still be asserted and we'd just get interrupted
> again.  The idea is that we avoid touching cli/sti for the common case
> of no interrupts while interrupts are disabled, but we'd still need to
> fall back to using them if an interrupt becomes pending.
>
>> I don't think leaving hardware interrupts disabled for such a long
>> time is good though. 
>
> How long?  It would be no longer than now, and possibly less, wouldn't it?

Hmm.  Perhaps.  Something about the asymmetry bothers me a lot though.

Zach



Re: [patch 13/26] Xen-paravirt_ops: Consistently wrap paravirt ops callsites to make them patchable

2007-03-20 Thread Paul Mackerras
Linus Torvalds writes:

> We should just do this natively. There's been several tests over the years 
> saying that it's much more efficient to do sti/cli as a simple store, and 
> handling the "oops, we got an interrupt while interrupts were disabled" as 
> a special case.
> 
> I have this dim memory that ARM has done it that way for a long time 
> because it's so expensive to do a "real" cli/sti.
> 
> And I think -rt does it for other reasons. It's just more flexible.

64-bit powerpc does this now as well.

Paul.


Re: [patch 13/26] Xen-paravirt_ops: Consistently wrap paravirt ops callsites to make them patchable

2007-03-20 Thread Matt Mackall
On Tue, Mar 20, 2007 at 03:08:19PM -0800, Zachary Amsden wrote:
> Matt Mackall wrote:
> >I don't know that you need an xchg there. If you're still on the same
> >CPU, it should all be nice and causal even across an interrupt handler.
> >So it could be:
> >
> >   pda.intr_mask = 0; /* intr_pending can't get set after this */
> >  
> 
> Why not?  Oh, I see.  intr_mask is inverted form of EFLAGS_IF.

It's not even that. There are two things that can happen:

case 1:

  intr_mask = 1;
  <interrupt>
  intr_mask = 0;
  /* intr_pending is already set and CLI is in effect */
  if(intr_pending)

case 2:

  intr_mask = 1;
  intr_mask = 0;
  <interrupt>
  /* intr_pending remains cleared */
  if(intr_pending)

As this is all about local interrupts, it's all on a single CPU and
out of order issues aren't visible..
 
> >(This would actually need a C barrier, but I'll ignore that as this'd
> >end up being asm...)

..unless the compiler is doing the reordering, of course.
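
A sketch of where such a compiler barrier would sit, as a single-threaded user-space model. The `barrier()` definition uses GCC/Clang inline-asm syntax, and `hw_if` is a made-up stand-in for the hardware interrupt flag; the real code would execute sti instead.

```c
/* Compiler-only barrier (GCC/Clang syntax); it emits no instructions. */
#define barrier() __asm__ __volatile__("" ::: "memory")

static int intr_mask;		/* 1 = soft-disabled */
static int intr_pending;	/* set by the (simulated) handler while masked */
static int hw_if = 1;		/* stands in for EFLAGS.IF; hypothetical */

static void soft_enable(void)
{
	intr_mask = 0;		/* after this the handler no longer sets intr_pending */
	barrier();		/* keep the store above ahead of the load below */
	if (intr_pending) {
		intr_pending = 0;
		hw_if = 1;	/* the handler left hardware interrupts off */
	}
}
```

Without the barrier the compiler would be free to read `intr_pending` before storing to `intr_mask`, which reopens the window the ordering is meant to close.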

> >But other interesting things could happen. If we never did a real CLI
> >and we get preempted and switched to another CPU between clearing
> >intr_mask and checking intr_pending, we get a little confused. 
> 
> I think Jeremy's idea was to have interrupt handlers leave interrupts 
> disabled on exit if pda.intr_mask was set.  In which case, they would 
> bypass all work and we could never get preempted.

I was actually worrying about the case where the interrupt came in
"late". But I don't think it's a problem there either.

> I don't think leaving 
> hardware interrupts disabled for such a long time is good though.

It can only be worse than the current situation by the amount of time
it takes to defer an interrupt once. On average, it'll be a lot
better as most critical sections are -tiny-.

-- 
Mathematics is the supreme nostalgia of our time.


Re: UDP packets scheduling

2007-03-20 Thread Robert Hancock

Lukas Hejtmanek wrote:
> On Tue, Mar 20, 2007 at 06:52:51PM +0100, Andi Kleen wrote:
> > Flow control must be turned off for some other reason.
> > That's your fundamental problem. Fix that.
> >
> > Even if you get the rate right there can be many reasons why timing
> > gets disrupted temporarily and to recover from any of this you need
> > working flow control.
>
> How do you want to enable wire/fibre flow control, e.g., from Europe to USA?
> You are not guaranteed any hardware based flow control.
>
> And as of software flow control, you need precise timing of UDP packets to
> keep desired packet rate. How to do it at speeds about 5.5Gbps, that is my
> question.

Why is it necessary to avoid bursting? There should be enough buffering 
along the chain to avoid packet loss with reasonable burst sizes.
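
To put a number on the timing precision being asked about, here is a small arithmetic sketch. The 1500-byte packet size is an assumption (the thread does not state one), header and framing overhead are ignored, and the function names are illustrative only.

```c
#include <assert.h>

/* Nanoseconds between packet starts when sending packets of the given
 * size back to back at a constant bit rate. */
static double packet_interval_ns(double packet_bytes, double rate_bits_per_sec)
{
	assert(rate_bits_per_sec > 0.0);
	return packet_bytes * 8.0 / rate_bits_per_sec * 1e9;
}

/* Packets per second needed to sustain the given bit rate. */
static double packets_per_second(double packet_bytes, double rate_bits_per_sec)
{
	assert(packet_bytes > 0.0);
	return rate_bits_per_sec / (packet_bytes * 8.0);
}
```

With 1500-byte packets at 5.5 Gbps, packet_interval_ns(1500.0, 5.5e9) comes to roughly 2.2 microseconds per packet, which is why per-packet software pacing is hard at this rate and why bursts absorbed by buffering are the usual alternative.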


--
Robert Hancock  Saskatoon, SK, Canada
To email, remove "nospam" from [EMAIL PROTECTED]
Home Page: http://www.roberthancock.com/



Re: sata_nv exceptions in 2.6.20.3

2007-03-20 Thread Robert Hancock

Christopher Mulcahy wrote:
> My harddrives are Seagate Barracudas ( 7200.10, 500GB, ST3500630AS )
> They have jumpers to toggle between 1.5Gb/s operation and 3.0Gb/s
> operation.
>
> When jumpered to 1.5Gbs operation, there are no problems.
>
> Setting the drives to 3.0Gbs operation generates exceptions on only one
> of the 4 drives, but this quickly leads to hard lock-up.  ( I don't see
> a kernel crash-dump, but the system becomes frozen and has to be reset )
>
> The errors, lspci output, and dmesg from both 3G ( broken ) and 1.5G
> (working) operation are as follows:

Tried a different SATA cable on that drive? The controller is reporting 
SError values and CMD errors, which likely tends to indicate some kind 
of SATA communication problem..


--
Robert Hancock  Saskatoon, SK, Canada
To email, remove "nospam" from [EMAIL PROTECTED]
Home Page: http://www.roberthancock.com/



Re: [Xen-devel] Re: [patch 13/26] Xen-paravirt_ops: Consistently wrap paravirt ops callsites to make them patchable

2007-03-20 Thread Linus Torvalds


On Tue, 20 Mar 2007, Andi Kleen wrote:
> 
> Linus is worried about the unwinder crashing -- that wouldn't help with that.

And to make it clear: this is not a theoretical worry. It happened many 
times over the months the unwinder was in. 

It was supposed to help debugging, but it made bugs that *would* have been 
nicely debuggable without it into nightmares. So the only reason for it 
existing in the first place was actually the thing that made it not work.

Linus


Re: 2.6.21-rc4-mm1

2007-03-20 Thread Randy Dunlap
On Mon, 19 Mar 2007 20:56:23 -0800 Andrew Morton wrote:

> 
> Temporarily at
> 
>   http://userweb.kernel.org/~akpm/2.6.21-rc4-mm1/
> 
> Will appear later at
> 
>   
> ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.21-rc4/2.6.21-rc4-mm1/


I think that this:

config EEPROM_93CX6
tristate "EEPROM 93CX6 support"
---help---
This is a driver for the EEPROM chipsets 93c46 and 93c66.
The driver supports both read as well as write commands.

should not be in lib/Kconfig.  lib/ is not for drivers.
or (simpler) s/driver/library/
but I think I'd rather see it in drivers/misc/.


and the help text needs to be indented 2 more spaces...

---
~Randy
boilerplate:
*** Remember to use Documentation/SubmitChecklist when testing your code ***


Re: [PATCH 0/7] [RFC] hugetlb: pagetable_operations API (V2)

2007-03-20 Thread Dave Hansen
On Mon, 2007-03-19 at 13:05 -0700, Adam Litke wrote:
> For the common case (vma->pagetable_ops == NULL), we do almost the
> same thing as the current code: load and test.  The third instruction
> is different in that we jump for the common case instead of jumping in
> the hugetlb case.  I don't think this is a big deal though.  If it is,
> would an unlikely() macro fix it? 

I wouldn't worry about micro-optimizing it at that level.  The CPU does
enough stuff under the covers that I wouldn't worry about it at all.

I wonder if the real differential impact (if any) is likely to come from
the pagetable_ops cacheline being hot or cold, since it is in a
different place in the structure than the flags.  But, from a quick
glance I see a few vm_ops references preceding pagetable_ops references,
so the pagetable_ops cacheline might already be hot most of the time.  

BTW, are there any other possible users for these things other than
large pages?

-- Dave



Sparse participating in Google Summer of Code 2007; apply by March 24th

2007-03-20 Thread Josh Triplett
[Sending this to LKML as well, to reach more of Sparse's user community.]

Google has accepted Sparse as a mentoring organization for Summer of Code
2007.  Interested students can propose work on Sparse-related projects, work
on those projects over the summer, and receive a stipend from Google for their
work.

Student application deadline: March 24th

Sparse, the semantic parser, provides a compiler frontend capable of parsing
most of ANSI C as well as many GCC extensions, and a collection of sample
compiler backends, including a static analyzer also called 'sparse'. Sparse
provides a set of annotations designed to convey semantic information about
types, such as what address space pointers point to, or what locks a function
acquires or releases.  The Linux kernel community uses Sparse to check for
common errors in kernel source code.  Other projects, such as X.org, have
begun to use Sparse as well.

Working on a Sparse project gives students the opportunity to put many core CS
skills into practice on a real-world compiler and static analyzer, and gain
some recognition within the prominent community of Free and Open Source
Software developers working on the Linux kernel.

You can see the current Summer of Code project list for Sparse at
, or propose an idea
of your own.

Students need to apply by March 24th.  You can apply at
.

Any Sparse developers interested in mentoring projects over the summer (which
primarily consists of answering questions about Sparse, such as on the mailing
list), please apply via the Google Summer of Code mentorship application at
http://code.google.com/soc/mentor.html , and check the "Sparse" box.  Please
also mail me with details.

Any Linux developers interested in seeing Sparse do something that it can't
currently do, please propose possible Summer of Code projects as soon as you
can, and I'll add them to
http://www.kernel.org/pub/software/devel/sparse/soc.html .

- Josh Triplett





Re: [patch 13/26] Xen-paravirt_ops: Consistently wrap paravirt ops callsites to make them patchable

2007-03-20 Thread Linus Torvalds


On Tue, 20 Mar 2007, Andi Kleen wrote:

> On Tue, Mar 20, 2007 at 11:49:39AM -0700, Linus Torvalds wrote:
> > 
> > the thing is, I'd rather see a long backtrace that is hard to decipher but 
> > that *never* *ever* causes any additional problems, over a pretty one.
> 
> Well it causes additional problems. We had some cases where it was really
> hard to distinguish garbage and the true call chain. I can probably dig
> out some examples if you want.

Well, by "additional problems" _I_ mean things like "a warning turned into 
a fatal oops and didn't get logged at all".

That's a lot more serious than "there were a few extra entries in the 
traceback that caused us some confusion".

And yes, we had exactly that case happen several times.

> With lots of call backs (e.g. common with sysfs) it is also frequently
> not obvious how the call chains are supposed to go.

With callbacks, it's actually often nice to see the callback data that is 
on the stack (and it's very obvious from the "" ksymtab 
explanation: you can't have a  that is anything but a callback 
pointer (since it isn't a return address).

Linus


[PATCH] mv643xx_eth: add mv643xx_eth_shutdown function

2007-03-20 Thread Dale Farnsworth
From: Dale Farnsworth <[EMAIL PROTECTED]>

mv643xx_eth_shutdown is needed for kexec.

Signed-off-by: Dale Farnsworth <[EMAIL PROTECTED]>

---
 drivers/net/mv643xx_eth.c |   14 ++
 1 file changed, 14 insertions(+)

Index: linux-2.6-powerpc-df/drivers/net/mv643xx_eth.c
===
--- linux-2.6-powerpc-df.orig/drivers/net/mv643xx_eth.c
+++ linux-2.6-powerpc-df/drivers/net/mv643xx_eth.c
@@ -1516,9 +1516,23 @@ static int mv643xx_eth_shared_remove(str
return 0;
 }
 
+static void mv643xx_eth_shutdown(struct platform_device *pdev)
+{
+   struct net_device *dev = platform_get_drvdata(pdev);
+   struct mv643xx_private *mp = netdev_priv(dev);
+   unsigned int port_num = mp->port_num;
+
+   /* Mask all interrupts on ethernet port */
+   mv_write(MV643XX_ETH_INTERRUPT_MASK_REG(port_num), 0);
+   mv_read (MV643XX_ETH_INTERRUPT_MASK_REG(port_num));
+
+   eth_port_reset(port_num);
+}
+
 static struct platform_driver mv643xx_eth_driver = {
.probe = mv643xx_eth_probe,
.remove = mv643xx_eth_remove,
+   .shutdown = mv643xx_eth_shutdown,
.driver = {
.name = MV643XX_ETH_NAME,
},


Re: [PATCH 2.6.21-rc4] ieee1394: fix oops on "modprobe -r ohci1394" after network class_device conversion

2007-03-20 Thread Greg KH
On Tue, Mar 20, 2007 at 10:43:22PM +0100, Stefan Richter wrote:
> The networking subsystem has been converted from class_device to device
> but ieee1394 hasn't.  This results in a 100% reproducible NULL pointer
> dereference if the ohci1394 driver module is unloaded while the eth1394
> module is still loaded.
> http://lkml.org/lkml/2006/11/16/147
> http://lkml.org/lkml/2007/3/14/4
> 
> This is a regression in 2.6.21-rc1.
> 
> Signed-off-by: Stefan Richter <[EMAIL PROTECTED]>
> ---
> 
> Works for me.  I still can connect to an OS X box via eth1394 after that
> and modprobe -r ohci1394 before modprobe -r eth1394 works again.
> 
> Index: linux-2.6.21-rc4/drivers/ieee1394/eth1394.c
> ===
> --- linux-2.6.21-rc4.orig/drivers/ieee1394/eth1394.c  2007-03-16 
> 19:24:44.0 +0100
> +++ linux-2.6.21-rc4/drivers/ieee1394/eth1394.c   2007-03-20 
> 22:28:49.0 +0100
> @@ -586,7 +586,10 @@ static void ether1394_add_host (struct h
>  }
>  
>   SET_MODULE_OWNER(dev);
> +#if 0
> + /* FIXME - Is this the correct parent device anyway? */
>   SET_NETDEV_DEV(dev, >device);
> +#endif

That's interesting.  What does 'tree /sys/class/net/' look like with
this patch applied?  Does the eth1394 device now live off in
/sys/device/virtual?

If so, I guess this is ok for now as we can wait for the rewrite of the
ieee1394 subsystem to get the linking done correctly :)

thanks,

greg k-h


[patch 2.6.21-rc4-git] SCSI newstyle hotplug/coldplug support

2007-03-20 Thread David Brownell
This teaches scsi devices how to support "new style" hotplug/coldplug:
using a modalias sysfs attribute for coldplug, and MODALIAS environment
variable for hotplug.

It also updates the CH, SD, SR, and ST drivers with the aliases needed to
drive them by that mechanism.  (Older OnStream devices use OSST not ST;
left for someone else to sort out.  SG seems best loaded by KMOD.)

Using this, I've seen pure new-style hotplugging drive the loading of all
the relevant driver modules for usb storage devices:  host controller,
usb-storage, scsi core, sd_mod.  Previously, sd_mod never loaded.  (Except
when using the obsolete old-style hotplug scripts ... which are unusable
on ~100 BogoMIPS embedded systems that only run busybox, but may have no
options for lots of external storage other than USB.)  Yep, this is a LOT
faster too.
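
As a rough userspace illustration of the matching this enables (not kernel code): the device side formats "scsi:type-XX" from its SCSI type, and the module side carries matching alias strings. The alias-to-driver table below mirrors the MODULE_ALIAS lines in this patch; the helper names are hypothetical, and real alias resolution is done by modprobe, not by a table like this.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Alias strings as declared by the MODULE_ALIAS() lines in this patch. */
struct alias_entry {
	const char *alias;
	const char *driver;
};

static const struct alias_entry alias_table[] = {
	{ "scsi:type-00", "sd" },	/* TYPE_DISK */
	{ "scsi:type-07", "sd" },	/* TYPE_MOD */
	{ "scsi:type-0e", "sd" },	/* TYPE_RBC */
	{ "scsi:type-04", "sr" },	/* TYPE_WORM */
	{ "scsi:type-05", "sr" },	/* TYPE_ROM */
	{ "scsi:type-01", "st" },	/* TYPE_TAPE */
	{ "scsi:type-08", "ch" },	/* TYPE_MEDIUM_CHANGER */
};

/* Format the modalias string the same way the patch does. */
static int format_modalias(char *buf, size_t len, unsigned char type)
{
	assert(buf != NULL && len > 0);
	return snprintf(buf, len, "scsi:type-%02x", type);
}

/* Return the driver whose alias matches the device type, or NULL. */
static const char *driver_for_type(unsigned char type)
{
	char modalias[32];
	size_t i;

	format_modalias(modalias, sizeof(modalias), type);
	for (i = 0; i < sizeof(alias_table) / sizeof(alias_table[0]); i++)
		if (strcmp(alias_table[i].alias, modalias) == 0)
			return alias_table[i].driver;
	return NULL;
}
```

A type with no alias (for example 0x03) finds no driver here, which corresponds to the cases the description leaves to other mechanisms.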

Signed-off-by: David Brownell <[EMAIL PROTECTED]>
---
 drivers/scsi/ch.c |1 +
 drivers/scsi/scsi_sysfs.c |   24 
 drivers/scsi/sd.c |4 
 drivers/scsi/sr.c |2 ++
 drivers/scsi/st.c |1 +
 5 files changed, 32 insertions(+)

--- g26.orig/drivers/scsi/scsi_sysfs.c  2007-03-16 12:54:22.0 -0700
+++ g26/drivers/scsi/scsi_sysfs.c   2007-03-20 17:27:07.0 -0700
@@ -276,6 +276,18 @@ static int scsi_bus_match(struct device 
return (sdp->inq_periph_qual == SCSI_INQ_PQ_CON)? 1: 0;
 }
 
+/* for hotplug support: modprobe $MODALIAS */
+static int scsi_uevent(struct device *dev, char **envp, int num_envp,
+   char *buffer, int buffer_size)
+{
+   struct scsi_device *sdp = to_scsi_device(dev);
+
+   envp[0] = buffer;
+   snprintf(buffer, buffer_size, "MODALIAS=scsi:type-%02x", (u8)sdp->type);
+   envp[1] = NULL;
+   return 0;
+}
+
 static int scsi_bus_suspend(struct device * dev, pm_message_t state)
 {
struct scsi_device *sdev = to_scsi_device(dev);
@@ -308,6 +320,7 @@ static int scsi_bus_resume(struct device
 struct bus_type scsi_bus_type = {
 .name  = "scsi",
 .match = scsi_bus_match,
+   .uevent = scsi_uevent,
.suspend= scsi_bus_suspend,
.resume = scsi_bus_resume,
 };
@@ -547,6 +560,16 @@ show_sdev_iostat(iorequest_cnt);
 show_sdev_iostat(iodone_cnt);
 show_sdev_iostat(ioerr_cnt);
 
+/* for coldplug support: modprobe $(cat .../modalias) */
+static ssize_t
+show_modalias(struct device *dev, struct device_attribute *attr, char *buf)
+{
+   struct scsi_device *sdev = to_scsi_device(dev);
+
+   return snprintf(buf, 20, "scsi:type-%02x\n", (u8)sdev->type);
+}
+static DEVICE_ATTR(modalias, S_IRUGO, show_modalias, NULL);
+
 
 /* Default template for device attributes.  May NOT be modified */
 static struct device_attribute *scsi_sysfs_sdev_attrs[] = {
@@ -566,6 +589,7 @@ static struct device_attribute *scsi_sys
&dev_attr_iorequest_cnt,
&dev_attr_iodone_cnt,
&dev_attr_ioerr_cnt,
+   &dev_attr_modalias,
NULL
 };
 
--- g26.orig/drivers/scsi/sd.c  2007-02-19 13:43:00.0 -0800
+++ g26/drivers/scsi/sd.c   2007-03-20 17:27:07.0 -0700
@@ -89,6 +89,10 @@ MODULE_ALIAS_BLOCKDEV_MAJOR(SCSI_DISK13_
 MODULE_ALIAS_BLOCKDEV_MAJOR(SCSI_DISK14_MAJOR);
 MODULE_ALIAS_BLOCKDEV_MAJOR(SCSI_DISK15_MAJOR);
 
+MODULE_ALIAS("scsi:type-00");  /* TYPE_DISK */
+MODULE_ALIAS("scsi:type-07");  /* TYPE_MOD */
+MODULE_ALIAS("scsi:type-0e");  /* TYPE_RBC */
+
 /*
  * This is limited by the naming scheme enforced in sd_probe,
  * add another character to it if you really need more disks.
--- g26.orig/drivers/scsi/sr.c  2007-02-15 18:17:21.0 -0800
+++ g26/drivers/scsi/sr.c   2007-03-20 17:27:07.0 -0700
@@ -62,6 +62,8 @@
 MODULE_DESCRIPTION("SCSI cdrom (sr) driver");
 MODULE_LICENSE("GPL");
 MODULE_ALIAS_BLOCKDEV_MAJOR(SCSI_CDROM_MAJOR);
+MODULE_ALIAS("scsi:type-04");  /* TYPE_WORM */
+MODULE_ALIAS("scsi:type-05");  /* TYPE_ROM */
 
 #define SR_DISKS   256
 
--- g26.orig/drivers/scsi/st.c  2007-02-19 22:14:16.0 -0800
+++ g26/drivers/scsi/st.c   2007-03-20 17:27:07.0 -0700
@@ -89,6 +89,7 @@ MODULE_AUTHOR("Kai Makisara");
 MODULE_DESCRIPTION("SCSI tape (st) driver");
 MODULE_LICENSE("GPL");
 MODULE_ALIAS_CHARDEV_MAJOR(SCSI_TAPE_MAJOR);
+MODULE_ALIAS("scsi:type-01");  /* TYPE_TAPE */
 
 /* Set 'perm' (4th argument) to 0 to disable module_param's definition
  * of sysfs parameters (which module_param doesn't yet support).
--- g26.orig/drivers/scsi/ch.c  2007-02-15 18:17:21.0 -0800
+++ g26/drivers/scsi/ch.c   2007-03-20 17:27:07.0 -0700
@@ -38,6 +38,7 @@ MODULE_DESCRIPTION("device driver for sc
 MODULE_AUTHOR("Gerd Knorr <[EMAIL PROTECTED]>");
 MODULE_LICENSE("GPL");
 MODULE_ALIAS_CHARDEV_MAJOR(SCSI_CHANGER_MAJOR);
+MODULE_ALIAS("scsi:type-08");  /* TYPE_MEDIUM_CHANGER */
 
 static int init = 1;
 module_param(init, int, 0444);

Re: [patch 13/26] Xen-paravirt_ops: Consistently wrap paravirt ops callsites to make them patchable

2007-03-20 Thread Jeremy Fitzhardinge
Zachary Amsden wrote:
> I think Jeremy's idea was to have interrupt handlers leave interrupts
> disabled on exit if pda.intr_mask was set.  In which case, they would
> bypass all work and we could never get preempted.

Yes, I was worried that if we left the isr without actually handling the
interrupt, it would still be asserted and we'd just get interrupted
again.  The idea is that we avoid touching cli/sti for the common case
of no interrupts while interrupts are disabled, but we'd still need to
fall back to using them if an interrupt becomes pending.

> I don't think leaving hardware interrupts disabled for such a long
> time is good though. 

How long?  It would be no longer than now, and possibly less, wouldn't it?

J


Re: dst_ifdown breaks infiniband?

2007-03-20 Thread David Miller
From: "Michael S. Tsirkin" <[EMAIL PROTECTED]>
Date: Tue, 20 Mar 2007 18:02:17 +0200

> David, Alexey, what do you think about this patch? Is it right?
> Could this patch be considered for 2.6.21?
> 
> Acked-by: Michael S. Tsirkin <[EMAIL PROTECTED]>

I plan to apply it and merge.


Re: [PATCH 4/7] unmap_page_range for hugetlb

2007-03-20 Thread Dave Hansen
On Mon, 2007-03-19 at 13:05 -0700, Adam Litke wrote:
> Signed-off-by: Adam Litke <[EMAIL PROTECTED]>
> ---
> 
>  fs/hugetlbfs/inode.c|3 ++-
>  include/linux/hugetlb.h |4 ++--
>  mm/hugetlb.c|   12 
>  mm/memory.c |   10 --
>  4 files changed, 16 insertions(+), 13 deletions(-)
> 
> diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
> index d0b4b46..198efa7 100644
> --- a/fs/hugetlbfs/inode.c
> +++ b/fs/hugetlbfs/inode.c
> @@ -289,7 +289,7 @@ hugetlb_vmtruncate_list(struct prio_tree_root *root, 
> pgoff_t pgoff)
>   v_offset = 0;
> 
>   __unmap_hugepage_range(vma,
> - vma->vm_start + v_offset, vma->vm_end);
> + vma->vm_start + v_offset, vma->vm_end, 0);
>   }
>  }

Did you mean NULL instead of 0 here?

> @@ -568,6 +568,7 @@ const struct file_operations hugetlbfs_file_operations = {
>  static const struct pagetable_operations_struct hugetlbfs_pagetable_ops = {
>   .copy_vma   = copy_hugetlb_page_range,
>   .pin_pages  = follow_hugetlb_page,
> + .unmap_page_range   = unmap_hugepage_range,
>  };
> 
>  static const struct inode_operations hugetlbfs_dir_inode_operations = {
> diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
> index 3f3e7a6..502c2f8 100644
> --- a/include/linux/hugetlb.h
> +++ b/include/linux/hugetlb.h
> @@ -17,8 +17,8 @@ static inline int is_vm_hugetlb_page(struct vm_area_struct 
> *vma)
>  int hugetlb_sysctl_handler(struct ctl_table *, int, struct file *, void 
> __user *, size_t *, loff_t *);
>  int copy_hugetlb_page_range(struct mm_struct *, struct mm_struct *, struct 
> vm_area_struct *);
>  int follow_hugetlb_page(struct mm_struct *, struct vm_area_struct *, struct 
> page **, struct vm_area_struct **, unsigned long *, int *, int);
> -void unmap_hugepage_range(struct vm_area_struct *, unsigned long, unsigned 
> long);
> -void __unmap_hugepage_range(struct vm_area_struct *, unsigned long, unsigned 
> long);
> +unsigned long unmap_hugepage_range(struct vm_area_struct *, unsigned long, 
> unsigned long, long *);
> +void __unmap_hugepage_range(struct vm_area_struct *, unsigned long, unsigned 
> long, long *);
>  int hugetlb_prefault(struct address_space *, struct vm_area_struct *);
>  int hugetlb_report_meminfo(char *);
>  int hugetlb_report_node_meminfo(int, char *);
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 36db012..d902fb9 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -356,7 +356,7 @@ nomem:
>  }
> 
>  void __unmap_hugepage_range(struct vm_area_struct *vma, unsigned long start,
> - unsigned long end)
> + unsigned long end, long *zap_work)
>  {
>   struct mm_struct *mm = vma->vm_mm;
>   unsigned long address;
> @@ -399,10 +399,13 @@ void __unmap_hugepage_range(struct vm_area_struct *vma, 
> unsigned long start,
>   list_del(&page->lru);
>   put_page(page);
>   }
> +
> + if (zap_work)
> + *zap_work -= (end - start) / (HPAGE_SIZE / PAGE_SIZE);
>  }
> 
> -void unmap_hugepage_range(struct vm_area_struct *vma, unsigned long start,
> -   unsigned long end)
> +unsigned long unmap_hugepage_range(struct vm_area_struct *vma,
> + unsigned long start, unsigned long end, long *zap_work)
>  {
>   /*
>* It is undesirable to test vma->vm_file as it should be non-null
> @@ -414,9 +417,10 @@ void unmap_hugepage_range(struct vm_area_struct *vma, 
> unsigned long start,
>*/
>   if (vma->vm_file) {
>   spin_lock(&vma->vm_file->f_mapping->i_mmap_lock);
> - __unmap_hugepage_range(vma, start, end);
> + __unmap_hugepage_range(vma, start, end, zap_work);
>   spin_unlock(&vma->vm_file->f_mapping->i_mmap_lock);
>   }
> + return end;
>  }
> 
>  static int hugetlb_cow(struct mm_struct *mm, struct vm_area_struct *vma,
> diff --git a/mm/memory.c b/mm/memory.c
> index 01256cf..a3bcaf3 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -839,12 +839,10 @@ unsigned long unmap_vmas(struct mmu_gather **tlbp,
>   tlb_start_valid = 1;
>   }
> 
> - if (unlikely(is_vm_hugetlb_page(vma))) {
> - unmap_hugepage_range(vma, start, end);
> - zap_work -= (end - start) /
> - (HPAGE_SIZE / PAGE_SIZE);
> - start = end;
> - } else
> + if (unlikely(has_pt_op(vma, unmap_page_range)))
> + start = pt_op(vma, unmap_page_range)
> + (vma, start, end, &zap_work);
> + else
>   start = unmap_page_range(*tlbp, vma,
>   start, end, &zap_work, 

Re: [PATCH 1/7] Introduce the pagetable_operations and associated helper macros.

2007-03-20 Thread Dave Hansen
On Mon, 2007-03-19 at 13:05 -0700, Adam Litke wrote:
> 
> +#define has_pt_op(vma, op) \
> +   ((vma)->pagetable_ops && (vma)->pagetable_ops->op)
> +#define pt_op(vma, call) \
> +   ((vma)->pagetable_ops->call) 

Can you get rid of these macros?  I think they make it a wee bit harder
to read.  My brain doesn't properly parse the foo(arg)(bar) syntax.  

+   if (has_pt_op(vma, copy_vma))
+   return pt_op(vma, copy_vma)(dst_mm, src_mm, vma);

+   if (vma->pagetable_ops && vma->pagetable_ops->copy_vma)
+   return vma->pagetable_ops->copy_vma(dst_mm, src_mm, vma);

I guess it does lead to some longish lines.  Does it start looking
really nasty?

If you're going to have them, it might just be best to put a single
unlikely() around the macro definitions themselves to keep anybody from
having to open-code it for any of the users.  
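
A minimal userspace sketch of that suggestion, assuming GCC or Clang for __builtin_expect. The demo_* types and functions are hypothetical stand-ins for the vma and pagetable_operations structures in the patch; only the two macro shapes come from the quoted code.

```c
#include <assert.h>
#include <stddef.h>

/* Same idea as the kernel's unlikely(): hint that the condition is rare. */
#define unlikely(x) __builtin_expect(!!(x), 0)

struct demo_vma;

/* Hypothetical, cut-down operations table with a single hook. */
struct demo_pagetable_ops {
	int (*copy_vma)(struct demo_vma *vma);
};

struct demo_vma {
	const struct demo_pagetable_ops *pagetable_ops;
};

/* The hint lives in the macro, so callers never open-code unlikely(). */
#define has_pt_op(vma, op) \
	unlikely((vma)->pagetable_ops && (vma)->pagetable_ops->op)
#define pt_op(vma, call) \
	((vma)->pagetable_ops->call)

static int demo_hook(struct demo_vma *vma)
{
	(void)vma;
	return 42;
}

static int demo_copy(struct demo_vma *vma)
{
	assert(vma != NULL);
	if (has_pt_op(vma, copy_vma))
		return pt_op(vma, copy_vma)(vma);
	return 0;	/* common case: no pagetable_ops installed */
}
```

The call site stays as short as in the original patch, while the branch hint is applied uniformly to every user of has_pt_op().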

-- Dave



Re: sata_nv ADMA controller lockup investigation

2007-03-20 Thread Robert Hancock

Neil Schemenauer wrote:
> Not sure if this helps.  I'm getting this reset with 2.6.21-rc4.
> After the reset the controller seems to work again.

...

> ata2.00: ATA-7: Maxtor 6V300F0, VA111630, max UDMA/133
> ata2.00: 586114704 sectors, multi 16: LBA48 NCQ (depth 31/32)

...

> ata2: EH in ADMA mode, notifier 0x0 notifier_error 0x0 gen_ctl 0x1501000 status 0x400 next cpb count 0x0 next cpb idx 0x0
> ata2: CPB 1: ctl_flags 0x1f, resp_flags 0x2
> ata2: timeout waiting for ADMA IDLE, stat=0x400
> ata2: timeout waiting for ADMA LEGACY, stat=0x400
> ata2.00: exception Emask 0x0 SAct 0x2 SErr 0x0 action 0x2 frozen
> ata2.00: cmd 61/00:08:72:44:22/02:00:21:00:00/40 tag 1 cdb 0x0 data 262144 out
>          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
> ata2: soft resetting port
> ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
> ata2.00: configured for UDMA/133
> ata2: EH complete
> SCSI device sda: 586114704 512-byte hdwr sectors (300091 MB)
> sda: Write Protect is off
> sda: Mode Sense: 00 3a 00 00
> SCSI device sda: write cache: enabled, read cache: enabled, doesn't support DPO or FUA


That one looks like a drive-side issue. CPB resp_flags 2 indicates the 
drive accepted the command and the controller is still waiting for a 
response.


Could be that this is another drive that needs to be added to the NCQ 
blacklist, some similar Maxtor models seem to have issues..


--
Robert Hancock  Saskatoon, SK, Canada
To email, remove "nospam" from [EMAIL PROTECTED]
Home Page: http://www.roberthancock.com/



FireWire update in -mm (was 2.6.21-rc4-mm1)

2007-03-20 Thread Stefan Richter
Andrew Morton wrote:
> Will appear later at
> 
>   
> ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.21-rc4/2.6.21-rc4-mm1/
...
>  git-ieee1394.patch
...

Just a note for readers of lkml:  git-ieee1394.patch is steadily growing
thanks to Kristian Høgsberg's work on his new alternative FireWire
drivers.  Recently Kristian posted preliminary patches to the popular
low-level FireWire libraries libraw1394 and libdc1394, making them
interoperable with his newly designed kernel--userspace ABI.  (Mainline
Linux' IEEE 1394 subsystem features a slightly unfortunate variety of
userspace ABIs, some of them abstracted by the mentioned libraries, some
directly used.)  I heard Kristian also already worked on integration
with HAL, i.e. there are now more and more pieces of the puzzle coming
together.
-- 
Stefan Richter
-=-=-=== --== =-=--
http://arcgraph.de/sr/


Re: [patch 13/26] Xen-paravirt_ops: Consistently wrap paravirt ops callsites to make them patchable

2007-03-20 Thread Zachary Amsden

Matt Mackall wrote:
> I don't know that you need an xchg there. If you're still on the same
> CPU, it should all be nice and causal even across an interrupt handler.
> So it could be:
>
>    pda.intr_mask = 0; /* intr_pending can't get set after this */

Why not?  Oh, I see.  intr_mask is inverted form of EFLAGS_IF.

>    if (unlikely(pda.intr_pending)) {
>       pda.intr_pending = 0;
>       asm("sti");
>    }
>
> (This would actually need a C barrier, but I'll ignore that as this'd
> end up being asm...)
>
> But other interesting things could happen. If we never did a real CLI
> and we get preempted and switched to another CPU between clearing
> intr_mask and checking intr_pending, we get a little confused.

I think Jeremy's idea was to have interrupt handlers leave interrupts 
disabled on exit if pda.intr_mask was set.  In which case, they would 
bypass all work and we could never get preempted.  I don't think leaving 
hardware interrupts disabled for such a long time is good though.

> But perhaps that doesn't matter because we'd by definition have no
> pending interrupts on either processor?
>
> Is it expensive to do an STI if interrupts are already enabled?

Yes.

Zach


Re: [BUG] 2.6.21-rc1,2,3 regressions on my system that I found so far

2007-03-20 Thread Eric St-Laurent
On Tue, 2007-20-03 at 10:15 +0100, Arjan van de Ven wrote:

> disabling that is a BAD idea. I'm no fan of SMM myself, but it's there,
> and we have to live with it. Disabling it without knowing what it does
> on your system is madness.
> 

Like Lee said, for "debugging", mainly trying to resolve unexplained
long latencies.

I've had a laptop that caused latency spikes when the cpu fan was turned
on. I tried disabling SMI to diagnose the problem with no success.

My current system has a BIOS feature to control fan speed according to
temperature. I presume this must use an SMI to work right?  In this case it
should be possible to find and disable the related SMI and replace the
fan control with user space software.

Of course it's not wise to blindly disable SMIs as we don't precisely
know what they do. 


- Eric




Re: [patch 13/26] Xen-paravirt_ops: Consistently wrap paravirt ops callsites to make them patchable

2007-03-20 Thread Matt Mackall
On Tue, Mar 20, 2007 at 09:31:58AM -0700, Jeremy Fitzhardinge wrote:
> Linus Torvalds wrote:
> > On Tue, 20 Mar 2007, Eric W. Biederman wrote:
> >   
> >> If that is the case.  In the normal kernel what would
> >> the "the oops, we got an interrupt code do?"
> >> I assume it would leave interrupts disabled when it returns?
> >> Like we currently do with the delayed disable of normal interrupts?
> >> 
> >
> > Yeah, disable interrupts, and set a flag that the fake "sti" can test, and 
> > just return without doing anything.
> >
> > (You may or may not also need to do extra work to Ack the hardware 
> > interrupt etc, which may be irq-controller specific. Once the CPU has 
> > accepted the interrupt, you may not be able to just leave it dangling)
> >   
> 
> So it would be something like:
> 
> pda.intr_mask = 1;/* disable interrupts */
> ...
> pda.intr_mask = 0;/* enable interrupts */
> if (xchg(&pda.intr_pending, 0))   /* check pending */
>   asm("sti"); /* was pending; isr left cpu interrupts masked */

I don't know that you need an xchg there. If you're still on the same
CPU, it should all be nice and causal even across an interrupt handler.
So it could be:

	pda.intr_mask = 0; /* intr_pending can't get set after this */
	if (unlikely(pda.intr_pending)) {
		pda.intr_pending = 0;
		asm("sti");
	}

(This would actually need a C barrier, but I'll ignore that as this'd
end up being asm...)
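
For illustration only, here is a user-space sketch of the soft-mask idea with the barrier written out. The struct soft_pda type, the counters and the simulated_interrupt() helper are all made up for this sketch; the real thing would live in the pda and the interrupt entry code, and asm("sti") is replaced by a counter so it can run anywhere.

```c
#include <assert.h>
#include <stdbool.h>

/* Compiler barrier: stops the compiler reordering accesses across it. */
#define barrier() __asm__ __volatile__("" ::: "memory")

/* Hypothetical stand-in for the per-CPU pda fields discussed above. */
struct soft_pda {
	volatile int intr_mask;     /* 1 = interrupts are soft-disabled */
	volatile int intr_pending;  /* 1 = an interrupt arrived while masked */
};

static int real_sti_count;  /* how often the "real" sti would have run */
static int handled_count;   /* how many interrupts were delivered */

static void soft_cli(struct soft_pda *pda)
{
	pda->intr_mask = 1;
	barrier();
}

/* Models the entry stub: defer the interrupt if it is soft-masked. */
static bool simulated_interrupt(struct soft_pda *pda)
{
	if (pda->intr_mask) {
		pda->intr_pending = 1;  /* hardware IRQs would stay off here */
		return false;
	}
	handled_count++;
	return true;
}

static void soft_sti(struct soft_pda *pda)
{
	pda->intr_mask = 0;  /* intr_pending can't get set after this */
	barrier();
	if (pda->intr_pending) {
		pda->intr_pending = 0;
		real_sti_count++;  /* stands in for asm("sti") */
		handled_count++;   /* the deferred interrupt is delivered now */
	}
}
```

The point of the sketch is that the common path (nothing pending) never touches the real interrupt flag; only a masked-and-deferred interrupt pays for the sti.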

But other interesting things could happen. If we never did a real CLI
and we get preempted and switched to another CPU between clearing
intr_mask and checking intr_pending, we get a little confused. 

But perhaps that doesn't matter because we'd by definition have no
pending interrupts on either processor?

Is it expensive to do an STI if interrupts are already enabled?

-- 
Mathematics is the supreme nostalgia of our time.


Re: [PATCH 4/5] swsusp: fix error paths in snapshot_open

2007-03-20 Thread Rafael J. Wysocki
On Tuesday, 20 March 2007 23:24, Pavel Machek wrote:
> Hi!
> 
> > > > We forget to increase device_available if there's an error in
> > > > snapshot_open(), so the snapshot device cannot be open at all after
> > > > snapshot_open() has returned an error.
> > > 
> > > > Actually, this should go to the beginning of series, as it is
> > > (non-critical) bugfix.
> > 
> > Well, yes.
> > 
> > > I've just kept the original order.  OTOH, I don't think it's as urgent as
> > > to go into 2.6.21 ("been there forever" kind of thing).
> 
> No, it is not urgent enough for 2.6.21... But I have secret plan...
> trying to push bitmaps+non-bugfixes for swsusp to 2.6.23, and have
> swsusp/s2ram stabilize during 2.6.22. Way too much stuff happened in
> 2.6.21 series.

OK by me, but I think we'll have to tell Andrew which patches should wait
for 2.6.23 anyway. ;-)

Rafael


Re: [patch 13/26] Xen-paravirt_ops: Consistently wrap paravirt ops callsites to make them patchable

2007-03-20 Thread Rusty Russell
On Tue, 2007-03-20 at 09:58 -0600, Eric W. Biederman wrote:
> Looking at the above code snippet.  I guess it is about time to
> merge our per_cpu and pda variables...

Indeed, thanks for the prod.  Now 2.6.21-rc4-mm1 is out, I'll resend the
patches.

Cheers,
Rusty.




Re: [PATCH 0/5] [RFC] AF_RXRPC socket family implementation [try #3]

2007-03-20 Thread David Howells
Alan Cox <[EMAIL PROTECTED]> wrote:

> - recvmsg not supporting MSG_TRUNC is rather weird and really ought to be
> fixed one day as it's useful to find out the size of the message pending when
> combined with MSG_PEEK

Hmmm...  I hadn't considered that.  I assumed MSG_TRUNC not to be useful as
arbitrarily chopping bits out of the request or reply would seem to be
pointless.
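
A minimal sketch of the MSG_PEEK plus MSG_TRUNC usage Alan describes, assuming Linux recv() semantics and using an ordinary datagram socket as a stand-in (AF_RXRPC does not support this, per the above). The helper name pending_datagram_size() is made up for the example.

```c
#include <assert.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Returns the full length of the next pending datagram on fd without
 * consuming it, or -1 on error.  Relies on Linux behaviour: with
 * MSG_TRUNC, recv() reports the real datagram length even when the
 * supplied buffer is smaller.
 */
static ssize_t pending_datagram_size(int fd)
{
	char probe;  /* deliberately tiny buffer */

	return recv(fd, &probe, sizeof(probe), MSG_PEEK | MSG_TRUNC);
}
```

A caller could use the returned size to allocate a buffer before doing the real recv(), which then consumes the datagram as usual.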

> - RXRPC_MIN_SECURITY_LEVEL reads into rx->min_sec_level and then if it is
> invalid reports an error but doesn't restore the valid level

Fixed.

> - Why does rxrpc_writable always return 0 ?

Good point.  That's slightly tricky to deal with as output messages don't
remain queued on the socket struct itself.  Hmmm...

One thing I'd like to be able to do is pass the sk_buffs I've set up to UDP
directly rather than having to call the UDP socket's sendmsg.  That'd eliminate
a copy.  But I decided to get it working right first, then look at cute
optimisations like that.

Such a thing would also be useful for the AFS filesystem: it could pass skbuffs
it has preloaded to AF_RXRPC, which would then hand them on to UDP.

> - rxrpc_process_soft_ACKs doesn't itself limit and check acns->nAcks is
> always below RXRPC_MAXACKS, as this is a stack variable it ought to be
> paranoid about it. I think it's ok from the caller check but it's very hard
> to prove...

nAcks is a uint8_t.  If that can exceed RXRPC_MAXACKS (255) then I suspect I'll
have more pressing worries.  I could put a check in there, but the compiler
would give me a warning:-/
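
One way to keep the paranoia without a run-time comparison the compiler complains about would be a compile-time assertion. This is only a sketch: struct ack_header_sketch and ack_count() are invented for the example, and only the nAcks type and the 255 limit come from the discussion above.

```c
#include <assert.h>
#include <stdint.h>

#define RXRPC_MAXACKS 255  /* value as quoted in the discussion above */

/* Hypothetical, cut-down stand-in for the ACK packet header. */
struct ack_header_sketch {
	uint8_t nAcks;
};

/*
 * Compile-time check: every value representable in nAcks is within
 * RXRPC_MAXACKS, so no run-time comparison is needed.  The build
 * breaks if either the type or the limit ever changes.
 */
_Static_assert(UINT8_MAX <= RXRPC_MAXACKS,
	       "nAcks could exceed RXRPC_MAXACKS");

static unsigned int ack_count(const struct ack_header_sketch *ack)
{
	return ack->nAcks;
}
```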

> It needs a lot more eyes/review due to the complexity and network
> exposure though - not your fault, whoever designed RXRPC's 8)

It's not an entirely insane protocol:-) Actually, part of the problem is Linux
itself.

David


Re: [PATCH 4/5] swsusp: fix error paths in snapshot_open

2007-03-20 Thread Pavel Machek
Hi!

> > > We forget to increase device_available if there's an error in
> > > snapshot_open(), so the snapshot device cannot be open at all after
> > > snapshot_open() has returned an error.
> > 
> > > Actually, this should go to the beginning of series, as it is
> > (non-critical) bugfix.
> 
> Well, yes.
> 
> > I've just kept the original order.  OTOH, I don't think it's as urgent as
> > to go into 2.6.21 ("been there forever" kind of thing).

No, it is not urgent enough for 2.6.21... But I have secret plan...
trying to push bitmaps+non-bugfixes for swsusp to 2.6.23, and have
swsusp/s2ram stabilize during 2.6.22. Way too much stuff happened in
2.6.21 series.
Pavel 

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


Re: [PATCH 2.6.21-rc4] ieee1394: fix oops on "modprobe -r ohci1394" after network class_device conversion

2007-03-20 Thread Ismail Dönmez
On Tuesday 20 March 2007 23:43:22 Stefan Richter wrote:
> The networking subsystem has been converted from class_device to device
> but ieee1394 hasn't.  This results in a 100% reproducible NULL pointer
> dereference if the ohci1394 driver module is unloaded while the eth1394
> module is still loaded.
> http://lkml.org/lkml/2006/11/16/147
> http://lkml.org/lkml/2007/3/14/4
>
> This is a regression in 2.6.21-rc1.
>
> Signed-off-by: Stefan Richter <[EMAIL PROTECTED]>
> ---
>
> Works for me.  I still can connect to an OS X box via eth1394 after that
> and modprobe -r ohci1394 before modprobe -r eth1394 works again.
>
> Index: linux-2.6.21-rc4/drivers/ieee1394/eth1394.c
> ===
> --- linux-2.6.21-rc4.orig/drivers/ieee1394/eth1394.c  2007-03-16 19:24:44.0 +0100
> +++ linux-2.6.21-rc4/drivers/ieee1394/eth1394.c   2007-03-20 22:28:49.0 +0100
> @@ -586,7 +586,10 @@ static void ether1394_add_host (struct h
>  }
>
>   SET_MODULE_OWNER(dev);
> +#if 0
> + /* FIXME - Is this the correct parent device anyway? */
>   SET_NETDEV_DEV(dev, >device);
> +#endif
>
>   priv = netdev_priv(dev);

This also fixes the issue for me, thanks for tracking this down Stefan.

Regards.

-- 
Happiness in intelligent people is the rarest thing I know. (Ernest Hemingway)

Ismail Donmez ismail (at) pardus.org.tr
GPG Fingerprint: 7ACD 5836 7827 5598 D721 DF0D 1A9D 257A 5B88 F54C
Pardus Linux / KDE developer


Re: [PATCH 4/5] swsusp: fix error paths in snapshot_open

2007-03-20 Thread Rafael J. Wysocki
On Tuesday, 20 March 2007 23:16, Pavel Machek wrote:
> Hi!
> 
> > We forget to increase device_available if there's an error in
> > snapshot_open(), so the snapshot device cannot be open at all after
> > snapshot_open() has returned an error.
> 
> > Actually, this should go to the beginning of series, as it is
> (non-critical) bugfix.

Well, yes.

I've just kept the original order.  OTOH, I don't think it's as urgent as to go
into 2.6.21 ("been there forever" kind of thing).

Rafael


Re: Suspend to RAM doesn't work anymore in 2.6.21

2007-03-20 Thread Rafael J. Wysocki
On Tuesday, 20 March 2007 01:54, Tobias Doerffel wrote:
> On Monday 19 March 2007 22:43:20 you wrote:
> > Hi,
> >
> > On Monday, 19 March 2007 13:50, Tobias Doerffel wrote:
> > > Hi,
> > >
> > > Suspend to RAM used to work fine on my computer (Intel Core Duo, 1 GB
> > > RAM, Intel 82801G (ICH7-chipset) mainboard, NVIDIA-gfx-card,
> > > tg3-ethernet) up to 2.6.20.3. But no matter which rc of 2.6.21 I use,
> > > > suspend to RAM doesn't work anymore. Up to rc3 even suspending stopped at
> > > > "suspending console" which apparently seems to be fixed in rc4. I tried
> > > > rc4-git4 with minimal config (no dynticks, no HRT, no MSI, no sound, no
> > > bluetooth, no PCMCIA, no WLAN, no USB, no cpufreq) but still I can't
> > > resume properly. Caps works and I can login through SSH. Back to a more
> > > complete config (sound, MMC, WLAN, PCMCIA - still no dynticks or HRT -
> > > see attachment "config") I get exactly the same behaviour.
> > >
> > > When logged in through SSH after resume I saved output of dmesg (which
> > > includes full power management debug messages), see
> > > > attachment "dmesg-resume". The system basically seems to be back but lot
> > > of things do not work such as loading/unloading e.g. my WLAN-driver
> > > (ipw3945), running "top" or "dstat" etc.   "uptime" always returns 0 min,
> > > even with power management debug disabled.
> > >
> > > Kernel:
> > > Linux version 2.6.21-rc4 (gcc version 4.1.2 20061115 (prerelease) (Debian
> > > 4.1.1-21)) #23 SMP PREEMPT Mon Mar 19 12:27:56 CET 2007
> I made some further investigations on this issue. A complete bisect between
> 2.6.20 and 2.6.21-rc4-git4 stops at a commit
> (a4bbb810dedaecf74d54b16b6dd3c33e95e1024c) where I'm not able to compile the
> kernel anymore because of compile errors in arch/i386/kernel/setup.c
> (ACPI-related). Stepping some revisions back until it compiled again,
> resume didn't work either.
> 
> So I started all over again with bisect only on arch/i386 and ended up at
> ceb6c46839021d5c7c338d48deac616944660124 as the bad commit. But this commit
> seems to be some kind of finalization of a series of patches ("ACPICA: Remove
> duplicate table manager") so I guess it's hard to debug this thing...
> 
> > Can you please do
> >
> > # echo test > /sys/power/disk
> > # echo disk > /sys/power/state
> >
> > (the system should freeze tasks, suspend devices, disable nonboot CPUs,
> > wait for 5 seconds, enable nonboot CPUs, resume devices, thaw tasks and
> > return to your command prompt) and see if you can reproduce the problem?
> Same problem here. Works fine in 2.6.20 as well as before 
> ceb6c46839021d5c7c338d48deac616944660124. Doesn't work on recent 
> 2.6.21-rc4-git4.
> 
> Any more information I can give?

Well, this looks like an ACPI problem, and actually a regression.

I think you can open a bugzilla entry in the ACPI category and put the
above information in there (please add [EMAIL PROTECTED] to the entry's Cc
list).

Greetings,
Rafael


Re: [PATCH 4/5] swsusp: fix error paths in snapshot_open

2007-03-20 Thread Pavel Machek
Hi!

> We forget to increase device_available if there's an error in
> snapshot_open(), so the snapshot device cannot be open at all after
> snapshot_open() has returned an error.

Actually, this should go to the beginning of series, as it is
(non-critical) bugfix.

Pavel
> @@ -49,12 +49,14 @@ static int snapshot_open(struct inode *i
>  	if (!atomic_add_unless(&device_available, -1, 0))
>  		return -EBUSY;
>  
> -	if ((filp->f_flags & O_ACCMODE) == O_RDWR)
> +	if ((filp->f_flags & O_ACCMODE) == O_RDWR) {
> +		atomic_inc(&device_available);
>  		return -ENOSYS;
> -
> -	if(create_basic_memory_bitmaps())
> +	}
> +	if(create_basic_memory_bitmaps()) {
> +		atomic_inc(&device_available);
>  		return -ENOMEM;
> -
> +	}
>  	nonseekable_open(inode, filp);
>  	data = &snapshot_state;
>  	filp->private_data = data;
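
The bug pattern in the quoted hunk can be shown in user space with C11 atomics. This is just a sketch under assumptions: add_unless() imitates the kernel's atomic_add_unless(), and the want_rdwr and alloc_fails parameters are invented knobs that stand in for the O_RDWR check and a failing create_basic_memory_bitmaps().

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

static atomic_int device_available = 1;  /* 1 = device free, 0 = in use */

/* User-space imitation of the kernel's atomic_add_unless(). */
static bool add_unless(atomic_int *v, int add, int unless)
{
	int old = atomic_load(v);

	while (old != unless) {
		/* on failure, old is refreshed with the current value */
		if (atomic_compare_exchange_weak(v, &old, old + add))
			return true;
	}
	return false;
}

/* want_rdwr and alloc_fails are hypothetical knobs to trigger errors. */
static int snapshot_open_sketch(bool want_rdwr, bool alloc_fails)
{
	if (!add_unless(&device_available, -1, 0))
		return -EBUSY;

	if (want_rdwr) {
		atomic_fetch_add(&device_available, 1);  /* undo the claim */
		return -ENOSYS;
	}
	if (alloc_fails) {
		atomic_fetch_add(&device_available, 1);  /* undo the claim */
		return -ENOMEM;
	}
	return 0;
}

static void snapshot_release_sketch(void)
{
	atomic_fetch_add(&device_available, 1);
}
```

Without the two atomic_fetch_add() calls on the error paths, the first failed open would leave the counter at 0 and every later open would return -EBUSY, which is the behaviour the patch description complains about.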

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


Re: [PATCH 5/5] swsusp: Use GFP_KERNEL for creating basic data structures

2007-03-20 Thread Pavel Machek
Hi!

> From: Rafael J. Wysocki <[EMAIL PROTECTED]>
> 
> Make swsusp call create_basic_memory_bitmaps() before processes are frozen,
> so that GFP_KERNEL allocations can be made in it.  Additionally, ensure
> that the swsusp's userland interface won't be used while either
> pm_suspend_disk() or software_resume() is being executed.
> 
> Signed-off-by: Rafael J. Wysocki <[EMAIL PROTECTED]>

ACK, but I'd prefer it to go before the bitmap patches, as it can be
viewed as bugfix... ok, it probably does not matter much in this case.

Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


Re: [PATCH 1/1] hotplug cpu: move tasks in empty cpusets to parent

2007-03-20 Thread Randy Dunlap
On Tue, 20 Mar 2007 13:34:01 -0600 Cliff Wickman wrote:

> 
> From: Cliff Wickman <[EMAIL PROTECTED]>
> 
> This patch corrects a situation that occurs when one disables all the cpus
> in a cpuset.
> 
> At that point, any tasks in that cpuset are incorrectly moved (as I recall,
> they were moved to a sibling cpuset).
> Such tasks should be moved to the parent of their current cpuset. Or if the
> parent cpuset has no cpus, to its parent, etc.
> 
> And the empty cpuset should be removed (if it is flagged notify_on_release).
> 
> This patch contains the added complexity of taking care not to do memory
> allocation while holding the cpusets callback_mutex. And it makes use of the
> "cpuset_release_agent" to do the cpuset removals.
> 
> It might be simpler to use a separate thread or workqueue. But such code
> has not yet been written.
> 
> Diffed against 2.6.20-rc6
> 
> Signed-off-by: Cliff Wickman <[EMAIL PROTECTED]>
> 
> ---
>  kernel/cpuset.c |  200 ++--
>  1 file changed, 180 insertions(+), 20 deletions(-)
> 
> Index: morton.070205/kernel/cpuset.c
> ===
> --- morton.070205.orig/kernel/cpuset.c
> +++ morton.070205/kernel/cpuset.c

> @@ -2070,20 +2097,100 @@ out:
>   *
>   * Call with both manage_mutex and callback_mutex held.
>   *
> + * Takes tasklist_lock, and task_lock() for cpuset members that are
> + * moved to another cpuset.
> + *
>   * Recursive, on depth of cpuset subtree.
>   */
>  
> -static void guarantee_online_cpus_mems_in_subtree(const struct cpuset *cur)
> +static void remove_tasks_in_empty_cpusets_in_subtree(const struct cpuset 
> *cur, struct list_head *empty_list, struct path_list_element **ple_array, int 
> *ple_availp, int ple_count)

That line is way too long.  Source lines should fit in 80 columns unless
they contain (maybe) a printk string that would be ugly if split (e.g.).
This one should be like so (or some other readable variant):

static void remove_tasks_in_empty_cpusets_in_subtree(
		const struct cpuset *cur,
		struct list_head *empty_list,
		struct path_list_element **ple_array,
		int *ple_availp, int ple_count)
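
As an aside, the rule from the patch description (move tasks to the parent, or to its parent if that one has no cpus either) boils down to a short upward walk. The sketch below uses a made-up struct cpuset_sketch with a plain bitmask instead of the kernel's cpumask_t, so it only illustrates the loop, not the locking or the task moving.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, cut-down cpuset: a CPU bitmask and a parent link. */
struct cpuset_sketch {
	unsigned long cpus_allowed;    /* bit n set = CPU n usable */
	struct cpuset_sketch *parent;  /* NULL for the top cpuset */
};

/*
 * Walk up from cs until an ancestor with at least one CPU is found.
 * Returns NULL only when called on the top cpuset or when every
 * ancestor is empty; the patch comments say the top cpuset always
 * has online cpus.
 */
static struct cpuset_sketch *
nearest_nonempty_ancestor(const struct cpuset_sketch *cs)
{
	struct cpuset_sketch *parent = cs->parent;

	while (parent && parent->cpus_allowed == 0)
		parent = parent->parent;
	return parent;
}
```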

> +{
> + int npids, ple_used=0;
> + struct cpuset *c, *parent;
> + struct path_list_element *ple;
> +
> + /* If a cpuset's mems or cpus are empty, move its tasks to its parent */
> +	list_for_each_entry(c, &cur->children, sibling) {
> + remove_tasks_in_empty_cpusets_in_subtree(c, empty_list,
> + ple_array, ple_availp, ple_count);
> + /*
> +  * If it has no online cpus or no online mems, move its tasks
> +  * to its next-highest non-empty parent and remove it.
> +  * Remove it even if it has children, as its children are a
> +  * subset of cpus and nodes, so they are empty too.
> +  * The removal is conditional on whether it is
> +  * notify-on-release.
> +  */
> + if (cpus_empty(c->cpus_allowed) ||
> +nodes_empty(c->mems_allowed)) {
> + char *path = NULL;
> + /*
> +  * Find its next-highest non-empty parent, (top cpuset
> +  * has online cpus, so can't be empty).
> +  */
> + parent = c->parent;
> + while (parent && cpus_empty(parent->cpus_allowed))
> + parent = parent->parent;
> +			npids = atomic_read(&c->count);
> + /* c->count is the number of tasks using the cpuset */
> + if (npids)
> + /* move member tasks to the parent cpuset */
> + move_member_tasks_to_cpuset(c, parent);
> +
> + /*
> +  * sanity check that we're not over-running
> +  * the array
> +  */
> + if (++ple_used > ple_count)
> + return;
> + ple = ple_array[(*ple_availp)++];
> + path = (char *)ple + sizeof(struct path_list_element);
> + if (cpuset_path(c, path,
> + PAGE_SIZE-sizeof(struct path_list_element)) < 0)
> + path = NULL;
> + if (path != NULL) {
> + /*
> +  * add path to list of cpusets to remove
> +  * (list includes cpusets that are not
> +  *  notify-on-release)
> +  */
> + ple->path = path;
> + ple->cs = c;
> + /*
> +

Re: "reboot" swsusp mode leaves moon icon blinking

2007-03-20 Thread Johannes Weiner
Hi,

On Tue, Mar 20, 2007 at 07:01:21PM +0100, Pavel Machek wrote:
> > What does the "reboot" swsusp mean?
> 
> echo reboot > ...disk.

Thank you.

> > I am having this (or a similar problem):
> > 
> > echo shutdown > /sys/power/disk; echo disk > /sys/power/state
> 
> Yep, that's duplicate. Try patch below.

I applied it and the box is suspending/resuming properly again.

=Hannes


Re: [PATCH 0/5] [RFC] AF_RXRPC socket family implementation [try #3]

2007-03-20 Thread David Howells
Alan Cox <[EMAIL PROTECTED]> wrote:

> >  (*) SOCK_RPC has been removed.  AF_RXRPC sockets now simply ignore the
> >  "type" argument to socket().
>
> This is also incorrect

Sigh.

And what would you have me do?  There *isn't* an appropriate SOCK_xxx constant
available, and you won't let me add one that is.  Maybe I should just pick
SOCK_DCCP and have done with it; it's as appropriate as DGRAM, RDM, SEQPACKET
or STREAM - except that would be silly.  I assert that RAW and PACKET are both
even less appropriate than any of the other choices.


Let me explain again why I think each choice is incorrect.  You gave me four
choices, which you and POSIX classify thus:

	Constant        Your Service type   POSIX Service Type
	=============== =================== =======================
	SOCK_DGRAM      Datagram            Datagram
	SOCK_RDM        Datagram            Datagram
	SOCK_SEQPACKET  Datagram            Stream (maybe Datagram)
	SOCK_STREAM     Stream              Stream

A datagram service by definition (as can be found on various websites),
transfers a piece of data from one place to another with no dependence on and
no regard to any other pieces of data that the service is asked to transport.
At its simplest level (SOCK_DGRAM), that's _all_ it does.  SOCK_DGRAM makes no
assertions about whether the datagram will get there, and requires no report
that it did get there.  Furthermore, no ordering at all is imposed on the
sequence in which the far side sees any such pieces of data.

SOCK_RDM is a step up from that.  Again, like SOCK_DGRAM, it presents a
datagram service to the application.  But unlike SOCK_DGRAM, it will attempt to
report the success or failure of the attempt to transfer the data.  This may
involve exchanging further packets with the peer behind the scenes, but
ultimately, all it does is to transfer one piece of data from one peer to
another; the added value is that the sender can attempt to determine whether
this worked.  Furthermore, as for SOCK_DGRAM, no ordering at all is imposed on
the sequence in which the far side sees any such pieces of data.

SOCK_SEQPACKET can be considered a step up from SOCK_RDM.  It can be considered
to provide a reliable datagram service to the application (as SOCK_RDM), but
one in which the datagrams are guaranteed to be seen by the receiver in
precisely the same order as they are sent by the sender, with no datagrams
being lost from the sequence.  In this model, SOCK_SEQPACKET would be seen as
providing two independent, independently ordered streams of datagrams, one in
each direction.

SOCK_STREAM is a data streaming service in which data is guaranteed to come out
of the receiver in precisely the same order as it was put into the transmitter.
Furthermore, SOCK_STREAM can be seen as providing two independent,
independently ordered streams of data, one in each direction.

SOCK_SEQPACKET can also be considered to provide a streaming service (similar
to SOCK_STREAM) in which record boundaries are maintained.  Data is guaranteed
to come out of the receiver in precisely the order as it was put in to the
transmitter, but the receiver will break off and flag MSG_EOR at points in the
data flow that correspond to those at which the sender flagged a record
boundary.  In this model, SOCK_SEQPACKET would be seen as providing two
independent, independently ordered streams of data and record markers, one in
each direction.


In effect, SOCK_STREAM and SOCK_SEQPACKET can each be viewed as a pair of
independent, symmetric unidirectional services, one for each direction:

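	+--------+                       +--------+
	|        |  +-----------------+  |        |
	|        |->|    Tx Stream    |->|        |
	| Local  |  +-----------------+  | Remote |
	| Socket |                       | Socket |
       _|        |_______________________|        |_
	|        |                       |        |
	|        |  +-----------------+  |        |
	|        |<-|    Rx Stream    |<-|        |
	|        |  +-----------------+  |        |
	+--------+                       +--------+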

SOCK_SEQPACKET can give the appearance of being an ordered, reliable datagram
service simply by the application using it assuming that the record boundaries
delimit separate datagrams.
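
The record-boundary behaviour is easy to observe from user space. The sketch below assumes Linux AF_UNIX socket pairs as a convenient stand-in for a real network transport; the helper name first_read_size() is invented for the example.

```c
#include <assert.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Sends two records ("abc", then "defgh") over a connected socket pair
 * of the given type and returns how many bytes the first recv() on the
 * peer delivers, or -1 on error.
 */
static ssize_t first_read_size(int type)
{
	int sv[2];
	char buf[64];
	ssize_t n = -1;

	if (socketpair(AF_UNIX, type, 0, sv) < 0)
		return -1;
	if (send(sv[0], "abc", 3, 0) == 3 && send(sv[0], "defgh", 5, 0) == 5)
		n = recv(sv[1], buf, sizeof(buf), 0);
	close(sv[0]);
	close(sv[1]);
	return n;
}
```

With SOCK_SEQPACKET the first recv() stops at the end of the first record even though the buffer has room for both, which is what lets an application treat records as datagrams. With SOCK_STREAM the same call is free to return bytes from both sends in one go.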

SOCK_DGRAM and SOCK_RDM, on the other hand, can be viewed as being a pair of
independent unidirectional asymmetric services, one that spits out packets and
one that collects datagrams.

++ +--+   ++
|| |Tx|-- ||
|| +--+  \||
||/   \   ||
||   /   +--+  \  ||
||->+--->|Tx|>||
| Local  |   \   

Re: [RFC] SLAB : NUMA cache_free_alien() very expensive because of virt_to_slab(objp); nodeid = slabp->nodeid;

2007-03-20 Thread Eric Dumazet

Andi Kleen wrote:

Is it possible virt_to_slab(objp)->nodeid being different from
pfn_to_nid(objp) ?
It is possible the page allocator falls back to another node than 
requested. We would need to check that this never occurs.


The only way to ensure that would be to set a strict mempolicy.
But I'm not sure that's a good idea -- after all you don't want
to fail an allocation in this case.

But pfn_to_nid on the object like proposed by Eric should work anyways.
But I'm not sure the tables used for that will be more often cache hot
than the slab.


pfn_to_nid() on most x86_64 machines accesses one cache line (struct memnode).

Node 0 MemBase  Limit 00028000
Node 1 MemBase 00028000 Limit 00048000
NUMA: Using 31 for the hash shift.

On this example, we use only 8 bytes of memnode.embedded_map[] to find nid of 
all 16 GB of ram. On profiles I have, memnode is always hot (no cache miss on it).
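
To make the arithmetic concrete, here is a rough sketch of such a shift-based lookup. The names and the node boundary are assumptions made for the example (the addresses in the boot log above are not used); only the shift of 31 and the 8-byte map size come from the numbers quoted.

```c
#include <assert.h>
#include <stdint.h>

#define MEMNODE_SHIFT 31  /* "Using 31 for the hash shift": 2 GB per slot */
#define MEMNODE_SLOTS 8   /* 8 one-byte slots cover 16 GB */

/*
 * Hypothetical map: the node boundary here is illustrative only and is
 * not taken from the boot log above.
 */
static const uint8_t memnode_map_sketch[MEMNODE_SLOTS] = {
	0, 0, 0, 0,  /* first 8 GB on node 0 */
	1, 1, 1, 1,  /* next 8 GB on node 1 */
};

/*
 * One shift and one byte load from a single cache line.
 * The caller must pass a physical address below 16 GB.
 */
static int phys_to_nid_sketch(uint64_t phys_addr)
{
	return memnode_map_sketch[phys_addr >> MEMNODE_SHIFT];
}
```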


While virt_to_slab() has to access :

1) struct page -> page_get_slab() (page->lru.prev) (one cache miss)
2) struct slab -> nodeid (one other cache miss)


So using pfn_to_nid() would avoid 2 cache misses.

I understand we want to do special things (fallback and such tricks) at 
allocation time, but I believe that we can just trust the real nid of memory 
at free time.





Re: "reboot" swsusp mode leaves moon icon blinking

2007-03-20 Thread Pavel Machek
Hi!

> > Yes, we could do that.
> > 
> > OTOH, we could simply avoid calling the platform code in resume
> > path. It worked ok for a long while, and it seems to have no
> > downsides...
> 
> No, we have always done it, actually, but not in this particular place. ;-)
> 
> In theory, avoiding it could be problematic, because of the pm_ops->finish()
> that gets called after the image has been restored (if the platform mode was
> used for suspending).  Still, I think we can try.

If we can return to the method "how we did it in 2.6.20", that would
indeed be preferred ;-).
Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


Re: [PATCH 2/2] Replace pid_t in autofs with struct pid reference

2007-03-20 Thread Eric W. Biederman
"Serge E. Hallyn" <[EMAIL PROTECTED]> writes:

>> >  void autofs4_dentry_release(struct dentry *);
>> >  extern void autofs4_kill_sb(struct super_block *);
>> > diff --git a/fs/autofs4/waitq.c b/fs/autofs4/waitq.c
>> > index 9857543..4a9ad9b 100644
>> > --- a/fs/autofs4/waitq.c
>> > +++ b/fs/autofs4/waitq.c
>> > @@ -141,8 +141,8 @@ static void autofs4_notify_daemon(struct
>> >packet->ino = wq->ino;
>> >packet->uid = wq->uid;
>> >packet->gid = wq->gid;
>> > -  packet->pid = wq->pid;
>> > -  packet->tgid = wq->tgid;
>> > +  packet->pid = pid_nr(wq->pid);
>> > +  packet->tgid = pid_nr(wq->tgid);
>> >break;
>> 
>> I'm assuming we build the packet in the process context of the
>> daemon we are sending it to.  If not we have a problem here.
>
> Yes this is data being sent to a userspace daemon (Ian pls correct me if
> I'm wrong) so the pid_nr is the only thing we can send.

Agreed.  The question is are we in the user space daemon's process when
we generate the pid_nr.  Or do we stuff this in some kind of socket,
and the socket switch locations of the packet.

Basically I'm just trying to be certain we are calling pid_nr in the
proper context.  Otherwise we could get the wrong pid when we have
multiple pid namespaces in play.

Eric

