RE: kacpi_notify?

2006-07-13 Thread Linus Torvalds


On Thu, 13 Jul 2006, Starikovskiy, Alexey Y wrote:
 
> I'm terribly sorry that my patch broke on your machine.
> May I ask you to send me or attach to #5534 output of acpidump from this
> machine?

I'll send it in another email, since I already generated it for Len ;)

> Do you think that the whole idea is crap, or if I limit number of
> possible spawned threads and forcibly put current thread to sleep (which
> will release ACPICA executer mutex), as it happens in DSDT of nx6125 it
> will be possible to use it?

I don't think the _idea_ is crap per se, but it would at a minimum need a 
thread limit. But I think it's the wrong approach: especially if you put 
the current thread to sleep, you really don't want another thread at all, 
you are really just working around a problem that is totally internal to 
acpi (and the AML interpreter in particular).

So I think the problem really lies elsewhere, and that the whole thread 
approach was trying to paper over it. And having a limited set of threads 
is probably potentially _worse_ than what we have now.

Is there no way to have the AML interpreter have some state, and just push 
that current interrupted state back onto the event queue, and just start 
executing the new one instead? That sounds like it should fix the _real_ 
problem - a kind of mini-scheduler for ACPI events?
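
To make the "mini-scheduler" idea concrete, here is a minimal user-space sketch, not ACPICA or kernel code. Every name in it (method_state, method_step, event_queue, run_events) is hypothetical. It assumes an interpreter step can return "yield" at the point where it would otherwise sleep; the interrupted state then goes back onto the tail of the event queue and the next event runs on the same thread.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical result of running one slice of a method. */
enum step_result { STEP_DONE, STEP_YIELD };

/* Hypothetical saved interpreter state for one in-flight method. */
struct method_state {
	int id;
	int remaining;	/* slices left before the method completes */
};

/* Run one slice; yield where a real interpreter would sleep. */
static enum step_result method_step(struct method_state *m)
{
	assert(m->remaining > 0);
	m->remaining--;
	return m->remaining == 0 ? STEP_DONE : STEP_YIELD;
}

#define QUEUE_CAP 8

/* Fixed-size FIFO of pending or interrupted methods. */
struct event_queue {
	struct method_state *slot[QUEUE_CAP];
	size_t head;
	size_t count;
};

/* Returns 1 on success, 0 if the queue is full. */
static int eq_push(struct event_queue *q, struct method_state *m)
{
	if (q->count == QUEUE_CAP)
		return 0;
	q->slot[(q->head + q->count) % QUEUE_CAP] = m;
	q->count++;
	return 1;
}

/* Returns the oldest entry, or NULL if the queue is empty. */
static struct method_state *eq_pop(struct event_queue *q)
{
	struct method_state *m;

	if (q->count == 0)
		return NULL;
	m = q->slot[q->head];
	q->head = (q->head + 1) % QUEUE_CAP;
	q->count--;
	return m;
}

/*
 * Drain the queue on a single thread. A method that yields is pushed
 * back onto the tail instead of blocking the thread. Returns the number
 * of ids recorded in done_ids, in completion order.
 */
static size_t run_events(struct event_queue *q, int *done_ids, size_t max_done)
{
	size_t n = 0;
	struct method_state *m;

	while ((m = eq_pop(q)) != NULL) {
		if (method_step(m) == STEP_YIELD)
			eq_push(q, m);	/* cannot fail: a slot was just freed */
		else if (n < max_done)
			done_ids[n++] = m->id;
	}
	return n;
}
```

With a three-slice method queued ahead of a one-slice method, the short one completes first, and no second thread is ever created.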

Linus
-
To unsubscribe from this list: send the line unsubscribe linux-acpi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


kacpi_notify?

2006-07-12 Thread Linus Torvalds

Hmm.

What's up with this?

 2341 ?        D      0:00 [kacpid_notify]
 2342 ?        D      0:00 [kacpid_notify]
 2343 ?        D      0:00 [kacpid_notify]
 2344 ?        D      0:00 [kacpid_notify]
 2345 ?        D      0:00 [kacpid_notify]
 2346 ?        D      0:00 [kacpid_notify]
 2347 ?        D      0:00 [kacpid_notify]
 ...

(apparently about 300 of those processes, at which point the machine just 
hangs, because even root cannot start any new processes, and I couldn't 
actually get to debug this at all).

What would it be waiting on, and why?

This machine doesn't have any module support (at all), and I haven't 
booted a new kernel on it in quite a while, so this isn't necessarily new 
behaviour, but the last kernel I tried (which did _not_ have this problem, 
obviously) was in April (commit 6e5882cfa24e1456702e463f6920fc0ca3c3d2b8, 
to be exact).

Now, that's 6000+ commits ago, so I'd rather not even bisect this, if 
somebody can come up with a more obvious explanation of why kacpid_notify 
would be started over and over and over again, only to always get stuck..

Linus


Re: kacpi_notify?

2006-07-12 Thread Linus Torvalds


On Wed, 12 Jul 2006, Linus Torvalds wrote:
 
> (apparently about 300 of those processes, at which point the machine just 
> hangs, because even root cannot start any new processes, and I couldn't 
> actually get to debug this at all).

With ACPI debugging, I notice that it finally dies due to ACPI Error 
AE_NO_MEMORY. Which I guess is just due to thousands of kacpid_notify 
processes, and tons of allocations.

With Ctrl+ScrollLock, I finally got something. The traceback for the 
D-state (millions and millions of them) is

__down_failed
acpi_ut_acquire_mutex
acpi_ex_enter_interpreter
acpi_ns_evaluate
acpi_evaluate_object
acpi_evaluate_integer
acpi_os_execute_thread
acpi_thermal_get_temperature
acpi_thermal_check
..

and 'kacpid' seems to be stuck using all CPU time, with the thing doing 
something like:

EIP is at delay_tsc+0xb/0x13
 EFLAGS: 0283   Not tainted  (2.6.18-rc1-g155dbfd8 #24)
EAX: 4aa48900 EBX: 00026be1 ECX: 4aa40b7e EDX: 001a
ESI:  EDI: c039300d EBP: c0390df3 DS: 007b ES: 007b
CR0: 8005003b CR2: 080516f0 CR3: 362dc000 CR4: 06d0
 [<c01c94c0>] __delay+0x6/0x7
 [<c01f23ef>] acpi_os_stall+0x1d/0x29
 [<c0201f11>] acpi_ex_system_do_stall+0x37/0x3b
 [<c0200fca>] acpi_ex_opcode_1A_0T_0R+0x85/0xc8
 [<c01f5308>] acpi_ds_exec_end_op+0x133/0x553
 [<c020d0f3>] acpi_ps_parse_loop+0x777/0xbe0
 [<c020c488>] acpi_ps_parse_aml+0xd8/0x2d5
 [<c020dbbe>] acpi_ps_execute_pass+0xa9/0xd2
 [<c020dd6a>] acpi_ps_execute_method+0x153/0x231
 [<c02095e1>] acpi_ns_evaluate+0x179/0x24c
 [<c01fc12e>] acpi_ev_asynch_execute_gpe_method+0xeb/0x159
 [<c01f2083>] acpi_os_execute_deferred+0x19/0x21
 [<c01226a0>] run_workqueue+0x68/0x95
 [<c01f206a>] acpi_os_execute_deferred+0x0/0x21
 [<c0122b2e>] worker_thread+0xf9/0x12b
 [<c03570bf>] schedule+0x469/0x4cc
 [<c0113bfb>] default_wake_function+0x0/0xc
 [<c0122a35>] worker_thread+0x0/0x12b
 [<c01249bb>] kthread+0xad/0xd8
 [<c012490e>] kthread+0x0/0xd8
 [<c0101005>] kernel_thread_helper+0x5/0xb

which I assume is the thing that holds the AML semaphore, and isn't 
releasing it.

Is there any sane debugging info I can send people?

Linus


RE: kacpi_notify?

2006-07-12 Thread Linus Torvalds


On Wed, 12 Jul 2006, Linus Torvalds wrote:
 
> I've got a hundred-odd commits to go, but the next bisection test happens 
> to be the parent of my merge (your merge linus into release branch 
> merge: ae6c859b7dcd708efadf1c76279c33db213e3506), so if I'm right, I'd 
> expect that to be a bad tree.

Yup.

And yes, the problem seems to coincide with getting about 300 acpi 
interrupts per second. After about 9500 interrupts (each of which seems to 
create one of these things), the machine is basically dead.

Ten thousand kacpid_notify threads is too much. Regardless of what brought 
on this bug, I think there's something wrong in anything that keeps on 
notifying things without keeping track of how many outstanding 
notifications it already has.
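
As an illustration of that last point, here is a minimal single-threaded sketch, not the ACPI OSL interface. The names (notify_queue, notify_submit, notify_run_one, MAX_OUTSTANDING) and the limit of 4 are assumptions made for the example. The producer counts outstanding notifications and refuses new ones past a fixed bound instead of creating work without limit; real kernel code would also need locking around the counters.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical bound on queued-but-not-yet-run notifications. */
#define MAX_OUTSTANDING 4

typedef void (*notify_fn)(void *context);

struct notify_queue {
	notify_fn fn[MAX_OUTSTANDING];
	void *ctx[MAX_OUTSTANDING];
	size_t head;		/* index of the oldest pending entry */
	size_t outstanding;	/* entries queued but not yet run */
	size_t dropped;		/* requests refused because the queue was full */
};

/* Queue a notification; refuse it rather than grow without bound. */
static bool notify_submit(struct notify_queue *q, notify_fn fn, void *ctx)
{
	size_t tail;

	assert(q->outstanding <= MAX_OUTSTANDING);
	if (q->outstanding == MAX_OUTSTANDING) {
		q->dropped++;
		return false;
	}
	tail = (q->head + q->outstanding) % MAX_OUTSTANDING;
	q->fn[tail] = fn;
	q->ctx[tail] = ctx;
	q->outstanding++;
	return true;
}

/* Run the oldest pending notification; returns false if none is pending. */
static bool notify_run_one(struct notify_queue *q)
{
	notify_fn fn;
	void *ctx;

	if (q->outstanding == 0)
		return false;
	fn = q->fn[q->head];
	ctx = q->ctx[q->head];
	q->head = (q->head + 1) % MAX_OUTSTANDING;
	q->outstanding--;
	fn(ctx);
	return true;
}

/* Example handler: counts how many times it was invoked. */
static void count_call(void *context)
{
	(*(int *)context)++;
}
```

If ten notifications arrive before any has run, four are queued and six are counted as dropped, so the backlog stays bounded however fast the interrupts come in.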

Linus


RE: kacpi_notify?

2006-07-12 Thread Brown, Len

Likely related to bugzilla-5534

-Len 


RE: kacpi_notify?

2006-07-12 Thread Linus Torvalds


On Wed, 12 Jul 2006, Brown, Len wrote:

> Likely related to bugzilla-5534
> 
> b8d35192c55fb055792ff0641408eaaec7c88988

Well, that one certainly looks likely.

Any reason to not just revert it? The fundamental problems that it 
introduces are obviously much worse than the fix.

Linus 


RE: kacpi_notify?

2006-07-12 Thread Brown, Len

> On Wed, 12 Jul 2006, Brown, Len wrote:
>
>> Likely related to bugzilla-5534
>>
>> b8d35192c55fb055792ff0641408eaaec7c88988
>
> Well, that one certainly looks likely.
>
> Any reason to not just revert it? The fundamental problems that it 
> introduces are obviously much worse than the fix.

If reverting it fixes your EVO, then certainly this is what to do right now.
However, as the saga in 5534 will testify, this will make other systems
unhappy, so we'll have to come back quickly with an improved patch for 5534.

-Len


RE: kacpi_notify?

2006-07-12 Thread Len Brown

Here's a suggested revert.

Please try this smaller revert to just osl.c.
(it builds and boots for me)

It reverts acpi_os_queue_for_execution() to exactly
as it was in 2.6.17, except it changes the name to
acpi_os_execute() to match ACPICA 20060512.

(yes, it is okay we ignore the 1st parameter,
 it wasn't used until the 5534 fix we are reverting)

thanks,
-Len

Signed-off-by: Len Brown [EMAIL PROTECTED]

diff --git a/drivers/acpi/ec.c b/drivers/acpi/ec.c
diff --git a/drivers/acpi/events/evgpe.c b/drivers/acpi/events/evgpe.c
diff --git a/drivers/acpi/events/evmisc.c b/drivers/acpi/events/evmisc.c
diff --git a/drivers/acpi/osl.c b/drivers/acpi/osl.c
index 47dfde9..b7d1514 100644
--- a/drivers/acpi/osl.c
+++ b/drivers/acpi/osl.c
@@ -36,7 +36,6 @@ #include <linux/kmod.h>
 #include <linux/delay.h>
 #include <linux/workqueue.h>
 #include <linux/nmi.h>
-#include <linux/kthread.h>
 #include <acpi/acpi.h>
 #include <asm/io.h>
 #include <acpi/acpi_bus.h>
@@ -583,16 +582,6 @@ static void acpi_os_execute_deferred(voi
 	return;
 }
 
-static int acpi_os_execute_thread(void *context)
-{
-	struct acpi_os_dpc *dpc = (struct acpi_os_dpc *)context;
-	if (dpc) {
-		dpc->function(dpc->context);
-		kfree(dpc);
-	}
-	do_exit(0);
-}
-
 /*******************************************************************************
  *
  * FUNCTION:    acpi_os_execute
@@ -614,10 +603,16 @@ acpi_status acpi_os_execute(acpi_execute
 	acpi_status status = AE_OK;
 	struct acpi_os_dpc *dpc;
 	struct work_struct *task;
-	struct task_struct *p;
+
+	ACPI_FUNCTION_TRACE("os_queue_for_execution");
+
+	ACPI_DEBUG_PRINT((ACPI_DB_EXEC,
+			  "Scheduling function [%p(%p)] for deferred execution.\n",
+			  function, context));
 
 	if (!function)
-		return AE_BAD_PARAMETER;
+		return_ACPI_STATUS(AE_BAD_PARAMETER);
+
 	/*
 	 * Allocate/initialize DPC structure.  Note that this memory will be
 	 * freed by the callee.  The kernel handles the tq_struct list  in a
@@ -628,34 +623,27 @@ acpi_status acpi_os_execute(acpi_execute
 	 * We can save time and code by allocating the DPC and tq_structs
 	 * from the same memory.
 	 */
-	if (type == OSL_NOTIFY_HANDLER) {
-		dpc = kmalloc(sizeof(struct acpi_os_dpc), GFP_KERNEL);
-	} else {
-		dpc = kmalloc(sizeof(struct acpi_os_dpc) +
-				sizeof(struct work_struct), GFP_ATOMIC);
-	}
+
+	dpc =
+	    kmalloc(sizeof(struct acpi_os_dpc) + sizeof(struct work_struct),
+		    GFP_ATOMIC);
 	if (!dpc)
-		return AE_NO_MEMORY;
+		return_ACPI_STATUS(AE_NO_MEMORY);
+
 	dpc->function = function;
 	dpc->context = context;
 
-	if (type == OSL_NOTIFY_HANDLER) {
-		p = kthread_create(acpi_os_execute_thread, dpc, "kacpid_notify");
-		if (!IS_ERR(p)) {
-			wake_up_process(p);
-		} else {
-			status = AE_NO_MEMORY;
-			kfree(dpc);
-		}
-	} else {
-		task = (void *)(dpc + 1);
-		INIT_WORK(task, acpi_os_execute_deferred, (void *)dpc);
-		if (!queue_work(kacpid_wq, task)) {
-			status = AE_ERROR;
-			kfree(dpc);
-		}
+	task = (void *)(dpc + 1);
+	INIT_WORK(task, acpi_os_execute_deferred, (void *)dpc);
+
+	if (!queue_work(kacpid_wq, task)) {
+		ACPI_DEBUG_PRINT((ACPI_DB_ERROR,
+				  "Call to queue_work() failed.\n"));
+		kfree(dpc);
+		status = AE_ERROR;
 	}
-	return status;
+
+	return_ACPI_STATUS(status);
 }
 
 EXPORT_SYMBOL(acpi_os_execute);