RE: kacpi_notify?
On Thu, 13 Jul 2006, Starikovskiy, Alexey Y wrote:
> I'm terribly sorry that my patch broke on your machine. May I ask you to
> send me or attach to #5534 output of acpidump from this machine?

I'll send it in another email, since I already generated it for Len ;)

> Do you think that the whole idea is crap, or if I limit number of possible
> spawned threads and forcibly put current thread to sleep (which will
> release ACPICA executer mutex), as it happens in DSDT of nx6125 it will be
> possible to use it?

I don't think the _idea_ is crap per se, but it would at a minimum need a
thread limit.

But I think it's the wrong approach: especially if you put the current
thread to sleep, you really don't want another thread at all, you are
really just working around a problem that is totally internal to acpi (and
the AML interpreter in particular).

So I think the problem really lies elsewhere, and that the whole thread
approach was trying to paper over it. And having a limited set of threads
is probably potentially _worse_ than what we have now.

Is there no way to have the AML interpreter have some state, and just push
that current interrupted state back onto the event queue, and just start
executing the new one instead? That sounds like it should fix the _real_
problem - a kind of mini-scheduler for ACPI events?

Linus
-
To unsubscribe from this list: send the line unsubscribe linux-acpi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
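A rough user-space sketch of the "push the interrupted state back onto the event queue" idea might look like the following. It is only an illustration under stated assumptions: every name here (method_state, event_queue, run_events, QUEUE_SLOTS) is hypothetical, nothing in it is ACPICA or kernel code, and the "interpreter state" is reduced to a step counter. A method that still has work left is re-queued instead of blocking, so a short event queued behind it can finish first.

```c
#include <assert.h>
#include <stddef.h>

#define QUEUE_SLOTS 8	/* hypothetical fixed capacity */

enum step_result { STEP_DONE, STEP_YIELD };

/* Saved state for one event: here just a step counter (hypothetical). */
struct method_state {
	int id;
	int pc;		/* next step to run */
	int steps;	/* total steps in this method */
};

/* Fixed-size FIFO of pending method states. */
struct event_queue {
	struct method_state *slot[QUEUE_SLOTS];
	size_t head, count;
};

static int queue_push(struct event_queue *q, struct method_state *m)
{
	if (q->count == QUEUE_SLOTS)
		return -1;	/* full: caller must drop or retry */
	q->slot[(q->head + q->count) % QUEUE_SLOTS] = m;
	q->count++;
	return 0;
}

static struct method_state *queue_pop(struct event_queue *q)
{
	struct method_state *m;

	if (q->count == 0)
		return NULL;
	m = q->slot[q->head];
	q->head = (q->head + 1) % QUEUE_SLOTS;
	q->count--;
	return m;
}

/* Run one step; a method with steps left yields instead of blocking. */
static enum step_result method_step(struct method_state *m)
{
	assert(m->pc < m->steps);
	m->pc++;
	return m->pc == m->steps ? STEP_DONE : STEP_YIELD;
}

/*
 * Drain the queue. A yielding method goes back to the tail with its
 * state intact and the next queued event runs. Completion order is
 * recorded in order[] (up to max entries); returns methods completed.
 */
static size_t run_events(struct event_queue *q, int *order, size_t max)
{
	size_t done = 0;
	struct method_state *m;

	while ((m = queue_pop(q)) != NULL) {
		if (method_step(m) == STEP_YIELD) {
			queue_push(q, m);	/* cannot fail: a slot was just freed */
			continue;
		}
		if (done < max)
			order[done] = m->id;
		done++;
	}
	return done;
}
```

With a three-step method queued ahead of a one-step method, the one-step method completes first, and no extra thread is created at any point. A real interpreter would have far more state to save than a counter, which is presumably where the hard part lies.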
kacpi_notify?
Hmm. What's up with this?

 2341 ?        D      0:00 [kacpid_notify]
 2342 ?        D      0:00 [kacpid_notify]
 2343 ?        D      0:00 [kacpid_notify]
 2344 ?        D      0:00 [kacpid_notify]
 2345 ?        D      0:00 [kacpid_notify]
 2346 ?        D      0:00 [kacpid_notify]
 2347 ?        D      0:00 [kacpid_notify]
 ...

(apparently about 300 of those processes, at which point the machine just
hangs, because even root cannot start any new processes, and I couldn't
actually get to debug this at all).

What would it be waiting on, and why?

This machine doesn't have any module support (at all), and I haven't booted
a new kernel on it in quite a while, so this isn't necessarily new
behaviour, but the last kernel I tried (which did _not_ have this problem,
obviously) was in April (commit 6e5882cfa24e1456702e463f6920fc0ca3c3d2b8,
to be exact).

Now, that's 6000+ commits ago, so I'd rather not even bisect this, if
somebody can come up with a more obvious explanation of why kacpid_notify
would be started over and over and over again, only to always get stuck..

Linus
Re: kacpi_notify?
On Wed, 12 Jul 2006, Linus Torvalds wrote:
> (apparently about 300 of those processes, at which point the machine just
> hangs, because even root cannot start any new processes, and I couldn't
> actually get to debug this at all).

With ACPI debugging, I notice that it finally dies due to ACPI Error
AE_NO_MEMORY. Which I guess is just due to thousands of kacpi_notify
processes, and tons of allocations.

With ctrl+scrolllock, I finally got something. The traceback for the
D-state (millions and millions of them) is

	__down_failed
	acpi_ut_acquire_mutex
	acpi_ex_enter_interpreter
	acpi_ns_evaluate
	acpi_evaluate_object
	acpi_evaluate_integer
	acpi_os_execute_thread
	acpi_thermal_get_temperature
	acpi_thermal_check
	..

and 'kacpid' seems to be stuck using all CPU time, with the thing doing
something like:

	EIP is at delay_tsc+0xb/0x13
	EFLAGS: 0283 Not tainted (2.6.18-rc1-g155dbfd8 #24)
	EAX: 4aa48900 EBX: 00026be1 ECX: 4aa40b7e EDX: 001a
	ESI: EDI: c039300d EBP: c0390df3
	DS: 007b ES: 007b
	CR0: 8005003b CR2: 080516f0 CR3: 362dc000 CR4: 06d0
	 [c01c94c0] __delay+0x6/0x7
	 [c01f23ef] acpi_os_stall+0x1d/0x29
	 [c0201f11] acpi_ex_system_do_stall+0x37/0x3b
	 [c0200fca] acpi_ex_opcode_1A_0T_0R+0x85/0xc8
	 [c01f5308] acpi_ds_exec_end_op+0x133/0x553
	 [c020d0f3] acpi_ps_parse_loop+0x777/0xbe0
	 [c020c488] acpi_ps_parse_aml+0xd8/0x2d5
	 [c020dbbe] acpi_ps_execute_pass+0xa9/0xd2
	 [c020dd6a] acpi_ps_execute_method+0x153/0x231
	 [c02095e1] acpi_ns_evaluate+0x179/0x24c
	 [c01fc12e] acpi_ev_asynch_execute_gpe_method+0xeb/0x159
	 [c01f2083] acpi_os_execute_deferred+0x19/0x21
	 [c01226a0] run_workqueue+0x68/0x95
	 [c01f206a] acpi_os_execute_deferred+0x0/0x21
	 [c0122b2e] worker_thread+0xf9/0x12b
	 [c03570bf] schedule+0x469/0x4cc
	 [c0113bfb] default_wake_function+0x0/0xc
	 [c0122a35] worker_thread+0x0/0x12b
	 [c01249bb] kthread+0xad/0xd8
	 [c012490e] kthread+0x0/0xd8
	 [c0101005] kernel_thread_helper+0x5/0xb

which I assume is the thing that holds the AML semaphore, and isn't
releasing it.

Is there any sane debugging info I can send people?

Linus
RE: kacpi_notify?
On Wed, 12 Jul 2006, Linus Torvalds wrote:
> I've got a hundred-odd commits to go, but the next bisection test happens
> to be the parent of my merge (your merge linus into release branch merge:
> ae6c859b7dcd708efadf1c76279c33db213e3506), so if I'm right, I'd expect
> that to be a bad tree.

Yup.

And yes, the problem seems to co-incide with getting about 300 acpi
interrupts per second. After about 9500 interrupts (each of which seems to
create one of these things), the machine is basically dead. Ten thousand
kacpid_notify threads is too much.

Regardless of what brought on this bug, I think there's something wrong in
anything that keeps on notifying things without keeping track of how many
outstanding notifications it already has.

Linus
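To make the "keeping track of how many outstanding notifications" point concrete, a minimal user-space sketch could look like this. All names (notify_tracker, notify_try_queue, notify_done) and the cap of 16 are hypothetical and chosen only for illustration; none of this is taken from the ACPI code or from the patch under discussion. It is single-threaded on purpose: a kernel version would need atomic counters or a lock, and would still have to decide what to do with a notification it refuses.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical cap; the thread does not name a number. */
#define MAX_OUTSTANDING_NOTIFY 16

struct notify_tracker {
	unsigned int outstanding;	/* queued but not yet completed */
	unsigned int dropped;		/* refused because the cap was hit */
};

/* Account for one new notification; false means "do not queue it". */
static bool notify_try_queue(struct notify_tracker *t)
{
	if (t->outstanding >= MAX_OUTSTANDING_NOTIFY) {
		t->dropped++;
		return false;
	}
	t->outstanding++;
	return true;
}

/* Called once a queued handler has finished running. */
static void notify_done(struct notify_tracker *t)
{
	assert(t->outstanding > 0);
	t->outstanding--;
}
```

Fed the roughly 9500 interrupts described above with no handler ever completing, the tracker stops at the cap and counts the rest as dropped, instead of growing without bound.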
RE: kacpi_notify?
Likely related to bugzilla-5534

-Len
RE: kacpi_notify?
On Wed, 12 Jul 2006, Brown, Len wrote:
> Likely related to bugzilla-5534
> b8d35192c55fb055792ff0641408eaaec7c88988

Well, that one certainly looks likely.

Any reason to not just revert it? The fundamental problems that it
introduces are obviously much worse than the fix.

Linus
RE: kacpi_notify?
> On Wed, 12 Jul 2006, Brown, Len wrote:
> > Likely related to bugzilla-5534
> > b8d35192c55fb055792ff0641408eaaec7c88988
>
> Well, that one certainly looks likely.
>
> Any reason to not just revert it? The fundamental problems that it
> introduces are obviously much worse than the fix.

If reverting it fixes your EVO, then certainly this is what to do right
now.

However, as the saga in 5534 will testify, this will make other systems
unhappy, so we'll have to come back quickly with an improved patch for
5534.

-Len
RE: kacpi_notify?
Here's a suggested revert.

Please try this smaller revert to just osl.c.
(it builds and boots for me)

It reverts acpi_os_queue_for_execution() to exactly as it was in 2.6.17,
except it changes the name to acpi_os_execute() to match ACPICA 20060512.

(yes, it is okay we ignore the 1st parameter, it wasn't used until the
5534 fix we are reverting)

thanks,
-Len

Signed-off-by: Len Brown [EMAIL PROTECTED]

diff --git a/drivers/acpi/ec.c b/drivers/acpi/ec.c
diff --git a/drivers/acpi/events/evgpe.c b/drivers/acpi/events/evgpe.c
diff --git a/drivers/acpi/events/evmisc.c b/drivers/acpi/events/evmisc.c
diff --git a/drivers/acpi/osl.c b/drivers/acpi/osl.c
index 47dfde9..b7d1514 100644
--- a/drivers/acpi/osl.c
+++ b/drivers/acpi/osl.c
@@ -36,7 +36,6 @@ #include <linux/kmod.h>
 #include <linux/delay.h>
 #include <linux/workqueue.h>
 #include <linux/nmi.h>
-#include <linux/kthread.h>
 #include <acpi/acpi.h>
 #include <asm/io.h>
 #include <acpi/acpi_bus.h>
@@ -583,16 +582,6 @@ static void acpi_os_execute_deferred(voi
 	return;
 }
 
-static int acpi_os_execute_thread(void *context)
-{
-	struct acpi_os_dpc *dpc = (struct acpi_os_dpc *)context;
-	if (dpc) {
-		dpc->function(dpc->context);
-		kfree(dpc);
-	}
-	do_exit(0);
-}
-
 /***
  *
  * FUNCTION:acpi_os_execute
@@ -614,10 +603,16 @@ acpi_status acpi_os_execute(acpi_execute
 	acpi_status status = AE_OK;
 	struct acpi_os_dpc *dpc;
 	struct work_struct *task;
-	struct task_struct *p;
+
+	ACPI_FUNCTION_TRACE(os_queue_for_execution);
+
+	ACPI_DEBUG_PRINT((ACPI_DB_EXEC,
+			  "Scheduling function [%p(%p)] for deferred execution.\n",
+			  function, context));
 
 	if (!function)
-		return AE_BAD_PARAMETER;
+		return_ACPI_STATUS(AE_BAD_PARAMETER);
+
 	/*
 	 * Allocate/initialize DPC structure. Note that this memory will be
 	 * freed by the callee. The kernel handles the tq_struct list in a
@@ -628,34 +623,27 @@ acpi_status acpi_os_execute(acpi_execute
 	 * We can save time and code by allocating the DPC and tq_structs
 	 * from the same memory.
 	 */
-	if (type == OSL_NOTIFY_HANDLER) {
-		dpc = kmalloc(sizeof(struct acpi_os_dpc), GFP_KERNEL);
-	} else {
-		dpc = kmalloc(sizeof(struct acpi_os_dpc) +
-				sizeof(struct work_struct), GFP_ATOMIC);
-	}
+
+	dpc =
+	    kmalloc(sizeof(struct acpi_os_dpc) + sizeof(struct work_struct),
+		    GFP_ATOMIC);
 	if (!dpc)
-		return AE_NO_MEMORY;
+		return_ACPI_STATUS(AE_NO_MEMORY);
+
 	dpc->function = function;
 	dpc->context = context;
 
-	if (type == OSL_NOTIFY_HANDLER) {
-		p = kthread_create(acpi_os_execute_thread, dpc, "kacpid_notify");
-		if (!IS_ERR(p)) {
-			wake_up_process(p);
-		} else {
-			status = AE_NO_MEMORY;
-			kfree(dpc);
-		}
-	} else {
-		task = (void *)(dpc + 1);
-		INIT_WORK(task, acpi_os_execute_deferred, (void *)dpc);
-		if (!queue_work(kacpid_wq, task)) {
-			status = AE_ERROR;
-			kfree(dpc);
-		}
+	task = (void *)(dpc + 1);
+	INIT_WORK(task, acpi_os_execute_deferred, (void *)dpc);
+
+	if (!queue_work(kacpid_wq, task)) {
+		ACPI_DEBUG_PRINT((ACPI_DB_ERROR,
+				  "Call to queue_work() failed.\n"));
+		kfree(dpc);
+		status = AE_ERROR;
 	}
-	return status;
+
+	return_ACPI_STATUS(status);
 }
 
 EXPORT_SYMBOL(acpi_os_execute);