The cpuset mems_allowed update code in alloc_pages_current() could (in theory) put a task to sleep in a context that does not allow sleeping (an allocation without the __GFP_WAIT flag set). In the rare case that the current task's mems_generation is out of date compared to its cpuset's mems_generation, this mems_allowed update code needs to grab cpuset_sem, which can sleep.
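To make the failure mode concrete, here is a minimal user-space C sketch of the lazy generation check described above, with the patched condition applied. It is a model, not the kernel code. All names (model_task, model_update_mems_allowed, model_alloc_pages_current, MODEL_GFP_WAIT, model_cpuset_sem) are hypothetical. A pthread mutex stands in for cpuset_sem, and MODEL_GFP_WAIT's value is arbitrary, not the kernel's __GFP_WAIT.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for __GFP_WAIT; the value is NOT the kernel's. */
#define MODEL_GFP_WAIT 0x10u

/* User-space stand-in for cpuset_sem: locking it may block the caller. */
static pthread_mutex_t model_cpuset_sem = PTHREAD_MUTEX_INITIALIZER;

/* Model of the cpuset's state: its node mask and a generation counter
 * that is bumped whenever that mask changes. */
static unsigned long cpuset_mems = 0x3;      /* nodes 0 and 1 */
static int cpuset_mems_generation = 1;

struct model_task {
    int mems_generation;        /* generation this task last synced to */
    unsigned long mems_allowed; /* task's cached copy of the node mask */
};

/* Refresh the task's cached mask. The lock (which may block) is only
 * taken when the task's generation is stale; the common case is lock-free. */
static void model_update_mems_allowed(struct model_task *t)
{
    if (t->mems_generation == cpuset_mems_generation)
        return;
    pthread_mutex_lock(&model_cpuset_sem);   /* may block */
    t->mems_allowed = cpuset_mems;
    t->mems_generation = cpuset_mems_generation;
    pthread_mutex_unlock(&model_cpuset_sem);
}

/* Mirrors the patched test: attempt the refresh only if the caller may
 * sleep and is not in interrupt context. Returns true if it was attempted. */
static bool model_alloc_pages_current(struct model_task *t, unsigned gfp,
                                      bool in_irq)
{
    if ((gfp & MODEL_GFP_WAIT) && !in_irq) {
        model_update_mems_allowed(t);
        return true;
    }
    return false;
}
```

In this model, a caller without the wait flag (or one in interrupt context) never touches the lock; it just keeps its possibly stale mask until a later sleepable allocation refreshes it.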
We avoid this by not trying to update mems_allowed here when we can't sleep (__GFP_WAIT not set).

Applies to the top of Linus's bk tree (post 2.6.11).

Thanks to Ray Bryant <[EMAIL PROTECTED]> for noticing this.

Signed-off-by: Paul Jackson <[EMAIL PROTECTED]>

===================================================================
--- 2.6.12-pj.orig/mm/mempolicy.c	2005-03-16 01:16:58.000000000 -0800
+++ 2.6.12-pj/mm/mempolicy.c	2005-03-16 01:32:05.000000000 -0800
@@ -788,12 +788,16 @@ alloc_page_vma(unsigned gfp, struct vm_a
  * Allocate a page from the kernel page pool. When not in
  * interrupt context and apply the current process NUMA policy.
  * Returns NULL when no page can be allocated.
+ *
+ * Don't call cpuset_update_current_mems_allowed() unless
+ * 1) it's ok to take cpuset_sem (can WAIT), and
+ * 2) allocating for current task (not interrupt).
  */
 struct page *alloc_pages_current(unsigned gfp, unsigned order)
 {
 	struct mempolicy *pol = current->mempolicy;
 
-	if (!in_interrupt())
+	if ((gfp & __GFP_WAIT) && !in_interrupt())
 		cpuset_update_current_mems_allowed();
 	if (!pol || in_interrupt())
 		pol = &default_policy;

-- 
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <[EMAIL PROTECTED]> 1.650.933.1373, 1.925.600.0401