On Tue, 27 May 2014, Marcelo Tosatti wrote:

> >
> > Memory policies are only applied to a specific zone, so this is not
> > unprecedented. However, if a user wants to limit allocation to a specific
> > node and there is no DMA memory there, then maybe that is an operator
> > error? After all, the application will be using memory from a node that the
> > operator explicitly did not want to be used.
> OK, here is the use case:
> - the machine contains a driver which requires zone-specific memory (such as
> KVM, which requires the root page table at paddr < 4GB).

GFP_KERNEL is used for page tables.

>  * The second pass through get_page_from_freelist() doesn't even call
>  * here for GFP_ATOMIC calls.  For those calls, the __alloc_pages()
>  * variable 'wait' is not set, and the bit ALLOC_CPUSET is not set
>  * in alloc_flags.  That logic and the checks below have the combined
>  * effect that:
>  *      in_interrupt - any node ok (current task context irrelevant)
>  *      GFP_ATOMIC   - any node ok
>  *      TIF_MEMDIE   - any node ok
>  *      GFP_KERNEL   - any node in enclosing hardwalled cpuset ok
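To make the table above concrete, here is a minimal C sketch of the decision it describes. It is a simplified model, not the kernel's __cpuset_node_allowed_softwall(): struct toy_alloc and toy_node_allowed() are hypothetical names, and the nodes of the nearest hardwalled ancestor cpuset are assumed to be passed in as a 64-bit node mask.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical description of an allocation request (not kernel types). */
struct toy_alloc {
    bool in_interrupt;  /* called from interrupt context */
    bool gfp_atomic;    /* caller used GFP_ATOMIC */
    bool tif_memdie;    /* task was selected by the OOM killer */
};

/*
 * Returns true if 'node' may satisfy the request, following the table:
 * interrupt, GFP_ATOMIC and TIF_MEMDIE allocations may use any node;
 * GFP_KERNEL allocations may use any node of the enclosing hardwalled
 * cpuset. 'hardwall_mems' has bit n set if node n belongs to that cpuset.
 */
static bool toy_node_allowed(const struct toy_alloc *a, unsigned node,
                             uint64_t hardwall_mems)
{
    assert(node < 64);  /* the toy node mask only covers 64 nodes */
    if (a->in_interrupt || a->gfp_atomic || a->tif_memdie)
        return true;                        /* any node ok */
    return (hardwall_mems >> node) & 1u;    /* GFP_KERNEL case */
}
```

For example, with hardwall_mems = 0x2 (only node 1), a GFP_KERNEL request for node 0 is refused while an interrupt-context request for node 0 is allowed.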

Page table allocations are GFP_KERNEL allocations, so the above use case
is OK if you switch off the hardwall flag (mem_hardwall) in the cpuset.
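A rough way to see why clearing the hardwall flag helps: a GFP_KERNEL allocation from task context walks up to the nearest hardwalled ancestor cpuset, and if none is hardwalled it stops at the root, which spans every node. The sketch below models only that walk; struct toy_cpuset, nearest_hardwall() and gfp_kernel_node_ok() are hypothetical names, and the real kernel code checks more conditions than this.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, heavily simplified cpuset (not the kernel's struct cpuset). */
struct toy_cpuset {
    const struct toy_cpuset *parent; /* NULL for the root cpuset */
    uint64_t mems;                   /* bit n set => memory node n allowed */
    bool hardwall;                   /* hardwall flag set on this cpuset */
};

/* Walk up to the nearest hardwalled ancestor, stopping at the root. */
static const struct toy_cpuset *nearest_hardwall(const struct toy_cpuset *cs)
{
    while (!cs->hardwall && cs->parent != NULL)
        cs = cs->parent;
    return cs;
}

/* Simplified check for a GFP_KERNEL allocation from task context. */
static bool gfp_kernel_node_ok(const struct toy_cpuset *cs, unsigned node)
{
    assert(node < 64);  /* the toy node mask only covers 64 nodes */
    if ((cs->mems >> node) & 1u)
        return true;    /* node is in the task's own cpuset */
    return (nearest_hardwall(cs)->mems >> node) & 1u;
}
```

For instance, if the task's cpuset holds only node 1 and the memory below 4GB sits on node 0, a page table allocation on node 0 is refused while the cpuset is hardwalled, and allowed once the flag is cleared (assuming no ancestor below the root is hardwalled either).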

