> Jeff Breidenbach:
> > However, I suggest understanding the problem better before doing
> > serious work. Let me know if I can be of help.
> 
> I see.
> To be honest, I am unsure that aufs VDIR is the real cause of your
> problem though I am sure it is suspicious.
> I thought the 'readdir in userspace' approach would be useful for those
> who have a huge number of files under a single dir, and it wouldn't do
> harm even if it is implemented.
> While I have already started implementing it, I will postpone it.
> Instead I may ask you to run several tests on your system.

I may have found the cause.
What struck me as strange is that your kernel messages say
"page allocation failure. order:N ..." while the Nth free_area still has
some memory.
I think two issues are related to this problem.
One is a kernel bug which was recently fixed (see below), and the other
is that aufs VDIR consumes much memory.
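
One way to cross-check the free_area observation on a running system is
to read /proc/buddyinfo, which lists the number of free blocks per
order for each zone. The following is only a minimal sketch: the
function names are hypothetical and the SAMPLE numbers are made up for
illustration, not taken from the reported system.

```python
# Illustrative, made-up numbers in the /proc/buddyinfo layout.
SAMPLE = """\
Node 0, zone      DMA      3      2      0      0
Node 0, zone   Normal     10      0      0      1
"""


def parse_buddyinfo(text):
    """Return {(node, zone): [free block count for order 0, 1, 2, ...]}."""
    zones = {}
    for line in text.splitlines():
        fields = line.split()
        # Expected shape: "Node 0, zone DMA 3 2 0 0 ..."
        if len(fields) < 5 or fields[0] != "Node" or fields[2] != "zone":
            continue
        node = int(fields[1].rstrip(","))
        zones[(node, fields[3])] = [int(n) for n in fields[4:]]
    return zones


def zones_with_free_blocks(zones, order):
    """List (node, zone) pairs with a free block of size >= 2**order pages."""
    return sorted(key for key, counts in zones.items() if any(counts[order:]))


def read_buddyinfo(path="/proc/buddyinfo"):
    """Parse the live file; only meaningful on a Linux machine."""
    with open(path) as f:
        return parse_buddyinfo(f.read())
```

Calling zones_with_free_blocks(read_buddyinfo(), N) right after an
"order:N" failure shows which zones still hold a large enough block. A
non-empty result would be consistent with the zone being skipped rather
than being truly out of memory, though the counts can change between
the failure and the read.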

I tried reproducing the problem and wrote a small kernel module to cause
memory fragmentation. It was not easy and I am not certain, but I think I
managed to reproduce it. After backporting the recent mainline fix into
the ubuntu hardy kernel (with some minor changes), I think I could solve
the problem.

With this patch, most of your problem should be solved. But it won't fix
all cases, since aufs VDIR still consumes much memory. Under real
memory starvation or fragmentation, the message will appear again.
I will resume developing the userspace VDIR implementation, which stops
aufs from consuming so much memory. It may take a few weeks. Until then,
please try this patch against the ubuntu hardy kernel.


J. R. Okajima

----------------------------------------------------------------------

(The recent fix in mainline)
commit fa5e084e43eb14c14942027e1e2e894aeed96097
Author: Mel Gorman <m...@csn.ul.ie>
Date:   Tue Jun 16 15:33:22 2009 -0700

    vmscan: do not unconditionally treat zones that fail zone_reclaim() as full
    
    On NUMA machines, the administrator can configure zone_reclaim_mode that
    is a more targeted form of direct reclaim.  On machines with large NUMA
    distances for example, a zone_reclaim_mode defaults to 1 meaning that
    clean unmapped pages will be reclaimed if the zone watermarks are not
    being met.  The problem is that zone_reclaim() failing at all means the
    zone gets marked full.
    
    This can cause situations where a zone is usable, but is being skipped
    because it has been considered full.  Take a situation where a large tmpfs
    mount is occupying a large percentage of memory overall.  The pages do not
    get cleaned or reclaimed by zone_reclaim(), but the zone gets marked full
    and the zonelist cache considers them not worth trying in the future.
    
    This patch makes zone_reclaim() return more fine-grained information about
    what occurred when zone_reclaim() failed.  The zone only gets marked full
    if it really is unreclaimable.  If it's a case that the scan did not occur
    or if enough pages were not reclaimed with the limited reclaim_mode, then
    the zone is simply skipped.
    
    There is a side-effect to this patch.  Currently, if zone_reclaim()
    successfully reclaimed SWAP_CLUSTER_MAX, an allocation attempt would go
    ahead.  With this patch applied, zone watermarks are rechecked after
    zone_reclaim() does some work.
    
    This bug was introduced by commit 9276b1bc96a132f4068fdee00983c532f43d3a26
    ("memory page_alloc zonelist caching speedup") way back in 2.6.19 when the
    zonelist_cache was introduced.  It was not intended that zone_reclaim()
    aggressively consider the zone to be full when it failed as full direct
    reclaim can still be an option.  Due to the age of the bug, it should be
    considered a -stable candidate.
    
    Signed-off-by: Mel Gorman <m...@csn.ul.ie>
    Reviewed-by: Wu Fengguang <fengguang...@intel.com>
    Reviewed-by: Rik van Riel <r...@redhat.com>
    Reviewed-by: KOSAKI Motohiro <kosaki.motoh...@jp.fujitsu.com>
    Cc: Christoph Lameter <c...@linux-foundation.org>
    Cc: <sta...@kernel.org>
    Signed-off-by: Andrew Morton <a...@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torva...@linux-foundation.org>
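
To make the behaviour change in the commit message easier to follow,
here is a toy model of the decision the allocator makes after calling
zone_reclaim(). It is not the kernel code: the enum, the function names
and the returned strings are hypothetical, and the handling of a failed
watermark recheck (treated as "skip") is my assumption from the wording
of the commit message.

```python
from enum import Enum


class ReclaimResult(Enum):
    NOSCAN = "scan did not occur"
    FULL = "scanned, zone is unreclaimable"
    SOME = "reclaimed some pages, but not enough"
    SUCCESS = "reclaimed enough pages"


def old_behaviour(result):
    """Before the fix: any zone_reclaim() failure marks the zone full."""
    if result is ReclaimResult.SUCCESS:
        return "allocate"
    return "mark_full"


def new_behaviour(result, watermark_ok):
    """After the fix: only a really unreclaimable zone is marked full."""
    if result is ReclaimResult.FULL:
        return "mark_full"
    if result is ReclaimResult.NOSCAN:
        return "skip"
    # Some reclaim work was done: recheck the zone watermarks.
    # Assumption: a failed recheck skips the zone for this attempt.
    return "allocate" if watermark_ok else "skip"


if __name__ == "__main__":
    for result in ReclaimResult:
        print(result.name, old_behaviour(result), new_behaviour(result, False))
```

The difference that matters for this thread is the NOSCAN and SOME
cases: in the old behaviour they mark the zone full, so the zonelist
cache stops trying it even though it still has free memory.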

Attachment: a.patch.bz2
Description: BZip2 compressed data
