Hi Dave,

On Wed, Jun 16, 2010 at 08:57:09AM +1000, Dave Airlie wrote:
> > (switching back to email, actually)
> > 
> > On Sun, 13 Jun 2010 13:01:57 GMT
> > bugzilla-daemon at bugzilla.kernel.org wrote:
> > 
> > > https://bugzilla.kernel.org/show_bug.cgi?id=16148
> > >
> > >
> > > Mikko C. <mikko.cal at gmail.com> changed:
> > >
> > >            What    |Removed                     |Added
> > > ----------------------------------------------------------------------------
> > >                  CC|                            |mikko.cal at gmail.com
> > >
> > >
> > >
> > >
> > > --- Comment #8 from Mikko C. <mikko.cal at gmail.com>  2010-06-13 
> > > 13:01:53 ---
> > > I have been getting this with 2.6.35-rc2 and rc3.
> > > Could it be the same problem?
> > >
> > >
> > > X: page allocation failure. order:0, mode:0x4
> > > Pid: 1514, comm: X Not tainted 2.6.35-rc3 #1
> > > Call Trace:
> > >  [<ffffffff8108ce49>] ? __alloc_pages_nodemask+0x629/0x680
> > >  [<ffffffff8108c920>] ? __alloc_pages_nodemask+0x100/0x680
> > >  [<ffffffffa00db8f3>] ? ttm_get_pages+0x2c3/0x448 [ttm]
> > >  [<ffffffffa00d4658>] ? __ttm_tt_get_page+0x98/0xc0 [ttm]
> > >  [<ffffffffa00d4988>] ? ttm_tt_populate+0x48/0x90 [ttm]
> > >  [<ffffffffa00d4a26>] ? ttm_tt_bind+0x56/0xa0 [ttm]
> > >  [<ffffffffa00d5230>] ? ttm_bo_handle_move_mem+0x1d0/0x430 [ttm]
> > >  [<ffffffffa00d76d6>] ? ttm_bo_move_buffer+0x166/0x180 [ttm]
> > >  [<ffffffffa00b9736>] ? drm_mm_kmalloc+0x26/0xc0 [drm]
> > >  [<ffffffff81030ea9>] ? get_parent_ip+0x9/0x20
> > >  [<ffffffffa00d7786>] ? ttm_bo_validate+0x96/0x130 [ttm]
> > >  [<ffffffffa00d7b35>] ? ttm_bo_init+0x315/0x390 [ttm]
> > >  [<ffffffffa0122eb8>] ? radeon_bo_create+0x118/0x210 [radeon]
> > >  [<ffffffffa0122fb0>] ? radeon_ttm_bo_destroy+0x0/0xb0 [radeon]
> > >  [<ffffffffa013704c>] ? radeon_gem_object_create+0x8c/0x110 [radeon]
> > >  [<ffffffffa013711f>] ? radeon_gem_create_ioctl+0x4f/0xe0 [radeon]
> > >  [<ffffffffa00b10e6>] ? drm_ioctl+0x3d6/0x470 [drm]
> > >  [<ffffffffa01370d0>] ? radeon_gem_create_ioctl+0x0/0xe0 [radeon]
> > >  [<ffffffff810b965f>] ? do_sync_read+0xbf/0x100
> > >  [<ffffffff810c8965>] ? vfs_ioctl+0x35/0xd0
> > >  [<ffffffff810c8b28>] ? do_vfs_ioctl+0x88/0x530
> > >  [<ffffffff81031ed7>] ? sub_preempt_count+0x87/0xb0
> > >  [<ffffffff810c9019>] ? sys_ioctl+0x49/0x80
> > >  [<ffffffff810ba4fe>] ? sys_read+0x4e/0x90
> > >  [<ffffffff810024ab>] ? system_call_fastpath+0x16/0x1b
> > 
> > That's different.  ttm_get_pages() looks pretty busted to me.  It's not
> > using __GFP_WAIT and it's not using __GFP_FS.  It's using a plain
> > GFP_DMA32 so it's using atomic allocations even though it doesn't need
> > to.  IOW, it's shooting itself in the head.
> > 

This is an accurate assessment.

> > Given that it will sometimes use GFP_HIGHUSER which includes __GFP_FS
> > and __GFP_WAIT, I assume it can always include __GFP_FS and __GFP_WAIT.
> > If so, it should very much do so.  If not then the function is
> > misdesigned and should be altered to take a gfp_t argument so the
> > caller can tell ttm_get_pages() which is the strongest allocation mode
> > which it may use.
> > 
> > > [TTM] Unable to allocate page.
> > > radeon 0000:01:05.0: object_init failed for (7827456, 0x00000002)
> > > [drm:radeon_gem_object_create] *ERROR* Failed to allocate GEM object 
> > > (7827456,
> > > 2, 4096, -12)
> > 
> > This bug actually broke stuff for you.
> > 
> > Something like this:
> > 
> > --- a/drivers/gpu/drm/ttm/ttm_page_alloc.c~a
> > +++ a/drivers/gpu/drm/ttm/ttm_page_alloc.c
> > @@ -677,7 +677,7 @@ int ttm_get_pages(struct list_head *page
> >     /* No pool for cached pages */
> >     if (pool == NULL) {
> >             if (flags & TTM_PAGE_FLAG_DMA32)
> > -                   gfp_flags |= GFP_DMA32;
> > +                   gfp_flags |= GFP_KERNEL|GFP_DMA32;
> >             else
> >                     gfp_flags |= GFP_HIGHUSER;
> >  
> > _
> > 
> > although I wonder whether it should be using pool->gfp_flags.
> > 
> > 
> > It's a shame that this code was developed and merged in secret :( Had
> > we known, we could have looked at enhancing mempools to cover the
> > requirement, or at implementing this in some generic fashion rather
> > than hiding it down in drivers/gpu/drm.
> 
> Its been post to lkml at least once or twice over the past few years
> though not as much as it was posted to dri-devel, but that was because
> we had never seen anyone show any interest in it outside of kernel
> hackers. Originally I was going to use the generic allocator stuff ia64
> uses for uncached allocations but it allocates memory ranges not pages
> so it wasn't useful. I also suggested getting a page flag for uncached
> allocator stuff, I was told to go write the code in my own corner and
> prove it was required. So I did, and it was cleaned up and worked on by
> others and I merged it. So can we lay off with the "in secret", the
> original code is nearly 2 years old at this point and just because -mm
> hackers choose to ignore it isn't our fault. Patches welcome.
> 

No comment on this aspect of things.

> So now back to the bug:
> 
> So the page pools are setup with gfp flags, in the normal case, 4 pools,
> one WC GFP_HIGHUSER pages, one UC HIGHUSER pages, one WC GFP_USER|
> GFP_DMA32, one UC GFP_USER|GFP_DMA32, so the pools are all fine, the
> problem here is the same as before we added the pools, which is the
> normal page allocation path, which needs the GFP_USER added instead of
> GFP_KERNEL.
> 
> That said I've noticed a lot more page allocation failure reports in
> 2.6.35-rcX than we've gotten for a long time, in code that hasn't
> changed (the AGP ones the other day for example) has something in the
> core MM regressed (again... didn't this happen back in 2.6.31 or
> something).
> 
> (cc'ing Mel who tracked these down before).
> 

Ok, so there has been very little changed in the page allocator that
might have caused this. The only possibility I can think of is
[6dda9d: page allocator: reduce fragmentation in buddy allocator by
adding buddies that are merging to the tail of the free lists] but I
would have expected the patch to help, not hinder, this situation.

In the last few kernel releases when there has been a spike in high-order
allocation, it has been largely due to an increased source of high-order
callers from somewhere. I am 99% sure the last time I encountered this, I used
the following script to debug it even though it's a bit of a hack. It uses
ftrace to track high-order allocators and reports the number of high-order
allocations (either atomic or normal) and the count of unique backtraces
(to point the finger at bad callers)

Unfortunately, I don't have access to a suitable machine to test this with
right now (my laptop isn't making high-order allocations to confirm yet)
but I'm pretty sure it's the right one!

If you do some fixed set of tests with 2.6.34 and 2.6.35-rc2 with this script
running and compare the reports, it might point the finger at a new source
of frequent high-order allocations that might be leading to more failures.
Do something like

./watch-highorder.pl -o highorder-2.6.34-report.txt
PID=$!
Do your tests
# Two sigints in quick succession to make the monitor exit
kill $PID
kill $PID 

==== CUT HERE ====
#!/usr/bin/perl
# This is a tool that analyses the trace output related to page allocation,
# sums up the number of high-order allocations taking place and who the
# callers are
#
# Example usage: trace-pagealloc-highorder.pl -o output-report.txt
# options
# -o, --output  Where to store the report
#
# Copyright (c) IBM Corporation 2009
# Author: Mel Gorman <mel at csn.ul.ie>
use strict;
use Getopt::Long;

# Tracepoint events
use constant MM_PAGE_ALLOC              => 1;
use constant EVENT_UNKNOWN              => 7;

use constant HIGH_NORMAL_HIGHORDER_ALLOC        => 10;
use constant HIGH_ATOMIC_HIGHORDER_ALLOC        => 11;

my $opt_output;
my %stats;
my %unique_events;
my $last_updated = 0;

$stats{HIGH_NORMAL_HIGHORDER_ALLOC} = 0;
$stats{HIGH_ATOMIC_HIGHORDER_ALLOC} = 0;

# Catch sigint and exit on request
my $sigint_report = 0;
my $sigint_exit = 0;
my $sigint_pending = 0;
my $sigint_received = 0;
sub sigint_handler {
        my $current_time = time;
        if ($current_time - 2 > $sigint_received) {
                print "SIGINT received, report pending. Hit ctrl-c again to 
exit\n";
                $sigint_report = 1;
        } else {
                if (!$sigint_exit) {
                        print "Second SIGINT received quickly, exiting\n";
                }
                $sigint_exit++;
        }

        if ($sigint_exit > 3) {
                print "Many SIGINTs received, exiting now without report\n";
                exit;
        }

        $sigint_received = $current_time;
        $sigint_pending = 1;
}
$SIG{INT} = "sigint_handler";

# Parse command line options
GetOptions(
        'output=s'    => \$opt_output,
);

# Defaults for dynamically discovered regex's
my $regex_pagealloc_default = 'page=([0-9a-f]*) pfn=([0-9]*) order=([-0-9]*) 
migratetype=([-0-9]*) gfp_flags=([A-Z_|]*)';

# Dyanically discovered regex
my $regex_pagealloc;

# Static regex used. Specified like this for readability and for use with /o
#                      (process_pid)     (cpus      )   ( time  )   (tpoint    
) (details)
my $regex_traceevent = 
'\s*([a-zA-Z0-9-]*)\s*(\[[0-9]*\])\s*([0-9.]*):\s*([a-zA-Z_]*):\s*(.*)';
my $regex_statname = '[-0-9]*\s\((.*)\).*';
my $regex_statppid = '[-0-9]*\s\(.*\)\s[A-Za-z]\s([0-9]*).*';

sub generate_traceevent_regex {
        my $event = shift;
        my $default = shift;
        my $regex;

        # Read the event format or use the default
        if (!open (FORMAT, "/sys/kernel/debug/tracing/events/$event/format")) {
                $regex = $default;
        } else {
                my $line;
                while (!eof(FORMAT)) {
                        $line = <FORMAT>;
                        if ($line =~ /^print fmt:\s"(.*)",.*/) {
                                $regex = $1;
                                $regex =~ s/%p/\([0-9a-f]*\)/g;
                                $regex =~ s/%d/\([-0-9]*\)/g;
                                $regex =~ s/%lu/\([0-9]*\)/g;
                                $regex =~ s/%s/\([A-Z_|]*\)/g;
                                $regex =~ 
s/\(REC->gfp_flags\).*/REC->gfp_flags/;
                                $regex =~ s/\",.*//;
                        }
                }
        }

        # Verify fields are in the right order
        my $tuple;
        foreach $tuple (split /\s/, $regex) {
                my ($key, $value) = split(/=/, $tuple);
                my $expected = shift;
                if ($key ne $expected) {
                        print("WARNING: Format not as expected '$key' != 
'$expected'");
                        $regex =~ s/$key=\((.*)\)/$key=$1/;
                }
        }

        if (defined shift) {
                die("Fewer fields than expected in format");
        }

        return $regex;
}
$regex_pagealloc = generate_traceevent_regex("kmem/mm_page_alloc",
                        $regex_pagealloc_default,
                        "page", "pfn",
                        "order", "migratetype", "gfp_flags");

sub process_events {
        my $traceevent;
        my $process_pid = 0;
        my $cpus;
        my $timestamp;
        my $tracepoint;
        my $details;
        my $statline;
        my $nextline = 1;

        # Read each line of the event log
EVENT_PROCESS:
        while (($traceevent = <TRACING>) && !$sigint_exit) {
SKIP_LINEREAD:

                if ($traceevent eq "") {
                        last EVENT_PROCSS;
                }

                if ($traceevent =~ /$regex_traceevent/o) {
                        $process_pid = $1;
                        $tracepoint = $4;
                } else {
                        next;
                }

                # Perl Switch() sucks majorly
                if ($tracepoint eq "mm_page_alloc") {
                        my ($page, $order, $gfp_flags, $type);
                        my ($atomic);
                        my $details = $5;

                        if ($details !~ /$regex_pagealloc/o) {
                                print "WARNING: Failed to parse mm_page_alloc 
as expected\n";
                                print "$details\n";
                                print "$regex_pagealloc\n";
                                print "\n";
                                next;
                        }
                        $page = $1;
                        $order = $3;
                        $gfp_flags = $5;

                        # Only concerned with high-order allocs
                        if ($order == 0) {
                                next;
                        }

                        $stats{MM_PAGE_ALLOC}++;

                        if ($gfp_flags =~ /ATOMIC/) {
                                $stats{HIGH_ATOMIC_HIGHORDER_ALLOC}++;
                                $type = "atomic";
                        } else {
                                $stats{HIGH_NORMAL_HIGHORDER_ALLOC}++;
                                $type = "normal";
                        }

                        # Record the stack trace
                        $traceevent = <TRACING>;
                        if ($traceevent !~ /stack trace/) {
                                goto SKIP_LINEREAD;
                        }
                        my $event = "order=$order $type 
gfp_flags=$gfp_flags\n";;
                        while ($traceevent = <TRACING>) {
                                if ($traceevent !~ /^ =>/) {
                                        $unique_events{$event}++;
                                        my $current = time;

                                        if ($current - $last_updated > 60) {
                                                $last_updated = $current;
                                                update_report();
                                        }
                                        goto SKIP_LINEREAD;
                                }
                                $event .= $traceevent;
                        }
                } else {
                        $stats{EVENT_UNKNOWN}++;
                }

                if ($sigint_pending) {
                        last EVENT_PROCESS;
                }
        }
}

sub update_report() {
        my $event;
        open (REPORT, ">$opt_output") || die ("Failed to open $opt_output for 
writing");

        foreach $event (keys %unique_events) {
                print REPORT $unique_events{$event} . " instances $event\n";
        }
        print REPORT "High-order normal allocations: " . 
$stats{HIGH_NORMAL_HIGHORDER_ALLOC} . "\n";
        print REPORT "High-order atomic allocations: " . 
$stats{HIGH_ATOMIC_HIGHORDER_ALLOC} . "\n";

        close REPORT;
}

sub print_report() {
        print "\nReport\n";
        open (REPORT, $opt_output) || die ("Failed to open $opt_output for 
reading");
        while (<REPORT>) {
                print $_;
        }
        close REPORT;
}

# Process events or signals until neither is available
sub signal_loop() {
        my $sigint_processed;
        do {
                $sigint_processed = 0;
                process_events();

                # Handle pending signals if any
                if ($sigint_pending) {
                        my $current_time = time;

                        if ($sigint_exit) {
                                print "Received exit signal\n";
                                $sigint_pending = 0;
                        }
                        if ($sigint_report) {
                                if ($current_time >= $sigint_received + 2) {
                                        update_report();
                                        print_report();
                                        $sigint_report = 0;
                                        $sigint_pending = 0;
                                        $sigint_processed = 1;
                                        $sigint_exit = 0;
                                }
                        }
                }
        } while ($sigint_pending || $sigint_processed);
}

sub set_traceoption($) {
        my $option = shift;

        open(TRACEOPT, ">/sys/kernel/debug/tracing/trace_options") || 
die("Failed to open trace_options");
        print TRACEOPT $option;
        close TRACEOPT;
}

sub enable_tracing($) {
        my $tracing = shift;

        open(TRACING, ">/sys/kernel/debug/tracing/events/$tracing/enable") || 
die("Failed to open tracing event $tracing");
        print TRACING "1";
        close TRACING;
}

sub disable_tracing($) {
        my $tracing = shift;

        open(TRACING, ">/sys/kernel/debug/tracing/events/$tracing/enable") || 
die("Failed to open tracing event $tracing");
        print TRACING "0";
        close TRACING;

}

set_traceoption("stacktrace");
set_traceoption("sym-offset");
set_traceoption("sym-addr");
enable_tracing("kmem/mm_page_alloc");
open(TRACING, "/sys/kernel/debug/tracing/trace_pipe") || die("Failed to open 
trace_pipe");
signal_loop();
close TRACING;
disable_tracing("kmem/mm_page_alloc");
update_report();
print_report();

Reply via email to