Excerpts from Steve Hawkins's message of Mon Sep 28 00:17:53 +0100 2009:
> We are developing a moderately sized user interface with Clutter, and are
> consistently seeing memory allocation errors, particularly from the
> graphics driver.  When we use "top" (our platform is Linux) to monitor
> the memory used by our application, we see that our application is
> consuming over 140MB of RAM.
>
> In looking at our application and our graphics, we can only account for
> 20 - 30 MB of images that our application is using.  Are there
> optimization tricks or hints that could help us reduce the memory
> footprint of our application?  Are there switches that we can set in the
> build to help us diagnose why so much memory is being used?
>
> Any guidance would be very much appreciated.

Analysing the memory usage of a Clutter application is non-trivial.
Here's a brain dump on the subject; hopefully some of it will help you,
though it probably overlooks a number of important things too.

Consider that allocations are spread across several address spaces:
- There are standard allocations via APIs like malloc() and realloc(),
  made by the application, Clutter and OpenGL.
- If you are using GLX/EGLX, there may be allocations made on the
  application's behalf in the X server. (These may in turn be mapped
  back into the client's address space, e.g. for handling
  ClutterGLXTexturePixmap fallbacks.)
- There are allocations made in the kernel-space DRM driver to manage
  data associated with your application.
- Finally, there are allocations made via ioctls to the kernel-space DRM
  driver that reserve RAM for mapping into either the GPU or your
  application (e.g. GEM).

Considering just OpenGL: your OpenGL driver will likely preallocate
certain buffers and create various caches of state that we don't have
much control over.

One big consumer of memory in the Intel driver seems to be relocation
buffers. (When the userspace DRI driver builds a buffer of commands for
the GPU, those commands may reference other buffers whose addresses
can't be determined until the commands are executed, so the driver has
to track the necessary relocations.) With my current driver I see ~3MB
associated with drm_intel_setup_reloc_list. Some earlier drivers have
been *much* worse than this, though; it's possible you have such a
version.

If you're using Mesa, around 3MB is allocated to support the swrast
driver used for software fallbacks. This seems a shame, because the
only thing Clutter typically uses the swrast driver for is the
glReadPixels fallback path, where we only ever read back a single pixel
at a time for picking.

Mesa also preallocates several buffers sized in line with
GL_MAX_ELEMENTS_VERTICES (the recommended maximum number of vertices to
submit to glDrawRangeElements for best efficiency). By default this is
3000, which results in several megabytes being allocated.

Aside from malloc, there will also be numerous allocations for mapping
data into the GPU: vertex arrays, state buffers for the GPU's various
units, or texture memory. The driver makes these allocations via
special-purpose ioctls (GEM is used for the Intel drivers).

If you have one Clutter stage that's 800x600, consider:
- The front color buffer is probably 4 bytes per pixel.
- A combined depth/stencil buffer could be another 4 bytes per pixel.
- The color buffer is usually double buffered, so there is a back buffer
  of the same size too.
- That comes to 3 x 800 x 600 x 4 = 5,760,000 bytes, or about 5.5MB.
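The arithmetic above can be captured in a tiny helper so it's easy to redo for other stage sizes. This is a rough Python sketch; framebuffer_bytes() is a hypothetical name, and the bytes-per-pixel defaults are the same guesses as in the list (real drivers may pad, tile or align buffers differently).

```python
MIB = 1024 * 1024


def framebuffer_bytes(width, height, color_bpp=4, color_buffers=2,
                      depth_stencil_bpp=4):
    """Estimate a stage's framebuffer size in bytes.

    color_buffers counts front + back when double buffered; depth_stencil_bpp
    covers one combined depth/stencil buffer (pass 0 if there is none).
    """
    pixels = width * height
    return pixels * (color_bpp * color_buffers + depth_stencil_bpp)


# An 800x600 stage: 800 * 600 * (4 * 2 + 4) = 5,760,000 bytes, about 5.5MB.
stage = framebuffer_bytes(800, 600)
print(f"{stage} bytes = {stage / MIB:.2f} MiB")
```

The same formula gives 24,883,200 bytes (about 23.7MB) for a 1920x1080 stage, since these buffers scale directly with the stage's pixel count.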

If you use clutter_texture_new_from_actor(), it creates a framebuffer
object the same size as the ClutterTexture actor, which, like the stage,
may have ancillary buffers associated with it (currently only a stencil
buffer).

If possible, test with multiple GL drivers to see how that affects your
memory usage. One driver might have a leak, or might have tuned some
cache sizes to be very large.
- If you're using Mesa, try running with LIBGL_ALWAYS_SOFTWARE=1 and
  see what difference that makes.
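One way to make that comparison repeatable is to run the application once per configuration and record its peak resident set size. This is a minimal Python sketch assuming Linux, where os.wait4() returns the resource usage of just the child it reaped and ru_maxrss is reported in kilobytes. peak_rss_kib() and compare_drivers() are hypothetical helpers. Note that peak RSS only covers memory mapped into the process: with a hardware driver, memory held in the X server or in unmapped GEM objects won't show up, so the two numbers aren't directly comparable.

```python
import os
import sys


def peak_rss_kib(argv, extra_env=None):
    """Run argv to completion; return (exit code, peak RSS in KiB) for that child only."""
    env = dict(os.environ)
    env.update(extra_env or {})
    pid = os.posix_spawnp(argv[0], argv, env)
    # wait4() reports rusage for this specific child, unlike RUSAGE_CHILDREN,
    # which accumulates the maximum over every child ever waited for.
    _, status, usage = os.wait4(pid, 0)
    # On Linux ru_maxrss is in kilobytes.
    return os.waitstatus_to_exitcode(status), usage.ru_maxrss


def compare_drivers(app_argv=None):
    """Hypothetical helper: run the same binary with and without Mesa's
    software rasterizer and print the peak RSS of each run."""
    app_argv = app_argv or sys.argv[1:]
    configs = [("default driver", {}),
               ("LIBGL_ALWAYS_SOFTWARE=1", {"LIBGL_ALWAYS_SOFTWARE": "1"})]
    for label, env in configs:
        code, kib = peak_rss_kib(app_argv, env)
        print(f"{label}: exit={code}, peak RSS={kib / 1024:.1f} MiB")
```

For example, compare_drivers(["./application"]) runs the application twice, so it needs to exit on its own (or be closed by hand) for each run.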

Remember that Clutter uses GLib internally, and GLib's slice allocator
may keep freed allocations around unless you export
G_SLICE=always-malloc before running your application.

To get more insight into where your application's allocations are
coming from, you could try the massif tool that comes with Valgrind.
For example:
  G_SLICE=always-malloc valgrind --tool=massif ./application
Then, to analyse the results, use:
  ms_print ./massif.out.PID | less
More details about this tool can be found here:
http://valgrind.org/docs/manual/ms-manual.html

Exmap might also give some insight into memory usage:
http://www.berthels.co.uk/exmap/download/
I haven't used it much myself, and it looks like it's no longer
maintained, but it's nice that it accounts for memory that is shared
between processes. When I tried compiling it the other day I had to
patch the kernel module to get it to build and load, so I've attached
my patch in case you want to try it.

If you are running with an Intel driver using GEM, you can look at
debugfs to find out how much memory is associated with GEM objects:
  mount -t debugfs none /sys/kernel/debug
  cat /sys/kernel/debug/dri/0/gem_objects
Since these numbers aren't broken down per process, you may need to
compare them manually before and while running your application.
Note: dri/0 may not be right for your system; look at dri/X/name to
find the right device. Beware that some of these objects may be mapped
into your application's address space, so depending on the tools you
use, you may need to be careful not to count them twice. (I'm not sure
of an easy way to do that, though.)
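The before/while comparison can be scripted. This Python sketch assumes gem_objects prints one "<number> <label>" pair per line (e.g. "1234 object bytes"); the exact format has varied between kernel versions, so check your own output first. find_dri_dir(), read_counts(), parse_gem_objects() and diff_counts() are hypothetical helpers, and matching the driver name against the contents of dri/X/name is an assumption.

```python
import glob
import os


def parse_gem_objects(text):
    """Parse '<integer> <label>' lines into {label: integer}; ignore others."""
    counts = {}
    for line in text.splitlines():
        number, _, label = line.strip().partition(" ")
        if number.isdigit() and label:
            counts[label.strip()] = int(number)
    return counts


def diff_counts(before, during):
    """How much each counter changed between two readings."""
    return {key: during[key] - before.get(key, 0) for key in during}


def find_dri_dir(driver_name, root="/sys/kernel/debug/dri"):
    """Return the first dri/X directory whose 'name' file mentions driver_name."""
    for name_file in sorted(glob.glob(os.path.join(root, "*", "name"))):
        with open(name_file) as f:
            if driver_name in f.read():
                return os.path.dirname(name_file)
    return None


def read_counts(dri_dir):
    """Read and parse the gem_objects file in a dri/X directory."""
    with open(os.path.join(dri_dir, "gem_objects")) as f:
        return parse_gem_objects(f.read())
```

A typical session would take before = read_counts(find_dri_dir("i915")), start the application, take a second reading and print diff_counts(before, during). Reading debugfs usually requires root.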

One thing I'm considering is patching Valgrind's massif tool to teach it
about some of the GEM ioctls, since they probably account for a large
proportion of a Clutter app's allocations.

kmemtrace (http://lwn.net/Articles/289880/) is another tool I've tried
to use (without much success so far) to get insight into the
kernel-space allocations (kmalloc) associated with DRM drivers. You need
to rebuild your kernel with the CONFIG_KMEMTRACE option for this, and
you need to clone the userspace tool from
git://repo.or.cz/kmemtrace-user.git (since the ABI changed some time
ago, it seems you have to use the ftrace-temp branch). As I said, I
haven't managed to get much out of it so far, but if you or anyone else
does, I'd be interested to hear about it.

xrestop is a tool that lets you look at the memory allocated by the X
server on your application's behalf. Beware that some of this could be
mapped into your application via XSHM, so again you may need to be
careful not to count it twice. (I'm not sure of an easy way to do that,
though.)

Overall, I think it's fair to say that not much focused effort has been
spent on profiling Clutter's memory usage yet. It's quite possible there
is some low-hanging fruit we aren't aware of at the moment. We still
need to come up with a good methodology for analysing applications, and
that will probably involve improving some of the tools currently
available.

I think we could do with a wiki for Clutter, with a page dedicated to
documenting the tools and methodologies people can use to analyse
Clutter applications (not least because I'm interested in collecting
ideas from others about this too). I'll try to follow up on this and
keep you informed.

kind regards,
- Robert
-- 
Robert Bragg, Intel Open Source Technology Center

Attachment: 0001--build-Fix-some-minor-compile-errors-update-kerne.patch
Description: Binary data
