Excerpts from Steve Hawkins's message of Mon Sep 28 00:17:53 +0100 2009:
> We are developing moderately sized user interface with Clutter, and are
> consistently seeing memory allocation errors, particularly from the
> graphics driver. When we use "top" (our platform is Linux) to monitor
> the memory used by our application, we see that our application is
> consuming over 140MB of RAM.
>
> In looking at our application and our graphics, we can only account for
> 20 - 30 MB of images that our application is using. Are there
> optimization tricks or hints that could help us reduce the memory
> footprint of our application? Are there switches that we can set in the
> build to help us diagnose why so much memory is being used?
>
> Any guidance would be very much appreciated.
Analysing memory usage for a Clutter application is somewhat non-trivial. Here's a brain dump about analysing memory usage; hopefully some of it will be helpful to you, though it probably overlooks a number of important things too.

Consider that the allocations are spread between several address spaces:

- You have standard allocations via APIs like malloc() and realloc() etc. done by the application, Clutter and OpenGL.
- If using GLX/EGLX then you may have allocations made on the application's behalf in the X server. (Potentially these may then be mapped back into the client's address space, such as for handling ClutterGLXTexturePixmap fallbacks.)
- Then there are allocations made in the kernel-space DRM driver to manage data associated with your application.
- Finally there are allocations made via ioctls to the kernel-space DRM driver that reserve RAM for mapping either into the GPU or into your application. (E.g. GEM)

Considering just OpenGL:

Your OpenGL driver will likely preallocate certain buffers or create various caches of state that we don't have much control over. One big user of memory in the Intel driver seems to be associated with relocation buffers. (When the userspace DRI driver creates a buffer of commands for the GPU it may need to reference other buffers whose addresses can't be determined until the commands get executed, so the driver needs to track the relocations necessary.) On my current driver I see ~3M associated with drm_intel_setup_reloc_list. Some earlier drivers have been *much* worse than this though; it's possible you have such a version.

If you're using Mesa, then around 3M are allocated to support the swrast driver used for software fallbacks. This seems a shame because the only thing Clutter typically uses the swrast driver for is the glReadPixels fallback path, where we only ever read back a single pixel at a time for picking.

Mesa also preallocates several buffers sized in line with GL_MAX_ELEMENTS_VERTICES (the maximum recommended number of vertices to submit to glDrawElements for the best efficiency); by default this is 3000, and it results in an allocation of several megs.

Then, aside from malloc, there will be numerous allocations for mapping data into the GPU. These may be vertex arrays, state buffers for various units of the GPU, or texture memory allocations. These are allocated by the driver via special purpose ioctls. (GEM is used for Intel drivers.)

If you have one Clutter stage that's 800x600, consider:

- It's probably 4 bytes per pixel for the front color buffer
- It could be another 4 for a combined depth/stencil buffer
- The color buffer is usually double buffered
- That comes to about 5.5 megs (a rough sum is sketched just below this list)
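To show where that ~5.5 figure comes from, here is the rough sum (assuming 4 bytes per pixel for every buffer; the exact buffer formats are ultimately up to the driver):

# 800x600 stage: double-buffered 4-byte color plus one 4-byte depth/stencil buffer
echo $(( (800 * 600 * 4 * 2) + (800 * 600 * 4) ))   # prints 5760000 bytes, i.e. ~5.5M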
If you use clutter_texture_new_from_actor(), that creates a framebuffer object the same size as the ClutterTexture actor, which may have ancillary buffers associated with it just like the stage does. (Currently only a stencil buffer.)

If possible, test with multiple GL drivers to see how that affects your memory usage. One driver might have a leak, or might have tuned some cache sizes to be very large.

- If using Mesa, perhaps try running with LIBGL_ALWAYS_SOFTWARE=1 and see what difference that makes.

Remember Clutter uses GLib internally, and GLib's slice allocator may keep freed allocations around unless you export G_SLICE=always-malloc before running your application.

To get a bit more insight into where some of your application's allocations are coming from you could try using the massif tool that comes with valgrind. E.g.

G_SLICE=always-malloc valgrind --tool=massif ./application

then, to analyse the results, use (replacing PID with the process id in the output file name):

ms_print ./massif.out.PID | less

More details about this tool can be found here:
http://valgrind.org/docs/manual/ms-manual.html

Exmap might also be able to give some insight into memory usage:
http://www.berthels.co.uk/exmap/download/

I haven't used it much myself, and it looks like it's not maintained anymore, but it's quite nice that it considers memory that is shared between processes. When I tried compiling it the other day I had to patch the kernel module to get it building/loading, so I've attached my patch in case you want to try it.

If you are running with an Intel driver using GEM then you can look at debugfs to find out how much memory is associated with GEM objects:

mount -t debugfs none /sys/kernel/debug
cat /sys/kernel/debug/dri/0/gem_objects

Since the data isn't related to processes you might need to manually compare the numbers before and while running your application. Note: dri/0 may not be right for your system; look at dri/X/name.
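For example, one rough way to do the before/while comparison (the output file names and the delay here are arbitrary, and the gem_objects numbers are system wide, so anything else using the GPU at the same time will be included too):

cat /sys/kernel/debug/dri/0/gem_objects > gem-before
./application &
sleep 10   # give the application time to create its GL resources
cat /sys/kernel/debug/dri/0/gem_objects > gem-during
diff gem-before gem-during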
Beware, some of these objects may be mapped into your application's address space, so depending on what tools you use you may need to be careful not to account for them twice. (I'm not sure of an easy way to do that though.) One thing I'm considering is patching Valgrind's massif tool to teach it about some of the GEM ioctls, since they probably account for a large proportion of a Clutter app's allocations.

kmemtrace (http://lwn.net/Articles/289880/) is another tool I've tried to use (without much success so far) to get insight into some of the kernel-space allocations (kmalloc) associated with DRM drivers. You need to rebuild your kernel with the CONFIG_KMEMTRACE option for this, and you need to clone the userspace tool from git://repo.or.cz/kmemtrace-user.git (and since the ABI changed some time ago it seems you have to use the ftrace-temp branch; the basic steps are below). As I said though, I haven't managed to get much help from this so far, but if you or anyone else manages to, I'd be interested to hear about it.
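For reference, fetching the tool and switching to that branch would look something like this (assuming the repository and branch are still laid out as above):

git clone git://repo.or.cz/kmemtrace-user.git
cd kmemtrace-user
git checkout -b ftrace-temp origin/ftrace-temp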
xrestop is a tool that lets you look at the memory allocated by the X server on your application's behalf. Beware that some of this could be mapped via XSHM to your application, so again you may need to be careful you don't account for it twice. (Not sure of an easy way to do that though.)

Overall I think it's fair to say Clutter hasn't had much focused effort spent on profiling memory usage yet. It's quite possible there is some low-hanging fruit we aren't aware of at the moment. We still need to come up with a good methodology for analysing applications, and that probably involves improving some of the tools currently available. I think we could do with a wiki for Clutter with a page dedicated to documenting the tools and methodologies that people can use to analyse Clutter applications. (Not least because I'm interested in collecting ideas from others about this too.) I'll try and follow up on this and keep you informed.

kind regards,
- Robert

--
Robert Bragg, Intel Open Source Technology Center

0001--build-Fix-some-minor-compile-errors-update-kerne.patch
Description: Binary data
