Hello,

I have written a small script that, I think, demonstrates a memory leak in savefig. A search of the mailing list shows a thread started by Ralf Gommers <ralf.gomm...@googlemail.com> about 2009-07-01 that seems to cover a very similar issue. I have appended the demonstration script at the end of this e-mail text.

The demonstration script sits in a relatively tight loop, creating figures and saving them while monitoring memory usage. A plot of VmRSS vs. number of loop iterations, as generated on my system, is attached as "data.png" (you can create your own plots with the sample script). Although I have only tested this on Fedora 12, I expect that most Linux users should be able to run the script for themselves. Users should be able to comment out the "savefig" line and watch memory usage go from unbounded to (relatively) bounded.
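
The measurement idea described above can be sketched in isolation as follows. This is a minimal Python 3 sketch, not the appended script itself: the helper names `parse_status`, `vmrss_kb` and `read_vmrss_kb` are illustrative, and it assumes a Linux-style /proc/{PID}/status file containing a 'VmRSS' entry. On systems without such a file, it simply reports None.

```python
import os


def parse_status(text):
    """Parse the text of a /proc/<pid>/status file into a dict."""
    status = {}
    for line in text.splitlines():
        if ':' not in line:
            continue  # skip anything that is not a "Key: value" line
        key, value = line.split(':', 1)
        status[key.strip()] = value.strip()
    return status


def vmrss_kb(status):
    """Return the VmRSS value as an int (kB), or None if the entry is absent."""
    entry = status.get('VmRSS')
    if entry is None:
        return None
    value, _unit = entry.split()  # the entry looks like "5120 kB"
    return int(value)


def read_vmrss_kb(pid=None):
    """Read VmRSS for a process (default: this one); None if unavailable."""
    path = '/proc/%d/status' % (os.getpid() if pid is None else pid)
    try:
        with open(path) as f:
            return vmrss_kb(parse_status(f.read()))
    except (IOError, OSError):
        return None  # assumption: no readable /proc status file here


if __name__ == '__main__':
    print(read_vmrss_kb())
```

Calling `read_vmrss_kb()` once per loop iteration and recording the result gives the kind of series plotted in "data.png".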

Can anybody see a cause for this leak hidden in my code? Has anybody seen this issue and solved it? I would also appreciate it if other people would run this script and report their findings, so that there is some indication of how often the problem occurs.

Sincerely,
Keegan Callin


************************************************************************

'''Script to demonstrate memory leakage in savefig call.

Requirements:
Tested in Fedora 12.  It should work on other systems where
/proc/{PID}/status files exist and those files contain a 'VmRSS' entry
(this is how the script monitors its memory usage).

System Details on Original Test System:

[kee...@grizzly test]$ uname -a
Linux grizzly 2.6.32.9-70.fc12.x86_64 #1 SMP Wed Mar 3 04:40:41 UTC 2010 x86_64 x86_64 x86_64 GNU/Linux

[kee...@grizzly ~]$ gcc --version
gcc (GCC) 4.4.3 20100127 (Red Hat 4.4.3-4)
Copyright (C) 2010 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

[kee...@grizzly ~]$ cd ~/src/matplotlib-0.99.1.1
[kee...@grizzly matplotlib-0.99.1.1]$ rm -rf build
[kee...@grizzly matplotlib-0.99.1.1]$ python setup.py build &> out.log
[kee...@grizzly matplotlib-0.99.1.1]$ head -38 out.log
============================================================================
BUILDING MATPLOTLIB
            matplotlib: 0.99.1.1
                python: 2.6.4 (r264:75706, Jan 20 2010, 12:34:05)  [GCC
                        4.4.2 20091222 (Red Hat 4.4.2-20)]
              platform: linux2

REQUIRED DEPENDENCIES
                 numpy: 1.4.0
             freetype2: 9.22.3

OPTIONAL BACKEND DEPENDENCIES
                libpng: 1.2.43
               Tkinter: no
                        * TKAgg requires Tkinter
              wxPython: no
                        * wxPython not found
                  Gtk+: no
                        * Building for Gtk+ requires pygtk; you must be able
                        * to "import gtk" in your build/install environment
       Mac OS X native: no
                    Qt: no
                   Qt4: no
                 Cairo: no

OPTIONAL DATE/TIMEZONE DEPENDENCIES
              datetime: present, version unknown
              dateutil: matplotlib will provide
                  pytz: 2010b

OPTIONAL USETEX DEPENDENCIES
                dvipng: no
           ghostscript: 8.71
                 latex: no
               pdftops: 0.12.4

[Edit setup.cfg to suppress the above messages]
============================================================================
[kee...@grizzly matplotlib-0.99.1.1]$ bzip2 out.log
# out.log.bz2 is attached to the message containing this program.

[kee...@grizzly ~]$ python2.6
Python 2.6.4 (r264:75706, Jan 20 2010, 12:34:05)
[GCC 4.4.2 20091222 (Red Hat 4.4.2-20)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import matplotlib
>>> matplotlib.__version__
'0.99.1.1'
'''
# Import standard python modules
import sys
import os
from ConfigParser import SafeConfigParser as ConfigParser
from cStringIO import StringIO

# import numpy
import numpy
from numpy import zeros

# Import matplotlib
from matplotlib.figure import Figure
from matplotlib.backends.backend_agg import FigureCanvasAgg as FigureCanvas


def build_figure(a):
    '''Returns a new figure containing array a.'''

    # Create figure and setup graph
    fig = Figure()
    FigureCanvas(fig)
    ax = fig.add_subplot(1, 1, 1)
    ax.plot(a)

    return fig


_proc_status = '/proc/%d/status' % os.getpid()
def load_status():
    '''Returns a dict of process statistics from /proc/{PID}/status.'''
    status = {}

    with open(_proc_status) as f:
        for line in f:
            key, value = line.split(':', 1)
            key = key.strip()
            value = value.strip()
            status[key] = value

    return status


def main():
    data_file = 'data.txt'
    image_file = 'data.png'
    num_iterations = 1000

    with open(data_file, 'w') as f:
        # Tried running without matplotlib or numpy such that the
        # only thing happening in the process is the dumping of process
        # status information to `data_file` from the loop.  Memory
        # usage reaches a bound _very_ quickly.
        status = load_status()
        rss, unit = status['VmRSS'].split()
        print >>f, rss

        print 'Executing', num_iterations, 'iterations.'
        a = zeros(10000)
        for i in xrange(0, num_iterations):
            # Random data is shifted into a numpy array.
            # With numpy and the process status dump enabled, memory
            # usage reaches a bound very quickly.
            a[0:-1] = a[1:]
            a[-1] = numpy.random.rand(1)[0]

            # When figures of the array are generated in each loop,
            # memory reaches a bound more slowly (~50 iterations) than
            # without matplotlib; nevertheless, memory usage still
            # appears to be bounded.
            fig = build_figure(a)

            # Savefig alone causes memory usage to become unbounded.
            # Memory usage increase seems to be linear with the number
            # of iterations.
            sink = StringIO()
            fig.savefig(sink, format='png', dpi=80, transparent=False,
                        bbox_inches="tight", pad_inches=0.15)
            # This line below can be used to demonstrate that StringIO
            # does not leak without the savefig call.
            #sink.write(1000*'hello')
            sink.close()

            status = load_status()
            rss, unit = status['VmRSS'].split()
            print >>f, rss
            sys.stdout.write('#')
            sys.stdout.flush()

    # Load the recorded memory usage samples and graph them to a file.
    print
    print 'Graphing memory usage data from', data_file, 'to', image_file
    with open(data_file) as f:
        rss = [int(r) for r in f]
    fig = build_figure(rss)

    with open(image_file, 'wb') as f:
        fig = build_figure(rss)
        fig.savefig(f, format='png', dpi=80, transparent=False,
                    bbox_inches="tight", pad_inches=0.15)

    return 0

if __name__ == '__main__':
    sys.exit(main())
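
The script's comments note that memory growth "seems to be linear with the number of iterations". One way to put a number on that is to fit a straight line to the samples in data.txt. The following is a minimal sketch written for Python 3, assuming data.txt holds one integer VmRSS sample (in kB) per line, as the script above writes it. The function names `growth_per_iteration` and `load_samples` are illustrative and not part of the original script.

```python
import os


def growth_per_iteration(samples):
    """Least-squares slope of the samples against their index.

    For VmRSS samples in kB, the result is in kB per loop iteration.
    """
    n = len(samples)
    if n < 2:
        raise ValueError('need at least two samples')
    mean_x = (n - 1) / 2.0
    mean_y = sum(samples) / float(n)
    num = sum((i - mean_x) * (y - mean_y) for i, y in enumerate(samples))
    den = sum((i - mean_x) ** 2 for i in range(n))
    return num / den


def load_samples(path):
    """Read one integer sample per line, ignoring blank lines."""
    with open(path) as f:
        return [int(line) for line in f if line.strip()]


if __name__ == '__main__':
    data_file = 'data.txt'  # same file name the demonstration script uses
    if os.path.exists(data_file):
        slope = growth_per_iteration(load_samples(data_file))
        print('approximate growth: %.2f kB per iteration' % slope)
```

A slope near zero would suggest bounded usage; a clearly positive slope that persists when more iterations are run would be consistent with a leak. Because the early iterations include warm-up growth, it may be worth dropping the first ~50 samples before fitting.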

<<attachment: data.png>>

_______________________________________________
Matplotlib-users mailing list
Matplotlib-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/matplotlib-users
