I forgot to mention that I changed request_tp_dealloc to:
static void request_tp_dealloc(requestobject *self)
{
    /* De-register the object from the GC before deallocating it,
     * to prevent the GC from running on a partially
     * deallocated object. */
    PyObject_GC_UnTrack(self);
    if (self->rbuff != NULL) {
        free(self->rbuff);
        self->rbuff = NULL;
    }
    request_tp_clear(self);
    PyObject_GC_Del(self);
}
I don't know if that function is the right place to free(self->rbuff),
but in the meantime there is no leak in my test.
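Another possibility might be to do the free in request_tp_clear itself, so
the buffer is released both on normal deallocation and when the cyclic GC
clears the object. A rough sketch (this placement is just an idea; tp_clear
can run more than once, so the free has to be idempotent, and the usual
Py_CLEAR calls on the object's members are elided):

static int request_tp_clear(requestobject *self)
{
    /* Hypothetical placement: free the read buffer here instead of
     * (or in addition to) request_tp_dealloc. Resetting the pointer
     * to NULL makes a second call a harmless no-op. */
    if (self->rbuff != NULL) {
        free(self->rbuff);
        self->rbuff = NULL;
    }
    /* ... Py_CLEAR the object's PyObject* members as before ... */
    return 0;
}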
Jim Gallacher wrote:
I've created a JIRA issue for the readline leaks. The one I detail is a
corner case related to what you found, but I don't think the fix below
will help. Take a look at 182 and let me know what you think.
http://issues.apache.org/jira/browse/MODPYTHON-182
I think we should check the requestobject's self->rbuff during request
cleanup and make sure it really is NULL, just as a safety check.
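Something like this, purely as an illustration (the helper name and the log
message are made up; the point is just to catch, and plug, a buffer that
should already be gone by cleanup time):

/* Hypothetical cleanup-time safety check: req_readline() should
 * already have freed and NULLed rbuff, so a non-NULL pointer here
 * means we leaked; log it and free it anyway. */
static void check_rbuff_freed(requestobject *self)
{
    if (self->rbuff != NULL) {
        fprintf(stderr, "mod_python: rbuff not freed at request cleanup\n");
        free(self->rbuff);
        self->rbuff = NULL;
    }
}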
Alexis Marrero wrote:
Jim,
I found the culprit!!!
There are two unrelated memory leaks.
The first one is in req_readline().
This code:
/* is there anything left in the rbuff from previous reads? */
if (self->rbuff_pos < self->rbuff_len) {
    /* if yes, process that first */
    while (self->rbuff_pos < self->rbuff_len) {
        buffer[copied++] = self->rbuff[self->rbuff_pos];
        if ((self->rbuff[self->rbuff_pos++] == '\n') ||
            (copied == len)) {
            /* our work is done */
            /* resize if necessary */
            if (copied < len)
                if (_PyString_Resize(&result, copied))
                    return NULL;
            return result;
        }
    }
}
should look like this:
/* is there anything left in the rbuff from previous reads? */
if (self->rbuff_pos < self->rbuff_len) {
    /* if yes, process that first */
    while (self->rbuff_pos < self->rbuff_len) {
        buffer[copied++] = self->rbuff[self->rbuff_pos];
        if ((self->rbuff[self->rbuff_pos++] == '\n') ||
            (copied == len)) {
            /* our work is done */
            /* resize if necessary */
            if (copied < len)
                if (_PyString_Resize(&result, copied))
                    return NULL;
            /* new: free rbuff now if this read consumed the last of it */
            if (self->rbuff_pos >= self->rbuff_len && self->rbuff != NULL)
            {
                free(self->rbuff);
                self->rbuff = NULL;
            }
            return result;
        }
    }
}
That solves one. As I mentioned in one of the emails to the mailing
list, the buffer was not being freed on the last readline(): the early
return inside the while loop bypassed the frees further down in the function.
But not completely - see MODPYTHON-182.
The second one, for which I don't have a fix yet, is apache.make_table()
in mod_python/util.py, line 152. If I comment out lines 152, 225, and 227,
you will see that memory doesn't grow. I will keep investigating...
As will I ...
Jim
Until the next email.
/amn
Jim Gallacher wrote:
I ran my baseline test with 500k requests, and got the following:
(Note that all the figures will have an error of +/- 0.1)
baseline 500k requests 1.7%
So it would seem that there is not a specific problem in readline, or my
test case is messed up. FYI here are my 2 handlers:
def baseline_handler(req):
    req.content_type = 'text/plain'
    req.write('ok baseline:')
    return apache.OK

def readline_handler(req):
    # the body of the request consists of
    # '\n'.join([ 'a'*10 for i in xrange(0,10) ])
    req.content_type = 'text/plain'
    count = 0
    while 1:
        line = req.readline()
        if not line:
            break
        count += 1
    req.write('ok readline: %d lines read' % count)
    return apache.OK
Jim
Jim Gallacher wrote:
I'll have some time to investigate this over the next couple of days. I
ran my leaktest script for FieldStorage and readline, and FieldStorage
certainly still leaks, but I'm not so sure about readline itself.
baseline 1k requests 1.2%
readline 500k requests 1.6%
fieldstorage 498k requests 10.1%
The memory consumption figures are for a machine with 512MB ram.
I'm running my baseline test with 500k requests right now to see if the
1.6% figure for readline represents a real leak in that function, or if
it is just mod_python itself.
My memory leak test suite is probably at the point where other people
will find it useful. Once I've written a README explaining its use I'll
commit it to the repository so everybody can play. If anyone wants to
give it a shot in the interim I can email it to you. Give me a shout
offlist.
I haven't had a chance to look at the code you highlight below, or at
least not closely. The whole req_readline function looks like it will
require a good strong cup of coffee to fully comprehend. ;)
Jim
Alexis Marrero wrote:
Experimenting on this issue, I noticed that neither of the following
"if" blocks is ever entered:
786     /* Free old rbuff as the old contents have been copied over and
787        we are about to allocate a new rbuff. Perhaps this could be reused
788        somehow? */
789     if (self->rbuff_pos >= self->rbuff_len && self->rbuff != NULL)
790     {
791         free(self->rbuff);
792         self->rbuff = NULL;
793     }
--------
846     /* Free rbuff if we're done with it */
847     if (self->rbuff_pos >= self->rbuff_len && self->rbuff != NULL)
848     {
849         free(self->rbuff);
850         self->rbuff = NULL;
851     }
I noticed this by adding statements inside those blocks that write to
the output stream; they never execute.
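For example, something along these lines dropped into the suspect block
(the message text is just for illustration; stderr normally ends up in
Apache's error log):

if (self->rbuff_pos >= self->rbuff_len && self->rbuff != NULL)
{
    /* debug instrumentation: if this branch is ever taken, a line
     * shows up in the error log */
    fprintf(stderr, "req_readline: freeing rbuff (pos=%ld len=%ld)\n",
            (long)self->rbuff_pos, (long)self->rbuff_len);
    fflush(stderr);
    free(self->rbuff);
    self->rbuff = NULL;
}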
/amn
On Aug 10, 2006, at 1:43 PM, Alexis Marrero wrote:
All,
We are trying to nail down a memory leak that happens only when
documents are POSTed to the server.
For testing we have a short script that does:
while True:
    dictionary_of_parameters = {'field1': 'a'*100000}
    post('url...', dictionary_of_parameters)
Then we run "top" on the server and watch the server memory grow
without bound. How do we know that the problem is in
request.readline()? If I go to
mod_python.util.FieldStorage.read_to_boundary() and add the following
statement as the first executable line in the function, the memory does
not grow:

def read_to_boundary(...):
    return True
    ...
I have read req_readline a thousand times and I can't figure out where
the problem is.
My config:
Python 2.4.1
mod_python 3.2.10
Our request handler does nothing other than call
util.FieldStorage(req) and req.write('hello').
I have some suspicion that it has to do with:
....
19 * requestobject.c
20 *
21 * $Id: requestobject.c 420297 2006-07-09 13:53:06Z nlehuen $
22 *
23 */
....
846     /* Free rbuff if we're done with it */
847     if (self->rbuff_pos >= self->rbuff_len && self->rbuff != NULL)
848     {
849         free(self->rbuff);
850         self->rbuff = NULL;
851     }
852
Though, I can't confirm.
/amn