So, at work we have some server processes which are glutons for memory.
We'd like to work out why, which is tricky. At least, I think that it's tricky.

So, I considered what would be the simplest thing that *might* work.
Devel::NYTProf can measure the elapsed time at a statement granularity, so
why not use that framework to also measure memory usage?

First go was trying to measure VMSize, which is relatively easy on Linux from
/proc, but turned out not to be that useful, because glibc on the work x86_64
systems was allocating 33 *pages* at a time. Pah!

So, I've reached iteration two.

I've written a malloc() intereception library, which lets me keep track of how
much memory is currently allocated by malloc()/realloc(), and a *bodge* to
NYTprof.xs to interrogate it in all the places where NYTProf.xs currently
reads the time.

This seems to work "reasonably" well, although it turns out that considerable
hoop-jumping is needed on x86_64 Linux, because dlsym() calls malloc(), which
isn't "helpful" when you're in your own malloc() calling dlsym() to find the
real malloc(). But the shared library works on Linux (x86 and x86_64 tested)
and FreeBSD without recompiling perl, via LD_PRELOAD.

Sadly OS X doesn't have LD_PRELOAD, so you have to explicitly link with the
interception library.

Should also work on Solaris, or anywhere else where DynaLoader uses
dl_dlopen.xs, but not tested.


I'm not sure where to go from here, or whether it's really something that
Devel::NYTProf would want to support as an option. Right now the
implementation (appended) is definately a "prototype":

* whilst both memory delta and times are written to/read from the profile file
  the report generation is unchanged - right now the memory delta is bodged
  in, in place of the time, so the line-by-line columns in "seconds" are
  actually bytes.
  (and have large integers, which the formatting code doesn't expect.
   and can be negative which the colouring code doesn't expected')

  I'm not sure how to procede here. 1 or 2 more columns in the HTML reports?
  And hence modification to all the layers between the stream reader and HTML?

* callbacks dont' change their tag, but have 1 extra parameter with the memory
  size delta or undef if there isn't a delta present.
  Likewise the block_line_num and sub_line_num are undef if they're not
  present, to keep the memory delta at the same position.

  This is completely untested and feels like a bit of a bodge.

* hardcoded prototype for get_total_allocated();
* no good handling of output format overflows


The appended patch *will* fail tests, because the tests expect (positive)
seconds, not signed memory deltas. :-)


To build the shared library, you need something like

gcc -shared -Os -Wall -L/usr/local/lib -fstack-protector malloc_intercept.c  -o 
malloc_intercept.so

or

gcc -fPIC  -shared -g -Wall -L/usr/local/lib -fstack-protector 
malloc_intercept.c  -o malloc_intercept.so -ldl

(or something different on OS X, but that command line is on a laptop at home)

and you need to run your profile something like this:

LD_PRELOAD=~/Perl/malloc_intercept/malloc_intercept.so PERL5OPT=-d:NYTProf 
script


(or under OS X you need to explicitly link your perl binary with
malloc_intercept.dylib)


I'm still digetsing whether it actually produces useful-enough results to be
good at pointing out where memory is going. I'm confident that the
interception library is recording exactly what *is* allocated. However, I'm
not sure if this is really that useful a metric for solving actual problems.

Nicholas Clark

Index: NYTProf.xs
===================================================================
--- NYTProf.xs  (revision 949)
+++ NYTProf.xs  (working copy)
@@ -19,6 +19,8 @@
 #define PERL_NO_GET_CONTEXT                       /* we want efficiency */
 #endif
 
+#define SIZES_TOO
+
 #include "EXTERN.h"
 #include "perl.h"
 #include "XSUB.h"
@@ -33,6 +35,10 @@
 #   include "ppport.h"
 #endif
 
+#ifdef SIZES_TOO
+size_t get_total_allocated();
+#endif
+
 #ifdef USE_HARD_ASSERT
 #undef NDEBUG
 #include <assert.h>
@@ -124,6 +130,8 @@
 #define NYTP_TAG_COMMENT         '#'    /* till newline */
 #define NYTP_TAG_TIME_BLOCK      '*'
 #define NYTP_TAG_TIME_LINE       '+'
+#define NYTP_TAG_TIME_BLOCK_MEM  '('
+#define NYTP_TAG_TIME_LINE_MEM   ')'
 #define NYTP_TAG_DISCOUNT        '-'
 #define NYTP_TAG_NEW_FID         '@'
 #define NYTP_TAG_SRC_LINE        'S'    /* fid, line, str */
@@ -342,6 +350,10 @@
 #endif
 static time_of_day_t start_time;
 static time_of_day_t end_time;
+#ifdef SIZES_TOO
+static size_t start_size;
+static size_t end_size;
+#endif
 
 static unsigned int last_executed_line;
 static unsigned int last_executed_fid;
@@ -1851,14 +1863,51 @@
     if (overflow)                                 /* XXX later output overflow 
to file */
         logwarn("profile time overflow of %ld seconds discarded\n", overflow);
 
+#ifdef SIZES_TOO
+    end_size = get_total_allocated();
+#endif
+
     reinit_if_forked(aTHX);
 
     if (last_executed_fid) {
+#ifdef SIZES_TOO
+       if (start_size && end_size) {
+           unsigned int output;
 
-        output_tag_int((unsigned char)((profile_blocks)
-                        ? NYTP_TAG_TIME_BLOCK : NYTP_TAG_TIME_LINE), elapsed);
-        output_int(last_executed_fid);
-        output_int(last_executed_line);
+           if (end_size >= start_size) {
+               size_t increase = end_size - start_size;
+               if (increase > (UINT_MAX >> 1)) {
+                   warn("profile size delta of +%lu seconds clipped",
+                        (unsigned long) increase);
+                   increase = UINT_MAX >> 1;
+               }
+               output = increase << 1;
+           } else {
+               size_t decrease = start_size - end_size;
+               if (decrease > (UINT_MAX >> 1)) {
+                   warn("profile size delta of -%lu seconds clipped",
+                        (unsigned long) decrease);
+                   decrease = UINT_MAX >> 1;
+               }
+               output = 1 | (decrease << 1);
+           }
+           output_tag_int((unsigned char)((profile_blocks)
+                            ? NYTP_TAG_TIME_BLOCK_MEM : 
NYTP_TAG_TIME_LINE_MEM), elapsed);
+
+           output_int(last_executed_fid);
+           output_int(last_executed_line);
+           output_int(output);
+       } else
+#endif
+       {
+           output_tag_int((unsigned char)((profile_blocks)
+                            ? NYTP_TAG_TIME_BLOCK : NYTP_TAG_TIME_LINE), 
elapsed);
+
+           output_int(last_executed_fid);
+           output_int(last_executed_line);
+       }
+
+
         if (profile_blocks) {
             output_int(last_block_line);
             output_int(last_sub_line);
@@ -1934,6 +1983,10 @@
     get_ticks_between(end_time, start_time, elapsed, overflow);
     cumulative_overhead_ticks += elapsed;
 
+#ifdef SIZES_TOO
+    start_size = get_total_allocated();
+#endif
+
     SETERRNO(saved_errno, 0);
     return;
 }
@@ -3075,6 +3128,9 @@
 
     /* discard time spent since profiler was disabled */
     get_time_of_day(start_time);
+#ifdef SIZES_TOO
+    start_size = get_total_allocated();
+#endif
 
     return prev_is_profiling;
 }
@@ -3161,6 +3217,8 @@
         PL_perldb |= PERLDBf_SINGLE; /* start (after BEGINs) with single-step 
on XXX still needed? */
     }
 
+    start_size = get_total_allocated();
+
     if (profile_opts & NYTP_OPTf_OPTIMIZE)
          PL_perldb &= ~PERLDBf_NOOPT;
     else PL_perldb |=  PERLDBf_NOOPT;
@@ -3990,6 +4048,10 @@
                 break;
             }
 
+#ifdef SIZES_TOO
+            case NYTP_TAG_TIME_LINE_MEM:                   /*FALLTHRU*/
+            case NYTP_TAG_TIME_BLOCK_MEM:                  /*FALLTHRU*/
+#endif
             case NYTP_TAG_TIME_LINE:                       /*FALLTHRU*/
             case NYTP_TAG_TIME_BLOCK:
             {
@@ -4001,11 +4063,29 @@
                 unsigned int ticks    = read_int();
                 unsigned int file_num = read_int();
                 unsigned int line_num = read_int();
+#ifndef SIZES_TOO
+               const bool is_block = (c == NYTP_TAG_TIME_BLOCK);
+#else
+               const bool is_block = (c == NYTP_TAG_TIME_BLOCK || c == 
NYTP_TAG_TIME_BLOCK_MEM);
+               int allocated_diff = 0;
+               bool got_diff;
+#endif
 
+#ifdef SIZES_TOO
+               if (c == NYTP_TAG_TIME_LINE_MEM || c == 
NYTP_TAG_TIME_BLOCK_MEM) {
+                   const unsigned int diff_raw = read_int();
+                   allocated_diff = diff_raw >> 1;
+                   if (diff_raw & 1)
+                       allocated_diff = -allocated_diff;
+                   got_diff = TRUE;
+               } else
+                   got_diff = FALSE;
+#endif
+
                 if (cb) {
                     PUSHMARK(SP);
 
-                    XPUSHs(c == NYTP_TAG_TIME_BLOCK ? cb_TIME_BLOCK_tag : 
cb_TIME_LINE_tag);
+                    XPUSHs(is_block ? cb_TIME_BLOCK_tag : cb_TIME_LINE_tag);
 
                     i = 0;
                     sv_setiv(cb_args[i], eval_file_num);  XPUSHs(cb_args[i++]);
@@ -4014,10 +4094,25 @@
                     sv_setiv(cb_args[i], file_num);       XPUSHs(cb_args[i++]);
                     sv_setiv(cb_args[i], line_num);       XPUSHs(cb_args[i++]);
 
-                    if (c == NYTP_TAG_TIME_BLOCK) {
+                    if (is_block) {
                         sv_setiv(cb_args[i], read_int()); 
XPUSHs(cb_args[i++]); /* block_line_num */
                         sv_setiv(cb_args[i], read_int()); 
XPUSHs(cb_args[i++]); /* sub_line_num */
+#ifdef SIZES_TOO
+                       if (got_diff) {
+                           sv_setiv(cb_args[i], allocated_diff); 
XPUSHs(cb_args[i++]);
+                       }
+#endif
                     }
+#ifdef SIZES_TOO
+                   else {
+                       if (got_diff) {
+                           SvOK_off(cb_args[i]);         XPUSHs(cb_args[i++]); 
/* block_line_num */
+                            SvOK_off(cb_args[i]);         
XPUSHs(cb_args[i++]); /* sub_line_num */
+                           sv_setiv(cb_args[i], allocated_diff); 
XPUSHs(cb_args[i++]);
+                       }
+                   }
+#endif
+                    assert(i <= C_ARRAY_LENGTH(cb_args));
 
                     PUTBACK;
                     call_sv(cb, G_DISCARD);
@@ -4051,25 +4146,37 @@
                 }
 
                 add_entry(aTHX_ fid_line_time_av, file_num, line_num,
+#ifdef SIZES_TOO
+                    allocated_diff, eval_file_num, eval_line_num,
+#else
                     seconds, eval_file_num, eval_line_num,
+#endif
                     1-statement_discount
                 );
 
-                if (c == NYTP_TAG_TIME_BLOCK) {
+                if (is_block) {
                     unsigned int block_line_num = read_int();
                     unsigned int sub_line_num   = read_int();
 
                     if (!fid_block_time_av)
                         fid_block_time_av = newAV();
                     add_entry(aTHX_ fid_block_time_av, file_num, 
block_line_num,
+#ifdef SIZES_TOO
+                        allocated_diff, eval_file_num, eval_line_num,
+#else
                         seconds, eval_file_num, eval_line_num,
+#endif
                         1-statement_discount
                     );
 
                     if (!fid_sub_time_av)
                         fid_sub_time_av = newAV();
                     add_entry(aTHX_ fid_sub_time_av, file_num, sub_line_num,
+#ifdef SIZES_TOO
+                        allocated_diff, eval_file_num, eval_line_num,
+#else
                         seconds, eval_file_num, eval_line_num,
+#endif
                         1-statement_discount
                     );
 
@@ -4078,7 +4185,11 @@
                 }
 
                 total_stmts_measured++;
+#ifdef SIZES_TOO
+                total_stmts_duration += allocated_diff;
+#else
                 total_stmts_duration += seconds;
+#endif
                 statement_discount = 0;
                 last_file_num = file_num;
                 last_line_num = line_num;

-- 
You've received this message because you are subscribed to
the Devel::NYTProf Development User group.

Group hosted at:  http://groups.google.com/group/develnytprof-dev
Project hosted at:  http://perl-devel-nytprof.googlecode.com
CPAN distribution:  http://search.cpan.org/dist/Devel-NYTProf

To post, email:  [email protected]
To unsubscribe, email:  [email protected]
#define _GNU_SOURCE
#include <dlfcn.h>
#include <unistd.h>
#include <stdlib.h>
#include <string.h>
#include <pthread.h>
#include <sys/mman.h>
#include <signal.h>
#include <errno.h>
#include <assert.h>
#undef NDEBUG

#define ALIGNMENT 16
#define SENTINEL 0xDEADBEEF
#define MMAPPED 0xDECAF0AA

struct header {
    size_t	sentinel;
    size_t	size;
};

static pthread_mutex_t mi_mutex = PTHREAD_MUTEX_INITIALIZER;

static void
_mutex_bust(const char *const message, const size_t len, const int result)
{
    const char *error = strerror(result);
    size_t error_len = strlen(error);
    write(2, message, len);
    write(2, error, error_len);
    abort();
}

#define mutex_bust(where, error) {		\
	const char message[] = "malloc_intercept mutex " where " failed: "; \
	_mutex_bust(message, sizeof(message) - 1, error); \
	}

static void
mutex_lock_or_bust()
{
    const int result = pthread_mutex_lock(&mi_mutex);

    if (result)
	mutex_bust("lock", result);
}

static void
mutex_unlock_or_bust()
{
    const int result = pthread_mutex_unlock(&mi_mutex);

    if (result)
	mutex_bust("unlock", result);
}
	

static void *
_dlsym_or_bust(const char *which, int len)
{
    void *symbol = dlsym(RTLD_NEXT, which);

    if (!symbol) {
	static const char first[] = "dlsym(RTLD_NEXT, \"";
	static const char second[] = "\") failed, dlerror() ";
	static const char helpful[] = "is: ";
	static const char unhelpful[] = "returned NULL which isn't helpful here\n";
	const char *error = dlerror();
	const size_t error_len = error ? strlen(error) : 0;

	write(2, first, sizeof(first) - 1);
	write(2, which, len);
	write(2, second, sizeof(second) - 1);
	if (error) {
	    write(2, helpful, sizeof(helpful) - 1);
	    write(2, error, error_len);
	    write(2, "\n", 1);
	} else {
	    write(2, unhelpful, sizeof(unhelpful) - 1);
	}
	abort();
    }

    return symbol;
}

#define dlsym_or_bust(name) _dlsym_or_bust(name, sizeof("" name "") - 1)

typedef void *(malloc_type)(size_t size);
typedef void *(realloc_type)(void *p, size_t size);
typedef void *(calloc_type)(size_t count, size_t size);
typedef void *(free_type)(void *p);

#define overflow_abort()	{ \
	static char message[] = "overflow in " __FILE__ "\n";		\
	write(2, message, sizeof(message) - 1);				\
	abort();							\
    }

static size_t total_allocated;

size_t
get_total_allocated() {
    size_t allocated;
    mutex_lock_or_bust();
    allocated = total_allocated;;
    mutex_unlock_or_bust();
    return allocated;
}

static malloc_type *real_malloc;
static volatile sig_atomic_t bootstrapping;

void *malloc(size_t size) {
    void *got;
    struct header *header;
    malloc_type *real_malloc_local;

    mutex_lock_or_bust();
    real_malloc_local = real_malloc;
    total_allocated += size;
    mutex_unlock_or_bust();

    if (!real_malloc_local) {
	if (bootstrapping) {
	    /* Khan! dlsym is calling malloc().  x86_64 I love you.
	       MAP_ANON may be deprecated by glibc, but MAP_ANONYMOUS doesn't
	       even exist on FreeBSD. Rah. */
	    got = mmap(NULL, size + ALIGNMENT, PROT_READ|PROT_WRITE,
		       MAP_PRIVATE|MAP_ANON, -1, 0);
	    if (!got) {
		const char message[] = "malloc_intercept mmap failed during bootstrap";
		_mutex_bust(message, sizeof(message) - 1, errno);
	    }
	    header = (struct header *) got;
	    header->sentinel = MMAPPED;
	    header->size = size;

	    return (void *)((char *)got + ALIGNMENT);
	}
	bootstrapping = 1;
	real_malloc_local = (malloc_type *)dlsym_or_bust("malloc");
	mutex_lock_or_bust();
	real_malloc = real_malloc_local;
	mutex_unlock_or_bust();
	
    }

    got = real_malloc_local(size + ALIGNMENT);

    /* Assume success, and fix up on failure, so that we only have to lock the
       mutex once in the regular course of events.  */
    if (!got) {
	mutex_lock_or_bust();
	total_allocated -= size;
	mutex_unlock_or_bust();
	return NULL;
    }

    header = (struct header *) got;
    header->sentinel = SENTINEL;
    header->size = size;


    return (void *)((char *)got + ALIGNMENT);
}

void *realloc(void *p, size_t size) {
    void *got;
    size_t old_size = 0;
    struct header *header;
    static realloc_type *real_realloc;
    void *real_p;

    if (p) {
	real_p = (void *)((char *)p - ALIGNMENT);

	header = (struct header *) real_p;
	if (header->sentinel == MMAPPED) {
	    size_t to_copy = size > header->size ? size : header->size;
	    got = malloc(size);
	    if (got) {
		memcpy(got, p, to_copy);
		free(p);
	    }
	    return got;
	}
	assert(header->sentinel == SENTINEL);
	old_size = header->size;
    } else {
	real_p = NULL;
    }

    mutex_lock_or_bust();
    total_allocated -= old_size;
    total_allocated += size;
    if (!real_realloc) {
	if (!real_malloc && bootstrapping) {
	    /* We could fix this to also bootstrap, but it KISS until we
	       encouter a platform where this matters.  */
	    const char message[] = "malloc_intercept realloc() called while bootstrapping.";
	    write(2, message, sizeof(message) - 1);
	    abort();
	}
	real_realloc = (realloc_type *)dlsym_or_bust("realloc");
    }
    mutex_unlock_or_bust();

    got = real_realloc(real_p, size + ALIGNMENT);

    if (!got) {
	mutex_lock_or_bust();
	total_allocated -= size;
	total_allocated += old_size;
	mutex_unlock_or_bust();
	return NULL;
    }

    header = (struct header *) got;
    header->sentinel = SENTINEL;
    header->size = size;

    total_allocated -= old_size;
    total_allocated += size;

    return (void *)((char *)got + ALIGNMENT);
}

/* This is provided by (at least) FreeBSD and OS X, but there's no harm in
   always providing a wrapper for it, as code that doesn't expect it to be there
   won't call it.  */
void *reallocf(void *p, size_t size) {
    void *got = realloc(p, size);
    if (!got)
	free(p);
    return got;
}

/* This is provided by (at least) OS X, but there's no harm in always providing
   a wrapper for it, as code that doesn't expect it to be there won't call it,
   so won't ever call dylsm trying to find "malloc_size" on a platform without
   it.  */
typedef size_t(malloc_size_type)(void *p);

size_t malloc_size(void *p) {
    static malloc_size_type *real_malloc_size;
    mutex_lock_or_bust();
    if (!real_malloc_size)
	real_malloc_size = (malloc_size_type *)dlsym_or_bust("malloc_size");
    mutex_unlock_or_bust();

    return real_malloc_size((void *)((char *)p - ALIGNMENT)) - ALIGNMENT;
}

/* This is provided by (at least) OS X, and we have no hope of wrapping it, so
   issue a diagnostic and return failure.  */
void *valloc(size_t size) {
    static const char no_can_do[] = "valloc_called\n";
    write(2, no_can_do, sizeof(no_can_do) - 1);
    return NULL;
}

void *calloc(size_t count, size_t size) {
    const size_t want = count * size;
    void *got;

    if(size && count > ((size_t) -1) / size)
	overflow_abort();

    got = malloc(want);
    memset(got, 0, want);

    return got;
}

void
free(void *p) {
    if (p) {
	void *real_p = (void *)((char *)p - ALIGNMENT);
	struct header *header = (struct header *)real_p;
	static free_type *real_free;

	if (header->sentinel == MMAPPED) {
	    munmap(real_p, header->size);
	    return;
	}
	assert(header->sentinel == SENTINEL);

	mutex_lock_or_bust();
	if (!real_free)
	    real_free = (free_type *)dlsym_or_bust("free");
	total_allocated -= header->size;
	mutex_unlock_or_bust();

	real_free(real_p);
    }
}

Reply via email to