----- Original Message ----
> From: Kumar Vaibhav <[EMAIL PROTECTED]>
> To: Jesse Becker <[EMAIL PROTECTED]>
> Cc: Martin Knoblauch <[EMAIL PROTECTED]>; Ganglia Developers <ganglia-developers@lists.sourceforge.net>; Bernard Li <[EMAIL PROTECTED]>
> Sent: Friday, March 21, 2008 8:16:42 AM
> Subject: Re: [Ganglia-developers] Memory leak in gmond
>
> Hi All,
>
> I am still seeing a memory leak on the nodes. The problem is now not in
> deaf mode but in mute mode. To reduce the debugging complexity I am
> running 3.0.7 on two nodes, one in deaf mode and the other in mute mode.
> The deaf-mode node is working fine, and the mute-mode node is showing the
> memory leak. Here is the valgrind output for the mute-mode node.

Hi Kumar,
while I assume that some/most of the "leaks" you are seeing are one-time allocations that simply live until the process ends, I am at least confused about the ones from "hash_lookup". That is part of a metrics sampling function which should not be called at all in "mute" mode - unless I am completely wrong. Could you do the valgrind run twice, with different total run-times, just to see which of the "leaks" accumulate?

> ==21588==
> ==21588== Process terminating with default action of signal 2 (SIGINT)
> ==21588==    at 0x3F810C485F: poll (in /lib64/libc-2.5.so)
> ==21588==    by 0x41D7B1: apr_pollset_poll (poll.c:504)
> ==21588==    by 0x405846: main (gmond.c:1269)
> --21588-- Discarding syms at 0x4D41000-0x4F4C000 in /lib64/libnss_files-2.5.so due to munmap()
> ==21588==
> ==21588== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 5 from 1)
> --21588--
> --21588-- supp: 5 Fedora-Core-6-hack3-ld25
> ==21588== malloc/free: in use at exit: 740,602 bytes in 1,190 blocks.
> ==21588== malloc/free: 2,574 allocs, 1,384 frees, 946,209 bytes allocated.
> ==21588==
> ==21588== searching for pointers to 1,190 not-freed blocks.
> ==21588== checked 479,904 bytes.
> ==21588==
> ==21588== 5 bytes in 1 blocks are still reachable in loss record 1 of 16
> ==21588==    at 0x4A05809: malloc (vg_replace_malloc.c:149)
> ==21588==    by 0x4111FF: cfg_init (confuse.c:1087)
> ==21588==    by 0x40EB7C: Ganglia_gmond_config_create (libgmond.c:523)
> ==21588==    by 0x405529: process_configuration_file (gmond.c:180)
> ==21588==    by 0x405627: main (gmond.c:1815)
> ==21588==

I think this is a one-time alloc from reading the config file.
> ==21588==
> ==21588== 19 bytes in 4 blocks are still reachable in loss record 2 of 16
> ==21588==    at 0x4A05809: malloc (vg_replace_malloc.c:149)
> ==21588==    by 0x3F810750E1: strndup (in /lib64/libc-2.5.so)
> ==21588==    by 0x40806A: hash_lookup (metrics.c:151)
> ==21588==    by 0x408D75: bytes_out_func (metrics.c:425)
> ==21588==    by 0x40418C: Ganglia_collection_group_collect (gmond.c:1540)
> ==21588==    by 0x404FC8: process_collection_groups (gmond.c:1662)
> ==21588==    by 0x40600E: main (gmond.c:1913)
> ==21588==

Now, this one is from bytes_out_func. Likely a one-time allocation. How many network interfaces does that system have? What are they named? And I wonder why it is called at all in mute mode.

> ==21588==
> ==21588== 22 bytes in 2 blocks are still reachable in loss record 3 of 16
> ==21588==    at 0x4A05809: malloc (vg_replace_malloc.c:149)
> ==21588==    by 0x406740: gengetopt_strdup (cmdline.c:64)
> ==21588==    by 0x40689E: cmdline_parser (cmdline.c:100)
> ==21588==    by 0x4055BD: main (gmond.c:1780)
> ==21588==

One-time allocation.

> ==21588==
> ==21588== 56 bytes in 1 blocks are still reachable in loss record 4 of 16
> ==21588==    at 0x4A05809: malloc (vg_replace_malloc.c:149)
> ==21588==    by 0x4111D2: cfg_init (confuse.c:1083)
> ==21588==    by 0x40EB7C: Ganglia_gmond_config_create (libgmond.c:523)
> ==21588==    by 0x405529: process_configuration_file (gmond.c:180)
> ==21588==    by 0x405627: main (gmond.c:1815)
> ==21588==

One-time allocation.

> ==21588==
> ==21588== 192 bytes in 4 blocks are still reachable in loss record 5 of 16
> ==21588==    at 0x4A05809: malloc (vg_replace_malloc.c:149)
> ==21588==    by 0x408057: hash_lookup (metrics.c:144)
> ==21588==    by 0x408D75: bytes_out_func (metrics.c:425)
> ==21588==    by 0x40418C: Ganglia_collection_group_collect (gmond.c:1540)
> ==21588==    by 0x404FC8: process_collection_groups (gmond.c:1662)
> ==21588==    by 0x40600E: main (gmond.c:1913)
> ==21588==

See my comment above. That looks like 4 net_dev_stats structures. Likely one-time allocations.
But they should not happen at all in "mute" mode. Are you running in 32-bit or 64-bit mode? It seems we could save 8 bytes per struct by sorting the members better.

> ==21588==
> ==21588== 192 bytes in 1 blocks are still reachable in loss record 6 of 16
> ==21588==    at 0x4A05809: malloc (vg_replace_malloc.c:149)
> ==21588==    by 0x41BDC1: apr_allocator_create (apr_pools.c:90)
> ==21588==    by 0x41C55C: apr_pool_initialize (apr_pools.c:506)
> ==21588==    by 0x41A7C4: apr_initialize (start.c:55)
> ==21588==    by 0x40EC9F: Ganglia_pool_create (libgmond.c:494)
> ==21588==    by 0x4055DA: main (gmond.c:1789)
> ==21588==

Likely a one-time allocation.

> ==21588==
> ==21588== 322 bytes in 48 blocks are still reachable in loss record 7 of 16
> ==21588==    at 0x4A05809: malloc (vg_replace_malloc.c:149)
> ==21588==    by 0x3F810F5A63: xdr_string (in /lib64/libc-2.5.so)
> ==21588==    by 0x40DA67: xdr_Ganglia_message (protocol_xdr.c:124)
> ==21588==    by 0x4047C3: process_udp_recv_channel (gmond.c:905)
> ==21588==    by 0x4059FC: main (gmond.c:1279)
> ==21588==

No idea.

> ==21588==
> ==21588== 328 bytes in 8 blocks are still reachable in loss record 8 of 16
> ==21588==    at 0x4A0590B: realloc (vg_replace_malloc.c:306)
> ==21588==    by 0x40FCAB: cfg_addval (confuse.c:372)
> ==21588==    by 0x411397: cfg_setopt (confuse.c:587)
> ==21588==    by 0x410B7A: cfg_parse_internal (confuse.c:938)
> ==21588==    by 0x410B9F: cfg_parse_internal (confuse.c:944)
> ==21588==    by 0x410EC3: cfg_parse_fp (confuse.c:1035)
> ==21588==    by 0x410F9D: cfg_parse (confuse.c:1054)
> ==21588==    by 0x40EB8A: Ganglia_gmond_config_create (libgmond.c:525)
> ==21588==    by 0x405529: process_configuration_file (gmond.c:180)
> ==21588==    by 0x405627: main (gmond.c:1815)
> ==21588==

One-time allocation from processing the config file.
> ==21588==
> ==21588== 1,128 bytes in 141 blocks are still reachable in loss record 9 of 16
> ==21588==    at 0x4A05809: malloc (vg_replace_malloc.c:149)
> ==21588==    by 0x4A05883: realloc (vg_replace_malloc.c:306)
> ==21588==    by 0x40FCAB: cfg_addval (confuse.c:372)
> ==21588==    by 0x411397: cfg_setopt (confuse.c:587)
> ==21588==    by 0x4110D7: cfg_init_defaults (confuse.c:529)
> ==21588==    by 0x411251: cfg_init (confuse.c:1094)
> ==21588==    by 0x40EB7C: Ganglia_gmond_config_create (libgmond.c:523)
> ==21588==    by 0x405529: process_configuration_file (gmond.c:180)
> ==21588==    by 0x405627: main (gmond.c:1815)
> ==21588==

Ditto.

> ==21588==
> ==21588== 1,143 bytes in 180 blocks are definitely lost in loss record 10 of 16
> ==21588==    at 0x4A05809: malloc (vg_replace_malloc.c:149)
> ==21588==    by 0x3F810F5A63: xdr_string (in /lib64/libc-2.5.so)
> ==21588==    by 0x40D87D: xdr_Ganglia_gmetric_message (protocol_xdr.c:23)
> ==21588==    by 0x40D9FD: xdr_Ganglia_message (protocol_xdr.c:83)
> ==21588==    by 0x4047C3: process_udp_recv_channel (gmond.c:905)
> ==21588==    by 0x4059FC: main (gmond.c:1279)
> ==21588==

No idea.

> ==21588==
> ==21588== 1,456 bytes in 182 blocks are still reachable in loss record 11 of 16
> ==21588==    at 0x4A05809: malloc (vg_replace_malloc.c:149)
> ==21588==    by 0x40FCC5: cfg_addval (confuse.c:375)
> ==21588==    by 0x411397: cfg_setopt (confuse.c:587)
> ==21588==    by 0x4110D7: cfg_init_defaults (confuse.c:529)
> ==21588==    by 0x411251: cfg_init (confuse.c:1094)
> ==21588==    by 0x40EB7C: Ganglia_gmond_config_create (libgmond.c:523)
> ==21588==    by 0x405529: process_configuration_file (gmond.c:180)
> ==21588==    by 0x405627: main (gmond.c:1815)
> ==21588==

One-time allocation.

> ==21588==
> ==21588== 2,912 bytes in 52 blocks are still reachable in loss record 12 of 16
> ==21588==    at 0x4A05809: malloc (vg_replace_malloc.c:149)
> ==21588==    by 0x41151E: cfg_setopt (confuse.c:665)
> ==21588==    by 0x4110D7: cfg_init_defaults (confuse.c:529)
> ==21588==    by 0x411251: cfg_init (confuse.c:1094)
> ==21588==    by 0x40EB7C: Ganglia_gmond_config_create (libgmond.c:523)
> ==21588==    by 0x405529: process_configuration_file (gmond.c:180)
> ==21588==    by 0x405627: main (gmond.c:1815)
> ==21588==

Ditto.

> ==21588==
> ==21588== 4,123 bytes in 401 blocks are still reachable in loss record 13 of 16
> ==21588==    at 0x4A05809: malloc (vg_replace_malloc.c:149)
> ==21588==    by 0x3F81075081: strdup (in /lib64/libc-2.5.so)
> ==21588==    by 0x410218: cfg_dupopt_array (confuse.c:401)
> ==21588==    by 0x41122B: cfg_init (confuse.c:1088)
> ==21588==    by 0x40EB7C: Ganglia_gmond_config_create (libgmond.c:523)
> ==21588==    by 0x405529: process_configuration_file (gmond.c:180)
> ==21588==    by 0x405627: main (gmond.c:1815)
> ==21588==

Ditto.

> ==21588==
> ==21588== 16,384 bytes in 2 blocks are still reachable in loss record 14 of 16
> ==21588==    at 0x4A05809: malloc (vg_replace_malloc.c:149)
> ==21588==    by 0x41C7C0: apr_palloc (apr_pools.c:293)
> ==21588==    by 0x403E59: Ganglia_metric_cb_define (gmond.c:1306)
> ==21588==    by 0x403F08: setup_metric_callbacks (gmond.c:1367)
> ==21588==    by 0x4056EE: main (gmond.c:1845)
> ==21588==

Hmm. I just wonder why we do this in "mute" mode.

> ==21588==
> ==21588== 40,576 bytes in 81 blocks are still reachable in loss record 15 of 16
> ==21588==    at 0x4A05809: malloc (vg_replace_malloc.c:149)
> ==21588==    by 0x4101BB: cfg_dupopt_array (confuse.c:395)
> ==21588==    by 0x41122B: cfg_init (confuse.c:1088)
> ==21588==    by 0x40EB7C: Ganglia_gmond_config_create (libgmond.c:523)
> ==21588==    by 0x405529: process_configuration_file (gmond.c:180)
> ==21588==    by 0x405627: main (gmond.c:1815)
> ==21588==

One-time allocation.
Rather big, but likely not serious.

> ==21588==
> ==21588== 671,744 bytes in 82 blocks are still reachable in loss record 16 of 16
> ==21588==    at 0x4A05809: malloc (vg_replace_malloc.c:149)
> ==21588==    by 0x41C2C3: apr_pool_create_ex (apr_pools.c:293)
> ==21588==    by 0x41C586: apr_pool_initialize (apr_pools.c:511)
> ==21588==    by 0x41A7C4: apr_initialize (start.c:55)
> ==21588==    by 0x40EC9F: Ganglia_pool_create (libgmond.c:494)
> ==21588==    by 0x4055DA: main (gmond.c:1789)
> ==21588==

No idea; seems to come from initialization.

> ==21588== LEAK SUMMARY:
> ==21588==    definitely lost: 1,143 bytes in 180 blocks.
> ==21588==      possibly lost: 0 bytes in 0 blocks.
> ==21588==    still reachable: 739,459 bytes in 1,010 blocks.
> ==21588==         suppressed: 0 bytes in 0 blocks.
> --21588-- memcheck: sanity checks: 46 cheap, 2 expensive
> --21588-- memcheck: auxmaps: 301 auxmap entries (19264k, 18M) in use
> --21588-- memcheck: auxmaps: 4427566 searches, 5961364 comparisons
> --21588-- memcheck: SMs: n_issued      = 41 (656k, 0M)
> --21588-- memcheck: SMs: n_deissued    = 0 (0k, 0M)
> --21588-- memcheck: SMs: max_noaccess  = 524287 (8388592k, 8191M)
> --21588-- memcheck: SMs: max_undefined = 0 (0k, 0M)
> --21588-- memcheck: SMs: max_defined   = 387 (6192k, 6M)
> --21588-- memcheck: SMs: max_non_DSM   = 41 (656k, 0M)
> --21588-- memcheck: max sec V bit nodes: 3 (0k, 0M)
> --21588-- memcheck: set_sec_vbits8 calls: 3 (new: 3, updates: 0)
> --21588-- memcheck: max shadow mem size: 4800k, 4M
> --21588-- translate: fast SP updates identified: 5,079 (86.9%)
> --21588-- translate: generic_known SP updates identified: 607 (10.3%)
> --21588-- translate: generic_unknown SP updates identified: 153 (2.6%)
> --21588-- tt/tc: 23,734 tt lookups requiring 24,620 probes
> --21588-- tt/tc: 23,734 fast-cache updates, 6 flushes
> --21588-- transtab: new 5,719 (136,749 -> 2,554,493; ratio 186:10) [0 scs]
> --21588-- transtab: dumped 0 (0 -> ??)
> --21588-- transtab: discarded 153 (2,818 -> ??)
> --21588-- scheduler: 4,609,790 jumps (bb entries).
> --21588-- scheduler: 46/39,509 major/minor sched events.
> --21588-- sanity: 47 cheap, 2 expensive checks.
> --21588-- exectx: 30,011 lists, 203 contexts (avg 0 per list)
> --21588-- exectx: 3,963 searches, 5,723 full compares (1,444 per 1000)
> --21588-- exectx: 4,421 cmp2, 10 cmp4, 0 cmpAll
>
>
> Jesse Becker wrote:
> > On Feb 19, 2008 7:39 PM, Martin Knoblauch wrote:
> >> ----- Original Message ----
> >>> From: Jesse Becker
> >>> To: Ganglia Developers
> >>> Sent: Tuesday, February 19, 2008 11:25:54 PM
> >>> Subject: Re: [Ganglia-developers] Memory leak in gmond
> >>>
> >>> I'm not sure if this is right--I've only taken a really quick look at
> >>> libmetrics/linux/metrics.c, and my C-fu is rusty.
> >>>
> >>> It looks like strndup() is called in linux/metrics.c:hash_lookup
> >>> (about line 131) to duplicate an interface name, which is included in
> >>> the stats structure as stats->name. The net_dev_stats function will
> >>> return this struct.
> >>>
> >>> The function is called in a number of places: pkts_in_func,
> >>> pkts_out_func, bytes_out_func and bytes_in_func. The variable "*ns"
> >>> is assigned the output of hash_lookup (i.e. the struct). Since the
> >>> 'name' element is malloc()ed but not explicitly freed, it will not go
> >>> away when *ns goes out of scope. This is the leak, isn't it? All
> >>> four of these functions are very similar, and need to be fixed if this
> >>> is the case.
> >>>
> >>> Or did I miss something obvious? :)
> >>>
> >> Lines 137, 148 and 159? :-)
> >
> > I saw those. :-P I meant after the struct has been returned, outside
> > the function, the memory is never freed. Inside that function, it's
> > okay.
> >
> >> The memory allocated in line 151 is never freed, indeed. But it is only
> >> allocated once per interface and stays alive for the entire lifetime of
> >> the gmond process. So, it is not leaked.
> >
> > Ah, that makes more sense, especially if those variables exist for the
> > lifetime of the program.
> >
> > So, I've just run gmond under valgrind and duma (a fork of the old
> > Electric Fence memory debugger), and I can't seem to reproduce the
> > problem now. Neither of them is showing any obvious leaks, at least
> > not in the 15-minute tests I've run. The test systems are CentOS 4.6
> > boxes.