Re: [Linux-ha-dev] [PATCH] cluster-glue memory leak

2013-05-07 Thread Lars Ellenberg
On Tue, May 07, 2013 at 07:10:15PM +0900, Yuichi SEINO wrote:
 Hi All,
 
 I am using pacemaker-1.1.9 (commit 138556cb0b375a490a96f35e7fbeccc576a22011).
 
 crmd leaks memory in 3 places. I was able to fix one of them, so I have
 attached a patch.
 
 However, the remaining two are not easy to solve. The issue is that the
 stonith API cannot call the DelPILPluginUniv function in pils.c. I think
 we need to call DelPILPluginUniv to completely release the memory that
 stonith_new allocates.
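The teardown that is missing here can be sketched in plain C. This is not PILS or stonith code: the struct, the function names and the strings are all hypothetical stand-ins for NewPILPluginUniv()/DelPILPluginUniv(), and the sketch assumes the universe is a module-global object created on first use, as the stonith_new() -> init_pluginsys() -> NewPILPluginUniv() path in the traces below suggests.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a plugin universe: a couple of heap strings,
 * like the g_strdup() results reported in the PILS traces. */
struct plugin_univ {
    char *base_dir;
    char *iface_type;
};

static struct plugin_univ *univ = NULL;   /* module-global, created on first use */

static char *dup_str(const char *s)
{
    size_t n = strlen(s) + 1;
    char *p = malloc(n);
    if (p != NULL)
        memcpy(p, s, n);
    return p;
}

/* Lazily create the universe (hypothetical analogue of init_pluginsys()).
 * Returns the shared instance, or NULL if allocation fails. */
static struct plugin_univ *plugin_univ_get(void)
{
    if (univ == NULL) {
        struct plugin_univ *u = malloc(sizeof *u);
        if (u == NULL)
            return NULL;
        u->base_dir = dup_str("example-plugin-dir");     /* placeholder */
        u->iface_type = dup_str("example-interface");    /* placeholder */
        if (u->base_dir == NULL || u->iface_type == NULL) {
            free(u->base_dir);      /* free(NULL) is a no-op */
            free(u->iface_type);
            free(u);
            return NULL;
        }
        univ = u;
    }
    return univ;
}

/* The missing piece: an explicit teardown a daemon could call at exit,
 * playing the role DelPILPluginUniv() would. Safe to call repeatedly. */
static void plugin_univ_shutdown(void)
{
    if (univ == NULL)
        return;
    free(univ->base_dir);
    free(univ->iface_type);
    free(univ);
    univ = NULL;
}
```

The point of the design is that the allocation happens once per process, so skipping plugin_univ_shutdown() only leaves a fixed handful of bytes behind; the shutdown hook matters mainly for clean Valgrind output.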

Is it just that these few bytes are allocated once and never freed,
or is this a real memleak that keeps accumulating more and more bytes
over the process lifetime?

I suspect the former.
In which case I doubt it is even worthwhile to try and fix it.

Why?
Because, in that case, we basically have:
main()
{
    global_variable = malloc(something);
    endless_loop_that_is_not_expected_to_ever_return();
    /* so, ok, we could free(global_variable) here.
     * but why bother?  */
    exit(1);
}

In the pseudo code above, it is easy to fix.
In the (over-abstracted) case of PILS, I'm afraid, it's not that easy.
And apart from academic correctness,
there is no real-world gain from fixing this.
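To make the distinction concrete, here is a small sketch (not cluster-glue or Pacemaker code; every name in it is hypothetical) that counts the bytes handed out by a wrapper around malloc(). A one-time allocation like global_variable above costs a fixed amount however long the loop runs, while a per-event allocation whose pointer is dropped grows with every event.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical running total of bytes handed out by tracked_malloc(). */
static size_t bytes_allocated = 0;

static void *tracked_malloc(size_t n)
{
    void *p = malloc(n);
    assert(p != NULL);              /* sketch only: no real error handling */
    bytes_allocated += n;
    return p;
}

/* Case 1: allocated once and kept for the whole process lifetime,
 * like global_variable in the pseudo code. */
static char *global_variable = NULL;

static void one_time_init(void)
{
    if (global_variable == NULL)
        global_variable = tracked_malloc(13);
}

/* Case 2: allocated on every event and never freed: a real memleak. */
static void per_event_leak(void)
{
    char *buf = tracked_malloc(19);
    memset(buf, 'x', 19);
    /* BUG (deliberate): buf goes out of scope without free(buf). */
}

/* Simulate `events` iterations of a main loop and report how many bytes
 * each pattern allocated during that run. */
static size_t growth_one_time(int events)
{
    size_t before = bytes_allocated;
    for (int i = 0; i < events; i++)
        one_time_init();
    return bytes_allocated - before;
}

static size_t growth_per_event(int events)
{
    size_t before = bytes_allocated;
    for (int i = 0; i < events; i++)
        per_event_leak();
    return bytes_allocated - before;
}
```

The first pattern stays at 13 bytes no matter how many events are processed (and adds nothing on later runs), while the second adds 19 bytes per event; only the second kind grows over the lifetime of a long-running daemon such as crmd.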

 -=-

If however we have a *real* memleak, that has to be fixed, of course.

Lars

 Here is the Valgrind output for the leak that I was able to fix:
 
 ==3484== 76 bytes in 4 blocks are definitely lost in loss record 94 of 161
 ==3484==    at 0x4A07A49: malloc (vg_replace_malloc.c:270)
 ==3484==    by 0x373FA417D2: g_malloc (gmem.c:132)
 ==3484==    by 0xA2C2365: external_run_cmd (external.c:767)
 ==3484==    by 0xA2C1AC8: external_getinfo (external.c:598)
 ==3484==    by 0x9EB9B7E: stonith_get_info (stonith.c:327)
 ==3484==    by 0x3F5100744D: stonith_api_device_metadata (st_client.c:1177)
 ==3484==    by 0x3F52407E22: stonith_get_metadata (lrmd_client.c:1478)
 ==3484==    by 0x3F52408DB6: lrmd_api_get_metadata (lrmd_client.c:1736)
 ==3484==    by 0x427FB2: lrm_state_get_metadata (lrm_state.c:555)
 ==3484==    by 0x41F991: get_rsc_metadata (lrm.c:436)
 ==3484==    by 0x41FCD4: get_rsc_restart_list (lrm.c:521)
 ==3484==    by 0x4201B0: append_restart_list (lrm.c:607)
 ==3484==    by 0x420670: build_operation_update (lrm.c:672)
 ==3484==    by 0x425AE1: do_update_resource (lrm.c:1906)
 ==3484==    by 0x42622E: process_lrm_event (lrm.c:2016)
 ==3484==    by 0x41EE10: lrm_op_callback (lrm.c:242)
 ==3484==    by 0x3F52404339: lrmd_dispatch_internal (lrmd_client.c:289)
 ==3484==    by 0x3F524043DF: lrmd_ipc_dispatch (lrmd_client.c:311)
 ==3484==    by 0x3F504308A9: mainloop_gio_callback (mainloop.c:587)
 ==3484==    by 0x373FA38F0D: g_main_context_dispatch (gmain.c:1960)
 ==3484==    by 0x373FA3C937: g_main_context_iterate (gmain.c:2591)
 ==3484==    by 0x373FA3CD54: g_main_loop_run (gmain.c:2799)
 ==3484==    by 0x4055E7: crmd_init (main.c:154)
 ==3484==    by 0x405419: main (main.c:120)
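The patch itself is not quoted in this thread, so the following is only one common shape of such a fix, not Yuichi's actual change. It assumes a helper (the hypothetical run_cmd(), loosely modeled on external_run_cmd() in the trace) that returns a freshly allocated buffer owned by the caller, and a getinfo-style function that caches the last result on a device object. Freeing the previous buffer before overwriting the pointer keeps repeated calls from piling up small blocks, which is one way a leak like the 76 bytes in 4 blocks above can arise.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

static int live_blocks = 0;   /* blocks currently allocated by this sketch */

static void *counted_malloc(size_t n)
{
    void *p = malloc(n);
    if (p != NULL)
        live_blocks++;
    return p;
}

static void counted_free(void *p)
{
    if (p != NULL) {
        live_blocks--;
        free(p);
    }
}

/* Hypothetical stand-in for external_run_cmd(): "runs" an operation and
 * returns a newly allocated copy of its output via *output. The caller owns
 * that buffer. Returns 0 on success, -1 on failure. */
static int run_cmd(const char *op, char **output)
{
    int len = snprintf(NULL, 0, "output of %s", op);
    if (len < 0)
        return -1;
    char *buf = counted_malloc((size_t)len + 1);
    if (buf == NULL)
        return -1;
    snprintf(buf, (size_t)len + 1, "output of %s", op);
    *output = buf;
    return 0;
}

struct device {
    char *outputbuf;   /* last result handed to the caller; owned by the device */
};

/* Hypothetical getinfo: the returned pointer stays valid until the next call
 * or until device_destroy(). */
static const char *device_getinfo(struct device *d, const char *op)
{
    char *output = NULL;
    if (run_cmd(op, &output) != 0)
        return NULL;
    counted_free(d->outputbuf);   /* without this, every call leaks one block */
    d->outputbuf = output;
    return d->outputbuf;
}

static void device_destroy(struct device *d)
{
    counted_free(d->outputbuf);
    d->outputbuf = NULL;
}
```

With the free in place, four calls leave exactly one live block, and device_destroy() releases that one as well.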
 
 Here are the remaining ones:
 
 ==3484== 13 bytes in 1 blocks are definitely lost in loss record 29 of 161
 ==3484==    at 0x4A07A49: malloc (vg_replace_malloc.c:270)
 ==3484==    by 0x373FA417D2: g_malloc (gmem.c:132)
 ==3484==    by 0x373FA58F7D: g_strdup (gstrfuncs.c:102)
 ==3484==    by 0x4E67713: InterfaceManager_plugin_init (pils.c:611)
 ==3484==    by 0x4E69C64: NewPILInterfaceUniv (pils.c:1723)
 ==3484==    by 0x4E672DC: NewPILPluginUniv (pils.c:487)
 ==3484==    by 0x9EB8FE3: init_pluginsys (stonith.c:75)
 ==3484==    by 0x9EB90EC: stonith_new (stonith.c:105)
 ==3484==    by 0x3F51008137: get_stonith_provider (st_client.c:1434)
 ==3484==    by 0x3F51006E28: stonith_api_device_metadata (st_client.c:1059)
 ==3484==    by 0x3F52407E22: stonith_get_metadata (lrmd_client.c:1478)
 ==3484==    by 0x3F52408DB6: lrmd_api_get_metadata (lrmd_client.c:1736)
 ==3484==    by 0x427FB2: lrm_state_get_metadata (lrm_state.c:555)
 ==3484==    by 0x41F991: get_rsc_metadata (lrm.c:436)
 ==3484==    by 0x41FCD4: get_rsc_restart_list (lrm.c:521)
 ==3484==    by 0x4201B0: append_restart_list (lrm.c:607)
 ==3484==    by 0x420670: build_operation_update (lrm.c:672)
 ==3484==    by 0x425AE1: do_update_resource (lrm.c:1906)
 ==3484==    by 0x42622E: process_lrm_event (lrm.c:2016)
 ==3484==    by 0x41EE10: lrm_op_callback (lrm.c:242)
 ==3484==    by 0x3F52404339: lrmd_dispatch_internal (lrmd_client.c:289)
 ==3484==    by 0x3F524043DF: lrmd_ipc_dispatch (lrmd_client.c:311)
 ==3484==    by 0x3F504308A9: mainloop_gio_callback (mainloop.c:587)
 ==3484==    by 0x373FA38F0D: g_main_context_dispatch (gmain.c:1960)
 ==3484==    by 0x373FA3C937: g_main_context_iterate (gmain.c:2591)
 
 ==3484== 13 bytes in 1 blocks are definitely lost in loss record 28 of 161
 ==3484==    at 0x4A07A49: malloc (vg_replace_malloc.c:270)
 ==3484==    by 0x373FA417D2: g_malloc (gmem.c:132)
 ==3484==    by 0x373FA58F7D: g_strdup (gstrfuncs.c:102)
 ==3484==    by 0x4E676D2: InterfaceManager_plugin_init (pils.c:606)
 ==3484==by 0x4E69C64: 


2013-05-07 Thread Dejan Muhamedagic
Hi,

On Tue, May 07, 2013 at 05:22:24PM +0200, Lars Ellenberg wrote:
 On Tue, May 07, 2013 at 07:10:15PM +0900, Yuichi SEINO wrote:
  Hi All,
  
  I am using pacemaker-1.1.9 (commit 138556cb0b375a490a96f35e7fbeccc576a22011).
  
  crmd leaks memory in 3 places. I was able to fix one of them, so I have
  attached a patch.
  
  However, the remaining two are not easy to solve. The issue is that the
  stonith API cannot call the DelPILPluginUniv function in pils.c. I think
  we need to call DelPILPluginUniv to completely release the memory that
  stonith_new allocates.
 
 Is it just that these few bytes are allocated once and never freed,
 or is this a real memleak that keeps accumulating more and more bytes
 over the process lifetime?
 
 I suspect the former.
 In which case I doubt it is even worthwhile to try and fix it.

Agreed, though the first leak is not related to PILS.

 Why?
 Because, in that case, we basically have:
 main()
 {
     global_variable = malloc(something);
     endless_loop_that_is_not_expected_to_ever_return();
     /* so, ok, we could free(global_variable) here.
      * but why bother?  */
     exit(1);
 }
 
 In the pseudo code above, it is easy to fix.
 In the (over-abstracted) case of PILS, I'm afraid, it's not that easy.
 And apart from academic correctness,
 there is no real-world gain from fixing this.
 
  -=-
 
 If however we have a *real* memleak, that has to be fixed, of course.

The first one, for which the patch is provided, could be a real
memory leak. I'll apply the patch. Many thanks!

Cheers,

Dejan
