Re: [Linux-ha-dev] [PATCH] cluster-glue memory leak
On Tue, May 07, 2013 at 07:10:15PM +0900, Yuichi SEINO wrote:
> Hi All,
>
> I used pacemaker-1.1.9 (commit 138556cb0b375a490a96f35e7fbeccc576a22011),
> and crmd leaked memory. The leak happens in 3 places. I could fix 1 of
> them, so I attached a patch. However, the rest are not easy to solve.
> The issue is that the stonith API can't call the DelPILPluginUniv
> function in pils.c. I think that we need to call DelPILPluginUniv to
> completely release the memory which the stonith_new function allocated.

Is it just that there are these few bytes that are allocated once and
never freed, or is this a real memleak that accumulates more and more
bytes during the process lifetime?

I suspect the former, in which case I doubt it is even worthwhile to
try and fix it. Why? Because in that case we basically have:

    main()
    {
        global_variable = malloc(something);
        endless_loop_that_is_not_expected_to_ever_return();
        /* so, ok, we could free(global_variable) here.
         * but why bother? */
        exit(1);
    }

In the pseudo code above, it is easy to fix. In the (over-abstracted)
case of PILs, I'm afraid, it's not that easy. And apart from academic
correctness, there is no gain from fixing this for the real world.

-=-

If however we have a *real* memleak, that has to be fixed, of course.

	Lars

> Here is the Valgrind output. This is the memory leak that I was able to fix:
> ==3484== 76 bytes in 4 blocks are definitely lost in loss record 94 of 161
> ==3484==    at 0x4A07A49: malloc (vg_replace_malloc.c:270)
> ==3484==    by 0x373FA417D2: g_malloc (gmem.c:132)
> ==3484==    by 0xA2C2365: external_run_cmd (external.c:767)
> ==3484==    by 0xA2C1AC8: external_getinfo (external.c:598)
> ==3484==    by 0x9EB9B7E: stonith_get_info (stonith.c:327)
> ==3484==    by 0x3F5100744D: stonith_api_device_metadata (st_client.c:1177)
> ==3484==    by 0x3F52407E22: stonith_get_metadata (lrmd_client.c:1478)
> ==3484==    by 0x3F52408DB6: lrmd_api_get_metadata (lrmd_client.c:1736)
> ==3484==    by 0x427FB2: lrm_state_get_metadata (lrm_state.c:555)
> ==3484==    by 0x41F991: get_rsc_metadata (lrm.c:436)
> ==3484==    by 0x41FCD4: get_rsc_restart_list (lrm.c:521)
> ==3484==    by 0x4201B0: append_restart_list (lrm.c:607)
> ==3484==    by 0x420670: build_operation_update (lrm.c:672)
> ==3484==    by 0x425AE1: do_update_resource (lrm.c:1906)
> ==3484==    by 0x42622E: process_lrm_event (lrm.c:2016)
> ==3484==    by 0x41EE10: lrm_op_callback (lrm.c:242)
> ==3484==    by 0x3F52404339: lrmd_dispatch_internal (lrmd_client.c:289)
> ==3484==    by 0x3F524043DF: lrmd_ipc_dispatch (lrmd_client.c:311)
> ==3484==    by 0x3F504308A9: mainloop_gio_callback (mainloop.c:587)
> ==3484==    by 0x373FA38F0D: g_main_context_dispatch (gmain.c:1960)
> ==3484==    by 0x373FA3C937: g_main_context_iterate (gmain.c:2591)
> ==3484==    by 0x373FA3CD54: g_main_loop_run (gmain.c:2799)
> ==3484==    by 0x4055E7: crmd_init (main.c:154)
> ==3484==    by 0x405419: main (main.c:120)
>
> Here are the rest:
> ==3484== 13 bytes in 1 blocks are definitely lost in loss record 29 of 161
> ==3484==    at 0x4A07A49: malloc (vg_replace_malloc.c:270)
> ==3484==    by 0x373FA417D2: g_malloc (gmem.c:132)
> ==3484==    by 0x373FA58F7D: g_strdup (gstrfuncs.c:102)
> ==3484==    by 0x4E67713: InterfaceManager_plugin_init (pils.c:611)
> ==3484==    by 0x4E69C64: NewPILInterfaceUniv (pils.c:1723)
> ==3484==    by 0x4E672DC: NewPILPluginUniv (pils.c:487)
> ==3484==    by 0x9EB8FE3: init_pluginsys (stonith.c:75)
> ==3484==    by 0x9EB90EC: stonith_new (stonith.c:105)
> ==3484==    by 0x3F51008137: get_stonith_provider (st_client.c:1434)
> ==3484==    by 0x3F51006E28: stonith_api_device_metadata (st_client.c:1059)
> ==3484==    by 0x3F52407E22: stonith_get_metadata (lrmd_client.c:1478)
> ==3484==    by 0x3F52408DB6: lrmd_api_get_metadata (lrmd_client.c:1736)
> ==3484==    by 0x427FB2: lrm_state_get_metadata (lrm_state.c:555)
> ==3484==    by 0x41F991: get_rsc_metadata (lrm.c:436)
> ==3484==    by 0x41FCD4: get_rsc_restart_list (lrm.c:521)
> ==3484==    by 0x4201B0: append_restart_list (lrm.c:607)
> ==3484==    by 0x420670: build_operation_update (lrm.c:672)
> ==3484==    by 0x425AE1: do_update_resource (lrm.c:1906)
> ==3484==    by 0x42622E: process_lrm_event (lrm.c:2016)
> ==3484==    by 0x41EE10: lrm_op_callback (lrm.c:242)
> ==3484==    by 0x3F52404339: lrmd_dispatch_internal (lrmd_client.c:289)
> ==3484==    by 0x3F524043DF: lrmd_ipc_dispatch (lrmd_client.c:311)
> ==3484==    by 0x3F504308A9: mainloop_gio_callback (mainloop.c:587)
> ==3484==    by 0x373FA38F0D: g_main_context_dispatch (gmain.c:1960)
> ==3484==    by 0x373FA3C937: g_main_context_iterate (gmain.c:2591)
>
> ==3484== 13 bytes in 1 blocks are definitely lost in loss record 28 of 161
> ==3484==    at 0x4A07A49: malloc (vg_replace_malloc.c:270)
> ==3484==    by 0x373FA417D2: g_malloc (gmem.c:132)
> ==3484==    by 0x373FA58F7D: g_strdup (gstrfuncs.c:102)
> ==3484==    by 0x4E676D2: InterfaceManager_plugin_init (pils.c:606)
> ==3484==    by 0x4E69C64:
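
The distinction drawn in this message, between a block that is allocated once and never freed and a leak that grows with every call, can be illustrated with a small C sketch. It is not the cluster-glue or pacemaker code: every name in it (`tracked_strdup`, `init_once`, `run_cmd`, `get_info_leaky`, `get_info_fixed`, the `live_blocks` counter and the strings) is hypothetical and exists only for illustration. The per-call pattern is one plausible reading of "76 bytes in 4 blocks" in the first trace, while "13 bytes in 1 blocks" is consistent with a one-time allocation.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical counter of blocks that are still allocated, so the two
 * patterns can be told apart without running Valgrind. */
size_t live_blocks = 0;

/* Hypothetical helper: duplicate a string and count the new block. */
char *tracked_strdup(const char *s)
{
    size_t n = strlen(s) + 1;
    char *copy = malloc(n);
    assert(copy != NULL);
    memcpy(copy, s, n);
    live_blocks++;
    return copy;
}

/* Hypothetical helper: free a block and update the counter. */
void tracked_free(char *p)
{
    if (p != NULL) {
        free(p);
        live_blocks--;
    }
}

/* Pattern 1: allocated once, never freed. Repeated calls do not add
 * to the number of outstanding blocks. */
char *plugin_type = NULL;

void init_once(void)
{
    if (plugin_type == NULL) {
        plugin_type = tracked_strdup("plugin-type"); /* placeholder string */
    }
}

/* Stand-in for a function that returns a freshly allocated buffer
 * which the caller is expected to free. */
char *run_cmd(void)
{
    return tracked_strdup("device metadata"); /* placeholder output */
}

/* Pattern 2: the caller forgets to free the buffer, so every call
 * leaves one more block behind. */
size_t get_info_leaky(void)
{
    char *out = run_cmd();
    return strlen(out); /* out is never freed */
}

/* The same caller with the missing free added. */
size_t get_info_fixed(void)
{
    char *out = run_cmd();
    size_t len = strlen(out);
    tracked_free(out);
    return len;
}
```

Calling `init_once()` any number of times leaves `live_blocks` at 1, whereas each call to `get_info_leaky()` raises it by one and `get_info_fixed()` leaves it unchanged. Only the second pattern grows during the process lifetime, which is the case the thread agrees has to be fixed.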
Re: [Linux-ha-dev] [PATCH] cluster-glue memory leak
Hi,

On Tue, May 07, 2013 at 05:22:24PM +0200, Lars Ellenberg wrote:
> On Tue, May 07, 2013 at 07:10:15PM +0900, Yuichi SEINO wrote:
> > Hi All,
> >
> > I used pacemaker-1.1.9 (commit 138556cb0b375a490a96f35e7fbeccc576a22011),
> > and crmd leaked memory. The leak happens in 3 places. I could fix 1 of
> > them, so I attached a patch. However, the rest are not easy to solve.
> > The issue is that the stonith API can't call the DelPILPluginUniv
> > function in pils.c. I think that we need to call DelPILPluginUniv to
> > completely release the memory which the stonith_new function allocated.
>
> Is it just that there are these few bytes that are allocated once and
> never freed, or is this a real memleak that accumulates more and more
> bytes during the process lifetime?
>
> I suspect the former, in which case I doubt it is even worthwhile to
> try and fix it.

Agreed. Though the first leak is not related to PILS.

> Why? Because in that case we basically have:
>
>     main()
>     {
>         global_variable = malloc(something);
>         endless_loop_that_is_not_expected_to_ever_return();
>         /* so, ok, we could free(global_variable) here.
>          * but why bother? */
>         exit(1);
>     }
>
> In the pseudo code above, it is easy to fix. In the (over-abstracted)
> case of PILs, I'm afraid, it's not that easy. And apart from academic
> correctness, there is no gain from fixing this for the real world.
>
> -=-
>
> If however we have a *real* memleak, that has to be fixed, of course.

The first one, for which the patch is provided, could be a real memory
leak. I'll apply the patch.

Many thanks!

Cheers,

Dejan

> 	Lars
>
> > Here is the Valgrind output. This is the memory leak that I was able to fix:
> > ==3484== 76 bytes in 4 blocks are definitely lost in loss record 94 of 161
> > ==3484==    at 0x4A07A49: malloc (vg_replace_malloc.c:270)
> > ==3484==    by 0x373FA417D2: g_malloc (gmem.c:132)
> > ==3484==    by 0xA2C2365: external_run_cmd (external.c:767)
> > ==3484==    by 0xA2C1AC8: external_getinfo (external.c:598)
> > ==3484==    by 0x9EB9B7E: stonith_get_info (stonith.c:327)
> > ==3484==    by 0x3F5100744D: stonith_api_device_metadata (st_client.c:1177)
> > ==3484==    by 0x3F52407E22: stonith_get_metadata (lrmd_client.c:1478)
> > ==3484==    by 0x3F52408DB6: lrmd_api_get_metadata (lrmd_client.c:1736)
> > ==3484==    by 0x427FB2: lrm_state_get_metadata (lrm_state.c:555)
> > ==3484==    by 0x41F991: get_rsc_metadata (lrm.c:436)
> > ==3484==    by 0x41FCD4: get_rsc_restart_list (lrm.c:521)
> > ==3484==    by 0x4201B0: append_restart_list (lrm.c:607)
> > ==3484==    by 0x420670: build_operation_update (lrm.c:672)
> > ==3484==    by 0x425AE1: do_update_resource (lrm.c:1906)
> > ==3484==    by 0x42622E: process_lrm_event (lrm.c:2016)
> > ==3484==    by 0x41EE10: lrm_op_callback (lrm.c:242)
> > ==3484==    by 0x3F52404339: lrmd_dispatch_internal (lrmd_client.c:289)
> > ==3484==    by 0x3F524043DF: lrmd_ipc_dispatch (lrmd_client.c:311)
> > ==3484==    by 0x3F504308A9: mainloop_gio_callback (mainloop.c:587)
> > ==3484==    by 0x373FA38F0D: g_main_context_dispatch (gmain.c:1960)
> > ==3484==    by 0x373FA3C937: g_main_context_iterate (gmain.c:2591)
> > ==3484==    by 0x373FA3CD54: g_main_loop_run (gmain.c:2799)
> > ==3484==    by 0x4055E7: crmd_init (main.c:154)
> > ==3484==    by 0x405419: main (main.c:120)
> >
> > Here are the rest:
> > ==3484== 13 bytes in 1 blocks are definitely lost in loss record 29 of 161
> > ==3484==    at 0x4A07A49: malloc (vg_replace_malloc.c:270)
> > ==3484==    by 0x373FA417D2: g_malloc (gmem.c:132)
> > ==3484==    by 0x373FA58F7D: g_strdup (gstrfuncs.c:102)
> > ==3484==    by 0x4E67713: InterfaceManager_plugin_init (pils.c:611)
> > ==3484==    by 0x4E69C64: NewPILInterfaceUniv (pils.c:1723)
> > ==3484==    by 0x4E672DC: NewPILPluginUniv (pils.c:487)
> > ==3484==    by 0x9EB8FE3: init_pluginsys (stonith.c:75)
> > ==3484==    by 0x9EB90EC: stonith_new (stonith.c:105)
> > ==3484==    by 0x3F51008137: get_stonith_provider (st_client.c:1434)
> > ==3484==    by 0x3F51006E28: stonith_api_device_metadata (st_client.c:1059)
> > ==3484==    by 0x3F52407E22: stonith_get_metadata (lrmd_client.c:1478)
> > ==3484==    by 0x3F52408DB6: lrmd_api_get_metadata (lrmd_client.c:1736)
> > ==3484==    by 0x427FB2: lrm_state_get_metadata (lrm_state.c:555)
> > ==3484==    by 0x41F991: get_rsc_metadata (lrm.c:436)
> > ==3484==    by 0x41FCD4: get_rsc_restart_list (lrm.c:521)
> > ==3484==    by 0x4201B0: append_restart_list (lrm.c:607)
> > ==3484==    by 0x420670: build_operation_update (lrm.c:672)
> > ==3484==    by 0x425AE1: do_update_resource (lrm.c:1906)
> > ==3484==    by 0x42622E: process_lrm_event (lrm.c:2016)
> > ==3484==    by 0x41EE10: lrm_op_callback (lrm.c:242)
> > ==3484==    by 0x3F52404339: lrmd_dispatch_internal (lrmd_client.c:289)
> > ==3484==    by 0x3F524043DF: lrmd_ipc_dispatch (lrmd_client.c:311)
> > ==3484==    by 0x3F504308A9: mainloop_gio_callback (mainloop.c:587)
> > ==3484==    by 0x373FA38F0D: g_main_context_dispatch (gmain.c:1960)
> > ==3484==    by 0x373FA3C937: g_main_context_iterate (gmain.c:2591)
> >
> > ==3484== 13