On Sun, Dec 13, 2009 at 10:49:00AM +0000, Daniel Pocock wrote:
> Carlo Marcelo Arenas Belon wrote:
>> On Fri, Dec 11, 2009 at 01:31:22PM -0600, Brooks Davis wrote:
>>
>>> On Fri, Dec 11, 2009 at 04:56:51PM +0000, Carlo Marcelo Arenas Belon wrote:
>>>
>>>> I presume the reason why you haven't seen this show up in the APR list, is
>>>> because it makes probably more sense for the apache httpd list instead for
>>>> help understanding how apache is able to "work around" the leakiness of
>>>> apr_poll and that also requires some reading from apache's code (which I
>>>> am not at least that familiar with, neither really interested)
>>>>
>>> Looking at the prefork mpm, the pollsets are created and used only
>>> in child_main() and thus are created after the fork. I suspect that
>>> changing the ganglia code to open all the sockets, but defer creation of
>>> the pollset until after fork is the right way to go.
>>
>> That is the way we did the initialization before r2025 so I guess that could
>> explain why we weren't affected just like apache is not.
>>
> Not quite - pre-r2025, we did this:
>
> a) detach
> b) socket init
> c) pollset init
>
> Post r2025:
>
> a) socket init
> b) pollset init
> c) detach
>
> Brooks' solution:
>
> a) socket init
> b) detach
> c) pollset init
>
> I could accept Brooks' solution, because it means gmond would only fail
> for something like out-of-memory, while any configuration failure, port
> in use, etc would cause it to fail before detaching.

If gmond still fails silently in some cases, you have not accomplished the
objective you were trying to achieve with r2025 anyway.

The solution I proposed addresses the problem of reporting to the OS any
failure during initialization (which was the original bug to fix anyway) in a
straightforward way, and is therefore the right way to correct this IMHO,
without introducing any regressions by changing long-relied-upon semantics.

> Basically, we would have to split the code in
> setup_listen_channels_pollset() into two functions, one that gets called
> before detaching, and one that is called after detaching.

Why make the code more complicated? And are you really expecting to do that
in scope for getting it backported into 3.1.6, considering how intrusive that
would be?

Also be aware that there are bugfixes on that code that haven't yet been
backported, so you are going to have to either certify all those fixes as
well, or cherry-pick the changes needed and test all the different
combinations.

Carlo
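
For readers following the thread, the ordering discussed above (socket init,
then detach, then pollset init) and the two-function split it implies can be
sketched roughly as below. This is only an illustration under stated
assumptions, not gmond's code: open_listen_socket(), build_pollset() and
detach_then_poll() are hypothetical names, plain poll(2) stands in for the APR
pollset calls, and the parent waits for the child only so the result can be
checked.

```c
#include <assert.h>
#include <netinet/in.h>
#include <poll.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Step a) socket init. Hypothetical helper: binds one UDP socket on the
 * loopback address. It runs before detaching, so a failure such as
 * "port in use" can still be reported through the exit status. */
int open_listen_socket(unsigned short port)
{
    struct sockaddr_in addr;
    int fd = socket(AF_INET, SOCK_DGRAM, 0);

    if (fd < 0)
        return -1;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(port);    /* 0 lets the kernel pick a free port */
    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Step c) pollset init. Hypothetical helper: fills a poll(2) array, used
 * here as a stand-in for apr_pollset_create()/apr_pollset_add(). */
int build_pollset(struct pollfd *set, const int *fds, int nfds)
{
    int i;

    assert(set != NULL && fds != NULL);
    for (i = 0; i < nfds; i++) {
        set[i].fd = fds[i];
        set[i].events = POLLIN;
        set[i].revents = 0;
    }
    return nfds;
}

/* Steps b) and c) together: fork, then build and use the pollset only in
 * the child. The parent waits here purely so the sketch can be checked; a
 * daemon's parent would exit right after fork(). Returns the child's exit
 * status, or -1 on error. */
int detach_then_poll(const int *fds, int nfds)
{
    struct pollfd set[8];
    int status;
    pid_t pid;

    if (nfds < 0 || nfds > 8)
        return -1;
    pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* Child: start a new session, then create the pollset. */
        if (setsid() < 0)
            _exit(2);
        build_pollset(set, fds, nfds);
        /* Zero timeout: no traffic is expected, so poll() should return 0. */
        _exit(poll(set, (nfds_t)nfds, 0) == 0 ? 0 : 1);
    }
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

With this shape, a caller would run open_listen_socket() for each configured
channel and exit non-zero on the first failure, before any fork happens. What
remains after detaching is only the pollset setup, which is the part the
quoted text expects to fail only for something like out-of-memory.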
