On Fri, Dec 18, 2009 at 04:18:16PM +0000, Daniel Pocock wrote:
> Carlo Marcelo Arenas Belon wrote:
>> On Sun, Dec 13, 2009 at 10:49:00AM +0000, Daniel Pocock wrote:
>>>
>>> I could accept Brooks' solution, because it means gmond would only
>>> fail for something like out-of-memory, while any configuration
>>> failure, port in use, etc would cause it to fail before detaching.
>>
>> If gmond still fails silently in some cases, you have not accomplished the
>> objective that you were trying to obtain with r2025 anyway.
>>
> I agree - it doesn't completely meet my goal, but it does at least
> result in an error code for most types of bad configuration (or port in
> use)

that part is OK, but you still have the added side effects of r2025, which
would affect gmond in other interesting ways:

* the metric (and module) initialization is now done by the parent and
  expected to be inherited by the child; this means, for example, that the
  parent will send (and receive) metric information (even before forking)
* the suid is done by the parent and therefore the child isn't privileged
  (while the metric initialization was done as root); this would at least
  prevent anyone from binding gmond to privileged ports, but could also
  result in complicated permission issues for metric collection scripts.

as I said before, I think the apr_poll issue with BSD should be taken as a
warning of how the changes we were planning could have unintended side
effects, and since moving the daemonization was only one way to solve the
original problem, it makes more sense to revert this change instead and
evaluate alternatives.

> and it allows us to continue using apr (which some people have
> indicated a preference for).

the solution I proposed doesn't remove the apr dependency, it just doesn't
use apr for this specific case, because it obviously doesn't fit what we
need here, and we otherwise gain nothing from it (unless we had a native
windows version of gmond)

it was also meant to be a "temporary" solution and the "minimum change"
needed so that we can have:

* 3.1.6 released quickly
* the bug you were trying to solve still fixed for 3.1.6

ideally we should be able to make this work through apr in the long run
(even if that means fixing apr), or if that is not possible, rely on posix
itself and deal with windows compatibility for this part whenever the time
comes to do that.
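
for reference, a minimal sketch of the general technique (fork first,
initialize in the child, report the result to the waiting parent through a
pipe). this is not the actual patch: start_daemon, init_fn, run_fn and the
demo_* callbacks are hypothetical names made up for illustration, and it
assumes plain POSIX (fork, pipe, setsid) with a one byte status protocol.

```c
#include <errno.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical callbacks: both return 0 on success, nonzero on failure. */
typedef int (*init_fn)(void);
typedef int (*run_fn)(void);

/* Stand-ins for real initialization (config, sockets, ...) and main loop. */
static int demo_init_ok(void)   { return 0; }
static int demo_init_fail(void) { return 1; }
static int demo_run(void)       { return 0; }

/*
 * Forks, runs init() in the child, and hands the result back to the
 * parent over a pipe.  Only the parent returns from this function:
 * 0 if the child initialized successfully, nonzero otherwise.
 * The child never returns; it exits after init failure or after run().
 */
static int start_daemon(init_fn init, run_fn run)
{
    int fds[2];
    if (pipe(fds) != 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    if (pid == 0) {
        /* Child: detach from the session, then initialize. */
        close(fds[0]);
        setsid();

        int status = init();
        unsigned char byte = (unsigned char)(status != 0);
        ssize_t w = write(fds[1], &byte, 1);   /* report to the parent */
        (void)w;
        close(fds[1]);

        if (status != 0)
            _exit(EXIT_FAILURE);
        /* A real daemon would also redirect stdio to /dev/null here. */
        _exit(run());
    }

    /* Parent: wait for the one byte status, retrying on EINTR. */
    close(fds[1]);
    unsigned char byte = 1;
    ssize_t n;
    do {
        n = read(fds[0], &byte, 1);
    } while (n < 0 && errno == EINTR);
    close(fds[0]);

    if (n != 1 || byte != 0) {
        /* No report (child died early) or a failure report: reap it. */
        waitpid(pid, NULL, 0);
        return 1;
    }
    return 0;
}
```

a caller's main() would then exit with EXIT_SUCCESS or EXIT_FAILURE
depending on what start_daemon() returns, so the OS sees an error code for
any initialization failure while the initialization itself still runs in
the child.
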

>> The solution I proposed addresses the problem of reporting to the OS any
>> failure while initialization (which was the original bug to fix anyway)
>> in a straight forward way and is therefore the right way to correct this
>> IMHO, without introducing any regressions by changing long relied upon
>> semantics.
>>
> Does anyone else have any feelings about this? I think we can choose from:
>
> - Carlo's solution (implement apr_proc_detach ourselves, calling process
> hangs around and uses socket to discover if daemon started successfully)

not a socket but a pipe.

> - Brooks' solution (prepare sockets before detaching, prepare pollsets
> after detaching) - this allows us to continue using apr_proc_detach and
> not have native UNIX code

this should work fine too (after all, it was the proposed option 3), but it
is really a fix for the bug introduced with r2025 rather than a fix for the
original bug, which is why I don't really see how we can compare the two
side by side.

> - Revert my change completely

this was my suggestion for 3.1.6, so that at least we will have a working
gmond sooner and be able to stabilize (both trunk and 3.1) further.

since we haven't done this yet, testing any other changes in both trunk and
3.1 is impossible on BSD, and we have therefore implicitly dropped support
for those platforms.

> I would like to make some kind of decision about what goes in 3.1.6
> before Christmas, and maybe aim to tag 3.1.6 by 11 January, there is
> also the possibility that we can try to push it out more quickly, maybe
> tagging it 24 December and go GA in mid January?

the timeline will of course depend on the amount of changes involved. I am
afraid there has also been almost no dialogue about the other showstoppers
for 3.1.6 (like the bootstrapping issue), so there might be additional
complications (I was indeed preparing some more build fixes to prevent
more regressions, if the original plan of using Fedora 9 with 3.1.5 is
still in effect)

Carlo

_______________________________________________
Ganglia-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/ganglia-developers
