Carlo Marcelo Arenas Belon wrote:
> On Sun, Dec 13, 2009 at 10:49:00AM +0000, Daniel Pocock wrote:
>   
>> Carlo Marcelo Arenas Belon wrote:
>>     
>>> On Fri, Dec 11, 2009 at 01:31:22PM -0600, Brooks Davis wrote:
>>>
>>>> On Fri, Dec 11, 2009 at 04:56:51PM +0000, Carlo Marcelo Arenas Belon wrote:
>>>>
>>>>> I presume the reason you haven't seen this show up on the APR list is
>>>>> that it probably makes more sense to ask on the apache httpd list for
>>>>> help understanding how apache is able to "work around" the leakiness
>>>>> of apr_poll; that also requires some reading of apache's code (which I
>>>>> am not that familiar with, nor really interested in).
>>>>>
>>>> Looking at the prefork mpm, the pollsets are created and used only
>>>> in child_main() and thus are created after the fork.  I suspect that
>>>> changing the ganglia code to open all the sockets, but defer creation
>>>> of the pollset until after the fork, is the right way to go.
>>>>         
>>> That is the way we did the initialization before r2025, so I guess that
>>> could explain why we weren't affected, just as apache is not.
>>>
>> Not quite - pre-r2025, we did this:
>>
>> a) detach
>> b) socket init
>> c) pollset init
>>
>> Post r2025:
>>
>> a) socket init
>> b) pollset init
>> c) detach
>>
>> Brooks' solution:
>>
>> a) socket init
>> b) detach
>> c) pollset init
>>
>> I could accept Brooks' solution, because it means gmond would only fail
>> after detaching for something like out-of-memory, while any configuration
>> failure, port already in use, etc. would cause it to fail before detaching.
>>     
>
> If gmond still fails silently in some cases, you have not accomplished the
> objective you were trying to achieve with r2025 anyway.
>   
I agree - it doesn't completely meet my goal, but it does at least
result in an error code for most types of bad configuration (or a port
already in use), and it allows us to continue using apr (which some
people have indicated a preference for).
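
To make the ordering concrete, here is a minimal sketch of Brooks'
solution using plain APR calls. This is not the actual gmond code - the
single UDP channel on the default port 8649 is only illustrative:

    #include <stdio.h>
    #include <sys/socket.h>
    #include <apr_general.h>
    #include <apr_errno.h>
    #include <apr_network_io.h>
    #include <apr_poll.h>
    #include <apr_thread_proc.h>

    int main(void)
    {
        apr_pool_t *pool;
        apr_socket_t *sock;
        apr_sockaddr_t *sa;
        apr_pollset_t *pollset;
        apr_pollfd_t pfd;
        apr_status_t rv;

        apr_initialize();
        apr_pool_create(&pool, NULL);

        /* (a) socket init - bind errors (port in use, bad address) are
           still reported through our exit code, because we have not yet
           detached from the calling shell or init script */
        rv = apr_sockaddr_info_get(&sa, NULL, APR_INET, 8649, 0, pool);
        if (rv == APR_SUCCESS)
            rv = apr_socket_create(&sock, sa->family, SOCK_DGRAM,
                                   APR_PROTO_UDP, pool);
        if (rv == APR_SUCCESS)
            rv = apr_socket_bind(sock, sa);
        if (rv != APR_SUCCESS) {
            char buf[120];
            fprintf(stderr, "socket init failed: %s\n",
                    apr_strerror(rv, buf, sizeof(buf)));
            return 1;
        }

        /* (b) detach - apr_proc_detach() forks and the parent exits,
           so nothing after this point can report failure to the OS */
        if (apr_proc_detach(APR_PROC_DETACH_DAEMONIZE) != APR_SUCCESS)
            return 1;

        /* (c) pollset init in the detached child, so the epoll/kqueue
           descriptor backing the pollset belongs to the process that
           will actually poll on it */
        if (apr_pollset_create(&pollset, 16, pool, 0) != APR_SUCCESS)
            return 1;   /* essentially only out-of-memory fails here */

        pfd.p = pool;
        pfd.desc_type = APR_POLL_SOCKET;
        pfd.reqevents = APR_POLLIN;
        pfd.desc.s = sock;
        pfd.client_data = NULL;
        apr_pollset_add(pollset, &pfd);

        /* ... main loop: apr_pollset_poll() and dispatch ... */
        return 0;
    }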

> The solution I proposed addresses the problem of reporting to the OS any
> failure during initialization (which was the original bug to fix anyway)
> in a straightforward way, and is therefore IMHO the right way to correct
> this, without introducing any regressions by changing long-relied-upon
> semantics.
>
Does anyone else have any feelings about this?  I think we can choose from:

- Carlo's solution (implement apr_proc_detach ourselves; the calling
process hangs around and uses a socket to discover whether the daemon
started successfully) - a sketch of this follows the list

- Brooks' solution (prepare sockets before detaching, prepare pollsets
after detaching, as sketched above) - this allows us to continue using
apr_proc_detach without writing native UNIX code

- Revert my change completely
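
For reference, Carlo's approach boils down to the classic
fork-with-status-report pattern. This is only a sketch with hypothetical
names (daemonize_with_status, init_fn), using a pipe where the actual
proposal may use a socket:

    #include <unistd.h>

    /* The parent stays attached to the shell until the child reports
       how its initialization went, then exits with a matching code. */
    static int daemonize_with_status(int (*init_fn)(void))
    {
        int fds[2];
        char status = 1;

        if (pipe(fds) != 0)
            return -1;

        switch (fork()) {
        case -1:
            return -1;

        case 0:                        /* child: the future daemon */
            close(fds[0]);
            setsid();                  /* detach from the controlling tty */
            status = (init_fn() == 0) ? 0 : 1;
            (void) write(fds[1], &status, 1);  /* report the verdict */
            close(fds[1]);
            if (status != 0)
                _exit(1);
            return 0;                  /* carry on into the main loop */

        default:                       /* parent: wait for the verdict */
            close(fds[1]);
            if (read(fds[0], &status, 1) != 1)
                status = 1;            /* child died before reporting */
            _exit(status);             /* exit code reaches the caller */
        }
    }

The only real difference from apr_proc_detach() is that the parent blocks
on the pipe instead of exiting immediately, so a failure anywhere in the
child's initialization still produces a non-zero exit code.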

I would like to make some kind of decision about what goes into 3.1.6
before Christmas, and maybe aim to tag 3.1.6 by 11 January. There is also
the possibility that we could push it out more quickly, perhaps tagging
on 24 December and going GA in mid-January.

>> Basically, we would have to split the code in
>> setup_listen_channels_pollset() into two functions: one that is called
>> before detaching, and one that is called after detaching.
>>     
>
> Why make the code more complicated, and do you really expect to do that
> within the scope of getting it backported into 3.1.6, considering how
> intrusive it would be?
>
> Also be aware there are bugfixes in that code that haven't yet been
> backported, so you are either going to have to certify all of those fixes
> as well, or cherry-pick the changes needed and test all the different
> combinations.
>   
Thanks for pointing that out - we would have to do all of this on trunk
first.
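
For concreteness, the split I have in mind would look something like
this - the function names here are hypothetical, and the real
setup_listen_channels_pollset() does more than the comments suggest:

    #include <stdlib.h>
    #include <apr_general.h>
    #include <apr_thread_proc.h>

    /* hypothetical halves of setup_listen_channels_pollset() */
    apr_status_t setup_listen_channels(apr_pool_t *p); /* create/bind sockets */
    apr_status_t setup_listen_pollset(apr_pool_t *p);  /* pollset + add fds   */

    static void init_and_detach(apr_pool_t *pool)
    {
        /* before detaching: configuration and bind errors still reach
           the caller through our exit code */
        if (setup_listen_channels(pool) != APR_SUCCESS)
            exit(1);

        apr_proc_detach(APR_PROC_DETACH_DAEMONIZE);

        /* after detaching: the pollset's kernel descriptor is created
           in the surviving child process */
        if (setup_listen_pollset(pool) != APR_SUCCESS)
            exit(1);
    }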
