>>> On 7/11/2008 at 1:42 PM, in message <[EMAIL PROTECTED]>, Carlo
Marcelo Arenas Belon <[EMAIL PROTECTED]> wrote:
> On Fri, Jul 11, 2008 at 12:24:01PM -0600, Brad Nicholes wrote:
>> >>> On 7/11/2008 at 11:15 AM, in message <[EMAIL PROTECTED]>, Carlo
>> Marcelo Arenas Belon <[EMAIL PROTECTED]> wrote:
>> 
>> I guess I would just rather see it distributed so that the user can decide
>> what they want to do rather than us making the decision for them.
> 
> and I agree with you on that, the only difference of opinion is on how
> to distribute it and whether that is feasible now (see below).
> 
>> > My suggestion was to make a file name change as well into the contrib
>> > directory, where it won't get in the way and will be also available for
>> > those that want to use it, but since there is no contrib yet distributed
>> > then cleanly removing it (it will be available from our repository in
>> > the web anyway for whoever wants to install it) looks like the best next
>> > option.
>> 
>> I would agree as well if we had a contrib/ directory.  But just because
>> we don't should not mean that we remove it completely and make it
>> unavailable for those that would still like to use it.
> 
> there is also the possibility of just adding the "contrib" directory to this
> first release and using that instead (which should be safe enough); it has
> already been voted for backport (but for the next release).
> 
> feel free to commit that then and base disabling this metric / documentation
> on the contrib directory which should satisfy all raised concerns.
> 
> if you are going that route, it might be also a good idea to backport
> including ganglia-rrd-modify.pl into the contrib which has been approved also
> and was dependent on that first backport.
> 
> but if you are going that route (and this is where it starts becoming a
> risky proposition), it would also be nice to backport the original
> python 2.4 compatible version, which doesn't have the problem the 2.3
> compatible version has and would be a better fit for the majority of
> users. The exceptions are those stuck with python 2.3, like CentOS 4 users,
> who have other problems getting ganglia running as well, like the lack
> of an official APR1 package they could use as a dependency. But that
> version doesn't exist yet (even if it will be easy to come up with, as you
> explained, by rolling back the 2.3 compatibility patches) and probably
> hasn't been tested as much as the buggy one.
> 

Wow, I think I would rather just release it as is and fix all of this in the 
next version.  This issue really isn't that big of a deal. Especially since it 
is Friday and Bernard is ready to roll.

>> >> It still works reliably, it just has a wait timeout issue that is really
>> >> only noticeable when using the -m parameter.
>> > 
>> > but that would result in some metric samples failing silently and therefore
>> > in some holes in the RRD values that could then result in mysterious drops
>> > in the graphs or flat lines.
>> 
>> No and the reason why is because the actual gathering of the data is
>> threaded.  tcpconn.py spins up its own gathering thread that periodically
>> exec's netstat and updates an internal array of metrics.
>> When the gmond main thread requests the metrics, all it does is read the
>> internal array and return whatever the last gathered value was.
> 
> Ok, but then that netstat-spawning thread will randomly fail, and so,
> depending on how often it fails compared with how often gmond polls,
> you will get flat lines.
> 

No, there aren't flat lines.  A value is always being returned and I have never 
seen the netstat thread fail in normal use.  The only reason a failure 
appears with -m is that the metric_cleanup() function was called and 
there was a race condition.  I have been running this code for months now and 
have never seen any kind of failure other than the -m parameter case.  
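
A minimal sketch of the design described above, not the actual tcpconn.py
code: a background thread refreshes a cache on a timer, and the reader only
ever returns the last cached value, so it never waits on the sampler.  The
names CachedGatherer, sample_fn, read() and shutdown() are hypothetical, the
sampler is passed in as a plain callable instead of exec'ing netstat, and the
code is written for a current Python rather than 2.3/2.4.

```python
import threading


class CachedGatherer:
    """Hypothetical helper: samples in the background, serves cached values."""

    def __init__(self, sample_fn, interval=1.0):
        self._sample_fn = sample_fn      # stand-in for "exec netstat and parse"
        self._interval = interval
        self._values = {}
        self._lock = threading.Lock()
        self._stop_event = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self):
        self._thread.start()

    def _run(self):
        while not self._stop_event.is_set():
            fresh = self._sample_fn()    # may be slow; only this thread waits
            with self._lock:
                self._values.update(fresh)
            # Sleep for the interval, but wake up early on shutdown.
            self._stop_event.wait(self._interval)

    def read(self, name, default=0):
        # What the main thread calls: a dictionary lookup, no sampling.
        with self._lock:
            return self._values.get(name, default)

    def shutdown(self):
        # A clean shutdown has to wait for the thread to finish its loop.
        self._stop_event.set()
        self._thread.join()

    def is_running(self):
        return self._thread.is_alive()
```

In this sketch the only place the caller can block is shutdown(), which is
the same place the delay shows up in the discussion above.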
  
>> There is no delay to gmond at all.  At worst, the tcpconn gathering thread
>> might delay occasionally which has no effect on anything else.  It was
>> written this way on purpose so that gmond would never be at the mercy of
>> the python exec code, netstat delays in execution or OS delays.
> 
> Good to know, and definitely a sound architectural design.
> 
>> The delay only shows up for gmond when the tcpconn metric_cleanup() function
>> is called and the main gmond process has to wait for the tcpconn gathering
>> thread to shut down.  That's why you see the delay with the -m parameter
>> and nowhere else.
> 
> Well, as you explained you also see it at shutdown.
> 

Right, which isn't a problem.  Any threaded application has to wait for its 
threads to terminate on a clean shutdown.

>> The gmond -m option causes the metric_init(), which starts the gathering
>> thread and the metric_cleanup() which shuts down the gathering thread,
>> to happen one immediately after the other.  Gmond has to delay waiting
>> for the thread cleanup.
> 
> And this is IMHO a bug, but a fix for it is not something that will be ready
> to release anytime soon, as spelled out in the STATUS file.
> 
> It would be better if metric_init() didn't start the "spawning
> netstat thread" but left that to the collection method that is scheduled
> by gmond, which would just need to take the first sample and start
> that thread the first time it is called.
> 
> This way the metric_cleanup() method won't need to wait in the `gmond -m`
> case either, which shouldn't execute any metric collection code in principle.
> 

Right, I agree this is a bug and the solution you describe is exactly what 
should be done for the next version.  But this bug doesn't prevent the 
tcpconn.py module from functioning normally and providing good metric data.  
At the very worst, it is a -m annoyance and occasionally a shutdown delay.  
IMO, the severity of the bug is not enough to pull it from the release.  
Besides, this is a python module.  If a user really required a fix before the 
next release, updating this module is no more than a file copy.  No rebuilding 
of anything is required.

Brad


_______________________________________________
Ganglia-developers mailing list
Ganglia-developers@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/ganglia-developers