>>> On 2/2/2010 at 6:23 AM, in message <4b682769.6000...@pocock.com.au>, Daniel Pocock <dan...@pocock.com.au> wrote:
>
> I've just been testing r2258 on CentOS 5. rpmbuild runs successfully
> and the packages install and run.
>
> However, I notice that some of the tcpconn metrics are failing.
> tcpconn.py doesn't appear to have changed since r1658 (August 2008). It
> is the only python module that is loaded by default.
>
> The commit mentions moving the netstat thread start - are you able to
> have a look at this Brad?
>
> You can get my tarball from http://www.pocock.com.au/ganglia/test if you
> need to. It is bootstrapped on Debian 5.
>
>
> metric 'tcp_established' being collected now
> metric 'tcp_established' has value_threshold 1.000000
> metric 'tcp_listen' being collected now
> [PYTHON] Can't call the metric handler function for [tcp_listen] in the
> python module [tcpconn].
>
> Traceback (most recent call last):
> File "/usr/lib/ganglia/python_modules/tcpconn.py", line 67, in
> TCP_Connections
> _WorkerThread.start()
> File "/usr/lib/python2.4/threading.py", line 410, in start
> assert not self.__started, "thread already started"
> AssertionError: thread already started
> metric 'tcp_listen' has value_threshold 1.000000
> metric 'tcp_timewait' being collected now
> [PYTHON] Can't call the metric handler function for [tcp_timewait] in
> the python module [tcpconn].
>

I can't reproduce the problem, so all I can do is guess at what might be happening and leave it to somebody who is seeing the issue to verify.

The exception that you are seeing is the result of a thread being started more than once. There is an if statement in TCP_Connections() that is supposed to prevent this from happening. This if statement checks two thread variables that should indicate what state the thread is in. The "running" thread variable is set to false during thread initialization and is set to true as soon as the thread's run method is called. The thread's run method is called as a result of calling the start() method on the thread object.
Each time that one of the tcpconn metrics is gathered, the metric callback hits the thread-start if statement. If the "running" thread variable is set to true, then no other metric invocation should be allowed to start the thread again. There is a very small window where, on initial startup, two metric callbacks could get past the if statement in TCP_Connections() and try to start the thread a second time. The window would be caused by a delay between the time that the start() method is called and when the threading module finally calls the thread's run() method.

We could add a try...except block around the start() call to catch and ignore the exception if the thread is started a second time. But the part that bothers me is that, judging by the list of exceptions, there was obviously an attempt to start the thread more than just a second time. So my questions are: is the thread really running when the second and later attempts are made? Is the thread bailing out somewhere before the "running" thread variable is set? If we added the try...except block and ignored the exception, would this leave the thread running and in a functional state? Without being able to reproduce the problem, I can't really answer these questions.

Brad

_______________________________________________
Ganglia-developers mailing list
Ganglia-developers@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/ganglia-developers