matt, asaph.

In testing 2.2.3 it appears that matt's fix did do the trick. If there is still a potential for prematurely free()ing the barrier I can't get it to do it...
great job matt!

Mike

Asaph Zemach ([EMAIL PROTECTED]) said:

> I think the problem still exists even with this fix.
> You don't know the order by which threads leave the barrier,
> so you might still be calling barrier_destroy() while there
> are threads accessing b.
>
> In general this kind of scheme:
>
> thread1:
>   b = allocate_barrier();
>   spawn_threads(b);
>   wait_barrier(b);
>   free(b);
>
>
> threadN:
>   wait_barrier(b);
>
> can't work because you have threads 1..N all accessing the
> data structure pointed to by b simultaneously, and you have
> no control over which one will exit wait_barrier() first.
> If it happens to be thread1, then it will free() b while
> other threads are still reading the data pointed to by b.
>
> If you REALLY want to solve this, I think you'd need two
> barriers:
>
>
> thread1:
>   b2 = static_barrier;
>   b1 = allocate_barrier();
>   spawn_threads(b1,b2);
>   wait_barrier(b1);
>   wait_barrier(b2);
>   free(b1);
>   // b2 is never freed
>
>
> threadN:
>   wait_barrier(b1);
>   wait_barrier(b2);
>
> Of course, this is only interesting if you can't make do with just
> having only static barriers. If you are in a situation that you
> absolutely must allocate and free the memory held by the barriers
> I don't know of another safe way to do this.
>
>
> On Mon, Apr 08, 2002 at 03:48:41PM -0700, matt massie wrote:
> > mike-
> >
> > you can blame me for the problem you were having. i didn't code the
> > barriers correctly in gmond. the machines i tested gmond on before i
> > released it didn't display the problem so i released it with this bug...
> >
> > if you look at line 108 of gmond you'll see i initialize a barrier and
> > then pass it to the mcast_threads that i spin off. directly afterwards i
> > run a barrier_destroy(). bad.
> >
> > if the main gmond runs the barrier_destroy() BEFORE all the mcast_threads
> > can run a barrier_barrier() then you will have a problem. the mcast
> > threads will be operating on freed memory... otherwise..
> > everything is peachy.
> >
> > the fix was just to increase the barrier count by one and place a
> > barrier_barrier() just before the barrier_destroy() to force the main
> > thread to wait until all the mcast threads are started.
> >
> > thanks so much for the feedback.
> >
> > also, i added the --no_setuid and --setuid flags in order to give you more
> > debugging power. i know you were having trouble creating a core file
> > because gmond sets the uid to the uid of "nobody". you can prevent gmond
> > from starting up as nobody with the "--no_setuid" flag.
> >
> > good luck! and please let me know if i didn't solve your problem!
> > -matt
> >
> > Saturday, Mike Snitzer wrote forth saying...
> >
> > > gmond segfaults 50% of the time at startup. The random nature of it
> > > suggests to me that there is a race condition when the gmond threads
> > > start up. When I tried to strace or run gmond through gdb the problem
> > > wasn't apparent.. which is what led me to believe it's a threading problem
> > > that strace or gdb masks.
> > >
> > > Any recommendations for accurately debugging gmond would be great; cause
> > > when running through strace and gdb I can't get it to segfault.
> > >
> > > FYI, I'm running gmond v2.2.2 on 48 nodes; on 16 of those nodes gmond
> > > segfaulted at startup...
> > >
> > > Mike
> > >
> > > ps.
> > > here's an example:
> > > `which gmond` --debug_level=1 -i eth0
> > >
> > > mcast_listen_thread() received metric data cpu_speed
> > > mcast_value() mcasting cpu_user value
> > > 2051 pre_process_node() remote_ip=192.168.0.28encoded 8 XDR
> > > bytespre_process_node() has saved the hostname
> > > pre_process_node() has set the timestamp
> > > pre_process_node() received a new node
> > >
> > >
> > > XDR data successfully sent
> > > set_metric_value() got metric key 11
> > > set_metric_value() exec'd cpu_nice_func (11)
> > > Segmentation fault
> > >
> > >
> > > _______________________________________________
> > > Ganglia-general mailing list
> > > [email protected]
> > > https://lists.sourceforge.net/lists/listinfo/ganglia-general
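
For anyone reading this thread later, here is a rough sketch of the two-barrier scheme Asaph describes above. It uses POSIX pthread_barrier_t rather than gmond's own barrier_barrier()/barrier_destroy() code, which is not shown in this thread. All names (run_two_barrier_demo, worker, exit_barrier, struct worker_arg) are made up for illustration. It assumes a platform that provides POSIX barriers (e.g. Linux with glibc).

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical sketch; none of these names come from gmond. */

static pthread_barrier_t exit_barrier;    /* "b2": static storage, never freed */

struct worker_arg {
    pthread_barrier_t *start_barrier;     /* "b1": heap-allocated by the main thread */
    int passed;                           /* set to 1 after both barriers are crossed */
};

static void *worker(void *p)
{
    struct worker_arg *arg = p;

    pthread_barrier_wait(arg->start_barrier); /* last access to b1 by this thread */
    pthread_barrier_wait(&exit_barrier);      /* b2: tells main we are done with b1 */
    arg->passed = 1;
    return NULL;
}

/* Spawns nthreads workers and returns how many crossed both barriers. */
int run_two_barrier_demo(int nthreads)
{
    assert(nthreads > 0);

    pthread_t *tids = malloc((size_t)nthreads * sizeof *tids);
    struct worker_arg *args = malloc((size_t)nthreads * sizeof *args);
    pthread_barrier_t *b1 = malloc(sizeof *b1);
    if (tids == NULL || args == NULL || b1 == NULL)
        abort();

    /* Both barriers count the workers plus the main thread. */
    if (pthread_barrier_init(b1, NULL, (unsigned)nthreads + 1) != 0 ||
        pthread_barrier_init(&exit_barrier, NULL, (unsigned)nthreads + 1) != 0)
        abort();

    for (int i = 0; i < nthreads; i++) {
        args[i].start_barrier = b1;
        args[i].passed = 0;
        if (pthread_create(&tids[i], NULL, worker, &args[i]) != 0)
            abort();                      /* a missing worker would deadlock the barriers */
    }

    pthread_barrier_wait(b1);
    pthread_barrier_wait(&exit_barrier);

    /* A worker only enters exit_barrier after returning from its wait on b1,
       and main only leaves exit_barrier once every worker has entered it.
       So no thread can still be inside b1 at this point. */
    pthread_barrier_destroy(b1);
    free(b1);

    int count = 0;
    for (int i = 0; i < nthreads; i++) {
        pthread_join(tids[i], NULL);
        count += args[i].passed;
    }

    /* Safe only because every worker has been joined; this just lets the
       function be called again. The static storage itself is never freed. */
    pthread_barrier_destroy(&exit_barrier);

    free(args);
    free(tids);
    return count;
}
```

The point of the second barrier is ordering: the main thread never frees b1 until every worker has provably left it. With a single barrier, as in the fix discussed above, the main thread could in principle be the first one out and free the barrier while other threads are still inside the wait call.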

