Same as Merto's situation here: gmetad always overflows a short time after the restart. Without the Hadoop metrics enabled, everything runs smoothly.

Regards
Mete

On Tue, Feb 7, 2012 at 4:58 AM, Merto Mertek <[email protected]> wrote:

> I have tried to run it but it repeats crashing..
>
> > - When you start gmetad and Hadoop is not emitting metrics, everything
> >   is peachy.
>
> Right, running just ganglia without running hadoop jobs seems stable for at
> least a day..
>
> > - When you start Hadoop (and it thus starts emitting metrics), gmetad
> >   cores.
>
> True, with the following error:
>
>   *** stack smashing detected ***: gmetad terminated
>   Segmentation fault
>
> >   - On my MacBookPro, it's a SIGABRT due to a buffer overflow.
> >
> > I believe this is happening for everyone. What I would like for you to try
> > out are the following 2 scenarios:
> >
> > - Once gmetad cores, if you start it up again, does it core again? Does
> >   this process repeat ad infinitum?
> >   - On my MBP, the core is a one-time thing, and restarting gmetad
> >     after the first core makes things run perfectly smoothly.
> >   - I know others are saying this core occurs continuously, but they
> >     were all using ganglia-3.1.x, and I'm interested in how
> >     ganglia-3.2.0 behaves for you.
>
> It cores every time I run it. The difference is just that sometimes a
> segmentation fault appears instantly, and sometimes it appears after a
> random time... let's say after a minute of running gmetad and collecting
> data.
>
> > - If you start Hadoop first (so gmetad is not running when the
> >   first batch of Hadoop metrics are emitted) and THEN start gmetad after a
> >   few seconds, do you still see gmetad coring?
>
> Yes
>
> >   - On my MBP, this sequence works perfectly fine, and there are no
> >     gmetad cores whatsoever.
>
> I have tested this scenario with 2 working nodes, so two gmond plus the head
> gmond on the server where gmetad is located. I have checked and all of them
> are versioned 3.2.0.
>
> Hope it helps..
>
> > Bear in mind that this only addresses the gmetad coring issue - the
> > warnings emitted about '4.9E-324' being out of range will continue, but I
> > know what's causing that as well (and hope that my patch fixes it for
> > free).
> >
> > Varun
> >
> > On Mon, Feb 6, 2012 at 2:39 PM, Merto Mertek <[email protected]> wrote:
> >
> > > Yes I am encountering the same problems and like Mete said a few seconds
> > > after restarting a segmentation fault appears.. here is my conf..
> > > <http://pastebin.com/VgBjp08d>
> > >
> > > And here is some info from /var/log/messages (ubuntu server 10.10):
> > >
> > > kernel: [424447.140641] gmetad[26115] general protection ip:7f7762428fdb sp:7f776362d370 error:0 in libgcc_s.so.1[7f776241a000+15000]
> > >
> > > When I compiled gmetad I used the following command:
> > >
> > > ./configure --with-gmetad --sysconfdir=/etc/ganglia \
> > >     CPPFLAGS="-I/usr/local/rrdtool-1.4.7/include" \
> > >     CFLAGS="-I/usr/local/rrdtool-1.4.7/include" \
> > >     LDFLAGS="-L/usr/local/rrdtool-1.4.7/lib"
> > >
> > > The same was tried with rrdtool 1.4.5. My current ganglia version is 3.2.0
> > > and like Mete I tried it with version 3.1.7 but without success..
> > >
> > > Hope we will sort out a solution soon..
> > > thank you
> > >
> > > On 6 February 2012 20:09, mete <[email protected]> wrote:
> > >
> > > > Hello,
> > > > i also face this issue when using GangliaContext31 and hadoop-1.0.0, and
> > > > ganglia 3.1.7 (also tried 3.1.2). I continuously get buffer overflows as
> > > > soon as i restart the gmetad.
> > > > Regards
> > > > Mete
> > > >
> > > > On Mon, Feb 6, 2012 at 7:42 PM, Vitthal "Suhas" Gogate <[email protected]> wrote:
> > > >
> > > > > I assume you have seen the following information on Hadoop twiki,
> > > > > http://wiki.apache.org/hadoop/GangliaMetrics
> > > > >
> > > > > So do you use GangliaContext31 in hadoop-metrics2.properties?
> > > > >
> > > > > We use Ganglia 3.2 with Hadoop 20.205 and it works fine (I remember seeing
> > > > > gmetad sometimes go down due to a buffer overflow problem when hadoop starts
> > > > > pumping in the metrics.. but restarting works..). Let me know if you face the
> > > > > same problem?
> > > > >
> > > > > --Suhas
> > > > >
> > > > > Additionally, the Ganglia protocol changed significantly between Ganglia 3.0
> > > > > and Ganglia 3.1 (i.e., Ganglia 3.1 is not compatible with Ganglia 3.0
> > > > > clients). This caused Hadoop to not work with Ganglia 3.1; there is a patch
> > > > > available for this, HADOOP-4675. As of November 2010, this patch has been
> > > > > rolled into the mainline for 0.20.2 and later. To use the Ganglia 3.1
> > > > > protocol in place of the 3.0, substitute
> > > > > org.apache.hadoop.metrics.ganglia.GangliaContext31 for
> > > > > org.apache.hadoop.metrics.ganglia.GangliaContext in the
> > > > > hadoop-metrics.properties lines above.
> > > > >
> > > > > On Fri, Feb 3, 2012 at 1:07 PM, Merto Mertek <[email protected]> wrote:
> > > > >
> > > > > > I spent a lot of time trying to figure it out, however i did not find a solution.
> > > > > > Problems from the logs pointed me to some bugs in the rrdupdate tool, however
> > > > > > i tried to solve it with different versions of ganglia and rrdtool but the
> > > > > > error is the same. A segmentation fault appears after the following lines, if
> > > > > > I run gmetad in debug mode...
> > > > > >
> > > > > > "Created rrd /var/lib/ganglia/rrds/hdcluster/xxx/metricssystem.MetricsSystem.publish_max_time.rrd"
> > > > > > "Created rrd /var/lib/ganglia/rrds/hdcluster/xxx/metricssystem.MetricsSystem.snapshot_max_time.rrd"
> > > > > >
> > > > > > which I suppose are generated from MetricsSystemImpl.java (Is there any way
> > > > > > just to disable these two metrics?)
> > > > > >
> > > > > > From /var/log/messages there are a lot of errors:
> > > > > >
> > > > > > "xxx gmetad[15217]: RRD_update (/var/lib/ganglia/rrds/hdc/xxx/metricssystem.MetricsSystem.publish_imax_time.rrd): converting '4.9E-324' to float: Numerical result out of range"
> > > > > > "xxx gmetad[15217]: RRD_update (/var/lib/ganglia/rrds/hdc/xxx/metricssystem.MetricsSystem.snapshot_imax_time.rrd): converting '4.9E-324' to float: Numerical result out of range"
> > > > > >
> > > > > > so probably there are some conversion issues? Where should I look for
> > > > > > the solution? Would you rather suggest to use ganglia 3.0.x with the old
> > > > > > protocol and leave the version >3.1 for further releases?
> > > > > >
> > > > > > any help is really appreciated...
> > > > > >
> > > > > > On 1 February 2012 04:04, Merto Mertek <[email protected]> wrote:
> > > > > >
> > > > > > > I would be glad to hear that too.. I've set up the following:
> > > > > > >
> > > > > > > Hadoop 0.20.205
> > > > > > > Ganglia Front 3.1.7
> > > > > > > Ganglia Back *(gmetad)* 3.1.7
> > > > > > > RRDTool <http://www.rrdtool.org/> 1.4.5.
> > > > > > > -> i had some troubles installing 1.4.4
> > > > > > >
> > > > > > > Ganglia works only when hadoop is not running, so metrics are not
> > > > > > > published to the gmetad node (conf with new hadoop-metrics2.properties).
> > > > > > > When hadoop is started, a segmentation fault appears in the gmetad daemon:
> > > > > > >
> > > > > > > sudo gmetad -d 2
> > > > > > > .......
> > > > > > > Updating host xxx, metric dfs.FSNamesystem.BlocksTotal
> > > > > > > Updating host xxx, metric bytes_in
> > > > > > > Updating host xxx, metric bytes_out
> > > > > > > Updating host xxx, metric metricssystem.MetricsSystem.publish_max_time
> > > > > > > Created rrd /var/lib/ganglia/rrds/hdcluster/hadoopmaster/metricssystem.MetricsSystem.publish_max_time.rrd
> > > > > > > Segmentation fault
> > > > > > >
> > > > > > > And some info from the apache log <http://pastebin.com/nrqKRtKJ>..
> > > > > > >
> > > > > > > Can someone suggest a ganglia version that is tested with hadoop 0.20.205?
> > > > > > > I will try to sort it out, however it seems a not so trivial problem..
> > > > > > >
> > > > > > > Thank you
> > > > > > >
> > > > > > > On 2 December 2011 12:32, praveenesh kumar <[email protected]> wrote:
> > > > > > >
> > > > > > > > or Do I have to apply some hadoop patch for this ?
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > > Praveenesh
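
A side note on the '4.9E-324' value in the RRD_update warnings quoted above: that string is exactly how Java prints Double.MIN_VALUE, the smallest positive double, which is far below what a single-precision float can hold. Whether Hadoop's metrics code really sends Double.MIN_VALUE as a placeholder for the max-time metrics is only an assumption here; nothing in this thread confirms it. A minimal Java sketch (the class and method names are made up for illustration) that shows the value and what happens when it is narrowed to float:

```java
// Hypothetical demo class; not part of Hadoop or Ganglia.
class MinValueDemo {

    // Text form of the smallest positive double, as Double.toString renders it.
    static String smallestDoubleAsText() {
        return Double.toString(Double.MIN_VALUE);
    }

    // Parse a decimal string and narrow it to single precision.
    // Magnitudes below the float range silently become 0.0f in Java.
    static float narrowToFloat(String text) {
        return (float) Double.parseDouble(text);
    }

    // True when the value is non-zero but smaller than the smallest normal float.
    static boolean underflowsFloat(String text) {
        double d = Double.parseDouble(text);
        return d != 0.0 && Math.abs(d) < Float.MIN_NORMAL;
    }

    public static void main(String[] args) {
        String text = smallestDoubleAsText();
        System.out.println(text);                  // 4.9E-324
        System.out.println(narrowToFloat(text));   // 0.0
        System.out.println(underflowsFloat(text)); // true
    }
}
```

Java narrows the value to 0.0 without complaint, while gmetad (per the log lines above) reports it as out of range when converting to float, so the warning would at least be consistent with such a value arriving on the wire.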
