Varun, sorry for my late response. Today I deployed a new version and I can confirm that the patches you provided work well. I've been running some jobs on a 5-node cluster for an hour under full load without a core, so now things work as expected.
Thank you again! I used just your first option.

On 15 February 2012 19:53, mete <efk...@gmail.com> wrote:

> Well, rebuilding ganglia seemed easier, and Merto was testing the other, so I
> thought that I should give that one a chance :)
> Anyway, I will send you the gdb details or patch hadoop and try it at my
> earliest convenience.
>
> Cheers
>
> On Wed, Feb 15, 2012 at 6:59 PM, Varun Kapoor <rez...@hortonworks.com> wrote:
>
> > The warnings about underflow are totally expected (they come from strtod(),
> > and they will no longer occur with Hadoop-1.0.1, which applies my patch
> > from HADOOP-8052), so that's not worrisome.
> >
> > As for the buffer overflow, do you think you could show me a backtrace of
> > this core? If you can't find the core file on disk, just start gmetad under
> > gdb, like so:
> >
> > $ sudo gdb <path to gmetad>
> >
> > (gdb) r --conf=<path to your gmetad.conf>
> > ...
> > ::Wait for crash::
> > (gdb) bt
> > (gdb) info locals
> >
> > If you're familiar with gdb, then I'd appreciate any additional diagnosis
> > you could perform (for example, to figure out which metric's value caused
> > this buffer overflow) - if you're not, I'll try and send you some gdb
> > scripts to narrow things down once I see the output from this round of
> > debugging.
> >
> > Also, out of curiosity, is patching Hadoop not an option for you? Or is it
> > just that rebuilding (and redeploying) ganglia is the lesser of the 2
> > evils? :)
> >
> > Varun
> >
> > On Tue, Feb 14, 2012 at 11:43 PM, mete <efk...@gmail.com> wrote:
> >
> > > Hello Varun,
> > > I have patched and recompiled ganglia from source, but it still cores after
> > > the patch.
> > >
> > > Here are some logs:
> > > Feb 15 09:39:14 master gmetad[16487]: RRD_update
> > > (/var/lib/ganglia/rrds/hadoop/slave4/metricssystem.MetricsSystem.publish_max_time.rrd):
> > > /var/lib/ganglia/rrds/hadoop/slave4/metricssystem.MetricsSystem.publish_max_time.rrd:
> > > converting '4.9E-324' to float: Numerical result out of range
> > > Feb 15 09:39:14 master gmetad[16487]: RRD_update
> > > (/var/lib/ganglia/rrds/hadoop/master/metricssystem.MetricsSystem.publish_imax_time.rrd):
> > > /var/lib/ganglia/rrds/hadoop/master/metricssystem.MetricsSystem.publish_imax_time.rrd:
> > > converting '4.9E-324' to float: Numerical result out of range
> > > Feb 15 09:39:14 master gmetad[16487]: RRD_update
> > > (/var/lib/ganglia/rrds/hadoop/slave1/metricssystem.MetricsSystem.publish_imax_time.rrd):
> > > /var/lib/ganglia/rrds/hadoop/slave1/metricssystem.MetricsSystem.publish_imax_time.rrd:
> > > converting '4.9E-324' to float: Numerical result out of range
> > > Feb 15 09:39:14 master gmetad[16487]: RRD_update
> > > (/var/lib/ganglia/rrds/hadoop/slave1/metricssystem.MetricsSystem.snapshot_imax_time.rrd):
> > > /var/lib/ganglia/rrds/hadoop/slave1/metricssystem.MetricsSystem.snapshot_imax_time.rrd:
> > > converting '4.9E-324' to float: Numerical result out of range
> > > Feb 15 09:39:14 master gmetad[16487]: RRD_update
> > > (/var/lib/ganglia/rrds/hadoop/slave1/metricssystem.MetricsSystem.publish_max_time.rrd):
> > > /var/lib/ganglia/rrds/hadoop/slave1/metricssystem.MetricsSystem.publish_max_time.rrd:
> > > converting '4.9E-324' to float: Numerical result out of range
> > > Feb 15 09:39:14 master gmetad[16487]: *** buffer overflow detected ***:
> > > gmetad terminated
> > >
> > > i am using hadoop.1.0.0 and ganglia 3.20 tarball.
> > >
> > > Cheers
> > > Mete
> > >
> > > On Sat, Feb 11, 2012 at 2:19 AM, Merto Mertek <masmer...@gmail.com> wrote:
> > >
> > > > Varun, unfortunately I have had some problems with deploying a new version
> > > > on the cluster. Hadoop is not picking up the new build in the lib folder
> > > > even though the classpath is set to it. The new build is picked up only if
> > > > I put it in $HD_HOME/share/hadoop/, which is very strange. I've done this
> > > > on all nodes and can access the web, but all tasktrackers are being stopped
> > > > because of an error:
> > > >
> > > > INFO org.apache.hadoop.filecache.TrackerDistributedCacheManager: Cleanup...
> > > > > java.lang.InterruptedException: sleep interrupted
> > > > > at java.lang.Thread.sleep(Native Method)
> > > > > at org.apache.hadoop.filecache.TrackerDistributedCacheManager$CleanupThread.run(TrackerDistributedCacheManager.java:926)
> > > >
> > > > Probably the error is the consequence of an inadequate deploy of a jar.
> > > > I will ask the dev list how they do it, or do you maybe have any other
> > > > idea?
> > > >
> > > > On 10 February 2012 17:10, Varun Kapoor <rez...@hortonworks.com> wrote:
> > > >
> > > > > Hey Merto,
> > > > >
> > > > > Any luck getting the patch running on your cluster?
> > > > >
> > > > > In case you're interested, there's now a JIRA for this:
> > > > > https://issues.apache.org/jira/browse/HADOOP-8052.
> > > > >
> > > > > Varun
> > > > >
> > > > > On Wed, Feb 8, 2012 at 7:45 PM, Varun Kapoor <rez...@hortonworks.com> wrote:
> > > > >
> > > > > > Your general procedure sounds correct (i.e.
> > > > > > dropping your newly built .jar into $HD_HOME/lib/), but to make sure
> > > > > > it's getting picked up, you should explicitly add $HD_HOME/lib/ to your
> > > > > > exported HADOOP_CLASSPATH environment variable; here's mine, as an
> > > > > > example:
> > > > > >
> > > > > > export HADOOP_CLASSPATH=".:./build/*.jar"
> > > > > >
> > > > > > About your second point, you certainly need to copy this newly patched
> > > > > > .jar to every node in your cluster, because my patch changes the value of
> > > > > > a couple metrics emitted TO gmetad (FROM all the nodes in the cluster), so
> > > > > > without copying it over to every node in the cluster, gmetad will still
> > > > > > likely receive some bad metrics.
> > > > > >
> > > > > > Varun
> > > > > >
> > > > > > On Wed, Feb 8, 2012 at 6:19 PM, Merto Mertek <masmer...@gmail.com> wrote:
> > > > > >
> > > > > >> I will need your help. Please confirm if the following procedure is right.
> > > > > >> I have a dev environment where I pimp my scheduler (no hadoop running) and
> > > > > >> a small cluster environment where the changes (jars) are deployed with some
> > > > > >> scripts; however, I have never compiled the whole of hadoop from source, so
> > > > > >> I do not know if I am doing it right. I've done it as follows:
> > > > > >>
> > > > > >> a) apply a patch
> > > > > >> b) cd $HD_HOME; ant
> > > > > >> c) copy $HD_HOME/*build*/patched-core-hadoop.jar -> cluster:/$HD_HOME/*lib*
> > > > > >> d) run $HD_HOME/bin/start-all.sh
> > > > > >>
> > > > > >> Is this enough? When I tried to test "hadoop dfs -ls /" I could see that
> > > > > >> the new jar was not loaded and instead a jar from
> > > > > >> $HD_HOME/*share*/hadoop-20.205.0.jar
> > > > > >> was taken..
> > > > > >> Should I copy the entire hadoop folder to all nodes and reconfigure the
> > > > > >> entire cluster for the new build, or is it enough if I configure it just on
> > > > > >> the node where gmetad will run?
> > > > > >>
> > > > > >> On 8 February 2012 06:33, Varun Kapoor <rez...@hortonworks.com> wrote:
> > > > > >>
> > > > > >> > I'm so sorry, Merto - like a silly goose, I attached the 2 patches to my
> > > > > >> > reply, and of course the mailing list did not accept the attachment.
> > > > > >> >
> > > > > >> > I plan on opening JIRAs for this tomorrow, but till then, here are links
> > > > > >> > to the 2 patches (from my Dropbox account):
> > > > > >> >
> > > > > >> > - http://dl.dropbox.com/u/4366344/gmetadBufferOverflow.Hadoop.patch
> > > > > >> > - http://dl.dropbox.com/u/4366344/gmetadBufferOverflow.gmetad.patch
> > > > > >> >
> > > > > >> > Here's hoping this works for you,
> > > > > >> >
> > > > > >> > Varun
> > > > > >> > On Tue, Feb 7, 2012 at 6:00 PM, Merto Mertek <masmer...@gmail.com> wrote:
> > > > > >> >
> > > > > >> > > Varun, have I missed your link to the patches? I have tried to search
> > > > > >> > > for them on jira but I did not find them. Can you repost the link for
> > > > > >> > > these two patches?
> > > > > >> > >
> > > > > >> > > Thank you..
> > > > > >> > >
> > > > > >> > > On 7 February 2012 20:36, Varun Kapoor <rez...@hortonworks.com> wrote:
> > > > > >> > >
> > > > > >> > > > I'm sorry to hear that gmetad cores continuously for you guys. Since I'm
> > > > > >> > > > not seeing that behavior, I'm going to just put out the 2 possible
> > > > > >> > > > patches you could apply and wait to hear back from you.
> > > > > >> > > > :)
> > > > > >> > > >
> > > > > >> > > > Option 1
> > > > > >> > > >
> > > > > >> > > > * Apply gmetadBufferOverflow.Hadoop.patch to the relevant file
> > > > > >> > > > (http://svn.apache.org/viewvc/hadoop/common/branches/branch-1/src/core/org/apache/hadoop/metrics2/util/SampleStat.java?view=markup
> > > > > >> > > > in my setup) in your Hadoop sources and rebuild Hadoop.
> > > > > >> > > >
> > > > > >> > > > Option 2
> > > > > >> > > >
> > > > > >> > > > * Apply gmetadBufferOverflow.gmetad.patch to gmetad/process_xml.c and
> > > > > >> > > > rebuild gmetad.
> > > > > >> > > >
> > > > > >> > > > Only 1 of these 2 fixes is required, and it would help me if you could
> > > > > >> > > > first try Option 1 and let me know if that fixes things for you.
> > > > > >> > > >
> > > > > >> > > > Varun
> > > > > >> > > >
> > > > > >> > > > On Mon, Feb 6, 2012 at 10:36 PM, mete <efk...@gmail.com> wrote:
> > > > > >> > > >
> > > > > >> > > >> Same with Merto's situation here, it always overflows a short time
> > > > > >> > > >> after the restart. Without the hadoop metrics enabled everything is
> > > > > >> > > >> smooth.
> > > > > >> > > >> Regards
> > > > > >> > > >>
> > > > > >> > > >> Mete
> > > > > >> > > >>
> > > > > >> > > >> On Tue, Feb 7, 2012 at 4:58 AM, Merto Mertek <masmer...@gmail.com> wrote:
> > > > > >> > > >>
> > > > > >> > > >> > I have tried to run it but it keeps crashing..
> > > > > >> > > >> >
> > > > > >> > > >> > > - When you start gmetad and Hadoop is not emitting metrics,
> > > > > >> > > >> > > everything is peachy.
> > > > > >> > > >> >
> > > > > >> > > >> > Right, running just ganglia without running hadoop jobs seems stable
> > > > > >> > > >> > for at least a day..
> > > > > >> > > >> > > > > > > >> > > >> > > > > > > >> > > >> > > - When you start Hadoop (and it thus starts > emitting > > > > > >> metrics), > > > > > >> > > >> gmetad > > > > > >> > > >> > > cores. > > > > > >> > > >> > > > > > > > >> > > >> > > > > > > >> > > >> > True, with a following error : *** stack smashing > > detected > > > > > ***: > > > > > >> > > gmetad > > > > > >> > > >> > terminated \n Segmentation fault > > > > > >> > > >> > > > > > > >> > > >> > - On my MacBookPro, it's a SIGABRT due to a buffer > > > > > overflow. > > > > > >> > > >> > > > > > > > >> > > >> > > I believe this is happening for everyone. What I > would > > > like > > > > > for > > > > > >> > you > > > > > >> > > to > > > > > >> > > >> > try > > > > > >> > > >> > > out are the following 2 scenarios: > > > > > >> > > >> > > > > > > > >> > > >> > > - Once gmetad cores, if you start it up again, does > > it > > > > core > > > > > >> > again? > > > > > >> > > >> Does > > > > > >> > > >> > > this process repeat ad infinitum? > > > > > >> > > >> > > > > > > > >> > > >> > - On my MBP, the core is a one-time thing, and > > > restarting > > > > > >> gmetad > > > > > >> > > >> > > after the first core makes things run perfectly > > > > > smoothly. > > > > > >> > > >> > > - I know others are saying this core occurs > > > > > >> continuously, > > > > > >> > > but > > > > > >> > > >> > they > > > > > >> > > >> > > were all using ganglia-3.1.x, and I'm > > interested > > > in > > > > > how > > > > > >> > > >> > > ganglia-3.2.0 > > > > > >> > > >> > > behaves for you. > > > > > >> > > >> > > > > > > > >> > > >> > > > > > > >> > > >> > It cores everytime I run it. The difference is just > that > > > > > >> sometimes a > > > > > >> > > >> > segmentation faults appears instantly, and sometimes it > > > > appears > > > > > >> > after > > > > > >> > > a > > > > > >> > > >> > random time...lets say after a minute of running gmetad > > and > > > > > >> > collecting > > > > > >> > > >> > data. 
> > > > > >> > > >> > > > > > > >> > > >> > > > > > > >> > > >> > > - If you start Hadoop first (so gmetad is not > > > > running > > > > > >> when > > > > > >> > > the > > > > > >> > > >> > > first batch of Hadoop metrics are emitted) and THEN > > > start > > > > > >> gmetad > > > > > >> > > >> after > > > > > >> > > >> > a > > > > > >> > > >> > > few seconds, do you still see gmetad coring? > > > > > >> > > >> > > > > > > > >> > > >> > > > > > > >> > > >> > Yes > > > > > >> > > >> > > > > > > >> > > >> > > > > > > >> > > >> > > - On my MBP, this sequence works perfectly fine, > > and > > > > > there > > > > > >> > are > > > > > >> > > no > > > > > >> > > >> > > gmetad cores whatsoever. > > > > > >> > > >> > > > > > > > >> > > >> > > > > > > >> > > >> > I have tested this scenario with 2 working nodes so two > > > gmond > > > > > >> plus > > > > > >> > the > > > > > >> > > >> head > > > > > >> > > >> > gmond on the server where gmetad is located. I have > > checked > > > > and > > > > > >> all > > > > > >> > of > > > > > >> > > >> them > > > > > >> > > >> > are versioned 3.2.0. > > > > > >> > > >> > > > > > > >> > > >> > Hope it helps.. > > > > > >> > > >> > > > > > > >> > > >> > > > > > > >> > > >> > > > > > > >> > > >> > > > > > > > >> > > >> > > Bear in mind that this only addresses the gmetad > coring > > > > > issue - > > > > > >> > the > > > > > >> > > >> > > warnings emitted about '4.9E-324' being out of range > > will > > > > > >> > continue, > > > > > >> > > >> but I > > > > > >> > > >> > > know what's causing that as well (and hope that my > > patch > > > > > fixes > > > > > >> it > > > > > >> > > for > > > > > >> > > >> > > free). 
> > > > > >> > > >> > > > > > > > >> > > >> > > Varun > > > > > >> > > >> > > On Mon, Feb 6, 2012 at 2:39 PM, Merto Mertek < > > > > > >> masmer...@gmail.com > > > > > >> > > > > > > > >> > > >> > wrote: > > > > > >> > > >> > > > > > > > >> > > >> > > > Yes I am encoutering the same problems and like > Mete > > > said > > > > > >> few > > > > > >> > > >> seconds > > > > > >> > > >> > > > after restarting a segmentation fault appears.. > here > > is > > > > my > > > > > >> > conf.. > > > > > >> > > >> > > > <http://pastebin.com/VgBjp08d> > > > > > >> > > >> > > > > > > > > >> > > >> > > > And here are some info from /var/log/messages > (ubuntu > > > > > server > > > > > >> > > 10.10): > > > > > >> > > >> > > > > > > > > >> > > >> > > > kernel: [424447.140641] gmetad[26115] general > > > protection > > > > > >> > > >> > ip:7f7762428fdb > > > > > >> > > >> > > > > sp:7f776362d370 error:0 in > > > > > >> libgcc_s.so.1[7f776241a000+15000] > > > > > >> > > >> > > > > > > > > > >> > > >> > > > > > > > > >> > > >> > > > When I compiled gmetad I used the following > command: > > > > > >> > > >> > > > > > > > > >> > > >> > > > ./configure --with-gmetad --sysconfdir=/etc/ganglia > > > > > >> > > >> > > > > CPPFLAGS="-I/usr/local/rrdtool-1.4.7/include" > > > > > >> > > >> > > > > CFLAGS="-I/usr/local/rrdtool-1.4.7/include" > > > > > >> > > >> > > > > LDFLAGS="-L/usr/local/rrdtool-1.4.7/lib" > > > > > >> > > >> > > > > > > > > > >> > > >> > > > > > > > > >> > > >> > > > The same was tried with rrdtool 1.4.5. My current > > > ganglia > > > > > >> > version > > > > > >> > > is > > > > > >> > > >> > > 3.2.0 > > > > > >> > > >> > > > and like Mete I tried it with version 3.1.7 but > > without > > > > > >> > success.. > > > > > >> > > >> > > > > > > > > >> > > >> > > > Hope we will sort it out soon any solution.. 
> > > > > >> > > >> > > > thank you > > > > > >> > > >> > > > > > > > > >> > > >> > > > > > > > > >> > > >> > > > On 6 February 2012 20:09, mete <efk...@gmail.com> > > > wrote: > > > > > >> > > >> > > > > > > > > >> > > >> > > > > Hello, > > > > > >> > > >> > > > > i also face this issue when using > GangliaContext31 > > > and > > > > > >> > > >> hadoop-1.0.0, > > > > > >> > > >> > > and > > > > > >> > > >> > > > > ganglia 3.1.7 (also tried 3.1.2). I continuously > > get > > > > > buffer > > > > > >> > > >> overflows > > > > > >> > > >> > > as > > > > > >> > > >> > > > > soon as i restart the gmetad. > > > > > >> > > >> > > > > Regards > > > > > >> > > >> > > > > Mete > > > > > >> > > >> > > > > > > > > > >> > > >> > > > > On Mon, Feb 6, 2012 at 7:42 PM, Vitthal "Suhas" > > > Gogate > > > > < > > > > > >> > > >> > > > > gog...@hortonworks.com> wrote: > > > > > >> > > >> > > > > > > > > > >> > > >> > > > > > I assume you have seen the following > information > > on > > > > > >> Hadoop > > > > > >> > > >> twiki, > > > > > >> > > >> > > > > > http://wiki.apache.org/hadoop/GangliaMetrics > > > > > >> > > >> > > > > > > > > > > >> > > >> > > > > > So do you use GangliaContext31 in > > > > > >> > hadoop-metrics2.properties? > > > > > >> > > >> > > > > > > > > > > >> > > >> > > > > > We use Ganglia 3.2 with Hadoop 20.205 and > works > > > fine > > > > > (I > > > > > >> > > >> remember > > > > > >> > > >> > > > seeing > > > > > >> > > >> > > > > > gmetad sometime goes down due to buffer > overflow > > > > > problem > > > > > >> > when > > > > > >> > > >> > hadoop > > > > > >> > > >> > > > > starts > > > > > >> > > >> > > > > > pumping in the metrics.. but restarting works.. > > let > > > > me > > > > > >> know > > > > > >> > if > > > > > >> > > >> you > > > > > >> > > >> > > face > > > > > >> > > >> > > > > > same problem? 
> > > > > >> > > >> > > > > > > > > > > >> > > >> > > > > > --Suhas > > > > > >> > > >> > > > > > > > > > > >> > > >> > > > > > Additionally, the Ganglia protocol change > > > > significantly > > > > > >> > > between > > > > > >> > > >> > > Ganglia > > > > > >> > > >> > > > > 3.0 > > > > > >> > > >> > > > > > and Ganglia 3.1 (i.e., Ganglia 3.1 is not > > > compatible > > > > > with > > > > > >> > > >> Ganglia > > > > > >> > > >> > 3.0 > > > > > >> > > >> > > > > > clients). This caused Hadoop to not work with > > > Ganglia > > > > > >> 3.1; > > > > > >> > > there > > > > > >> > > >> > is a > > > > > >> > > >> > > > > patch > > > > > >> > > >> > > > > > available for this, HADOOP-4675. As of November > > > 2010, > > > > > >> this > > > > > >> > > patch > > > > > >> > > >> > has > > > > > >> > > >> > > > been > > > > > >> > > >> > > > > > rolled into the mainline for 0.20.2 and later. > To > > > use > > > > > the > > > > > >> > > >> Ganglia > > > > > >> > > >> > 3.1 > > > > > >> > > >> > > > > > protocol in place of the 3.0, substitute > > > > > >> > > >> > > > > > > > org.apache.hadoop.metrics.ganglia.GangliaContext31 > > > > for > > > > > >> > > >> > > > > > > org.apache.hadoop.metrics.ganglia.GangliaContext > > in > > > > the > > > > > >> > > >> > > > > > hadoop-metrics.properties lines above. > > > > > >> > > >> > > > > > > > > > > >> > > >> > > > > > On Fri, Feb 3, 2012 at 1:07 PM, Merto Mertek < > > > > > >> > > >> masmer...@gmail.com> > > > > > >> > > >> > > > > wrote: > > > > > >> > > >> > > > > > > > > > > >> > > >> > > > > > > I spent a lot of time to figure it out > however > > i > > > > did > > > > > >> not > > > > > >> > > find > > > > > >> > > >> a > > > > > >> > > >> > > > > solution. 
> > > > > >> > > >> > > > > > > Problems from the logs pointed me for some > bugs > > > in > > > > > >> > rrdupdate > > > > > >> > > >> > tool, > > > > > >> > > >> > > > > > however > > > > > >> > > >> > > > > > > i tried to solve it with different versions > of > > > > > ganglia > > > > > >> and > > > > > >> > > >> > rrdtool > > > > > >> > > >> > > > but > > > > > >> > > >> > > > > > the > > > > > >> > > >> > > > > > > error is the same. Segmentation fault appears > > > after > > > > > the > > > > > >> > > >> following > > > > > >> > > >> > > > > lines, > > > > > >> > > >> > > > > > if > > > > > >> > > >> > > > > > > I run gmetad in debug mode... > > > > > >> > > >> > > > > > > > > > > > >> > > >> > > > > > > "Created rrd > > > > > >> > > >> > > > > > > > > > > > >> > > >> > > > > > > > > > > > >> > > >> > > > > > > > > > > >> > > >> > > > > > > > > > >> > > >> > > > > > > > > >> > > >> > > > > > > > >> > > >> > > > > > > >> > > >> > > > > > >> > > > > > > > >> > > > > > > >> > > > > > > > > > > > > > > > /var/lib/ganglia/rrds/hdcluster/xxx/metricssystem.MetricsSystem.publish_max_time.rrd" > > > > > >> > > >> > > > > > > "Created rrd > > > > > >> > > >> > > > > > > > > > > > >> > > >> > > > > > > > > > > > >> > > >> > > > > > > > > > > >> > > >> > > > > > > > > > >> > > >> > > > > > > > > >> > > >> > > > > > > > >> > > >> > > > > > > >> > > >> > > > > > >> > > > > > > > >> > > > > > > >> > > > > > > > > > > > > > > > /var/lib/ganglia/rrds/hdcluster/xxx/metricssystem.MetricsSystem.snapshot_max_time.rrd > > > > > >> > > >> > > > > > > " > > > > > >> > > >> > > > > > > > > > > > >> > > >> > > > > > > which I suppose are generated from > > > > > >> MetricsSystemImpl.java > > > > > >> > > (Is > > > > > >> > > >> > there > > > > > >> > > >> > > > any > > > > > >> > > >> > > > > > way > > > > > >> > > >> > > > > > > just to disable this two metrics?) 
> > > > > >> > > >> > > > > > > > > > > > >> > > >> > > > > > > From the /var/log/messages there are a lot of > > > > errors: > > > > > >> > > >> > > > > > > > > > > > >> > > >> > > > > > > "xxx gmetad[15217]: RRD_update > > > > > >> > > >> > > > > > > > > > > > >> > > >> > > > > > > > > > > > >> > > >> > > > > > > > > > > >> > > >> > > > > > > > > > >> > > >> > > > > > > > > >> > > >> > > > > > > > >> > > >> > > > > > > >> > > >> > > > > > >> > > > > > > > >> > > > > > > >> > > > > > > > > > > > > > > > (/var/lib/ganglia/rrds/hdc/xxx/metricssystem.MetricsSystem.publish_imax_time.rrd): > > > > > >> > > >> > > > > > > converting '4.9E-324' to float: Numerical > > result > > > > out > > > > > >> of > > > > > >> > > >> range" > > > > > >> > > >> > > > > > > "xxx gmetad[15217]: RRD_update > > > > > >> > > >> > > > > > > > > > > > >> > > >> > > > > > > > > > > > >> > > >> > > > > > > > > > > >> > > >> > > > > > > > > > >> > > >> > > > > > > > > >> > > >> > > > > > > > >> > > >> > > > > > > >> > > >> > > > > > >> > > > > > > > >> > > > > > > >> > > > > > > > > > > > > > > > (/var/lib/ganglia/rrds/hdc/xxx/metricssystem.MetricsSystem.snapshot_imax_time.rrd): > > > > > >> > > >> > > > > > > converting '4.9E-324' to float: Numerical > > result > > > > out > > > > > >> of > > > > > >> > > >> range" > > > > > >> > > >> > > > > > > > > > > > >> > > >> > > > > > > so probably there are some converting issues > ? > > > > Where > > > > > >> > should > > > > > >> > > I > > > > > >> > > >> > look > > > > > >> > > >> > > > for > > > > > >> > > >> > > > > > the > > > > > >> > > >> > > > > > > solution? Would you rather suggest to use > > ganglia > > > > > 3.0.x > > > > > >> > with > > > > > >> > > >> the > > > > > >> > > >> > > old > > > > > >> > > >> > > > > > > protocol and leave the version >3.1 for > further > > > > > >> releases? > > > > > >> > > >> > > > > > > > > > > > >> > > >> > > > > > > any help is realy appreciated... 
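A note on the '4.9E-324' value in the RRD_update warnings above: it is exactly how Java prints Double.MIN_VALUE, the smallest positive double. That hints (this is an assumption, the message above does not say so) that the affected min/max metrics still hold a placeholder extreme rather than a measured time. The Java sketch below only demonstrates the numeric fact; MinValueDemo and its two helper methods are illustrative names, not Hadoop or Ganglia code.

```java
// Sketch only: shows where the string '4.9E-324' can come from on the Java side.
// MinValueDemo, render and underflowsAsFloat are hypothetical names.
class MinValueDemo {

    // Render a double the way Java's default string conversion does.
    static String render(double v) {
        return Double.toString(v);
    }

    // True when a nonzero double collapses to zero once narrowed to float.
    static boolean underflowsAsFloat(double v) {
        return v != 0.0 && (float) v == 0.0f;
    }

    public static void main(String[] args) {
        System.out.println(render(Double.MIN_VALUE));            // 4.9E-324
        System.out.println(underflowsAsFloat(Double.MIN_VALUE)); // true
        System.out.println(underflowsAsFloat(Float.MIN_VALUE));  // false
    }
}
```

The narrowing to float is used here only to show how tiny the value is. According to Varun's reply of Feb 15, the warning itself comes from strtod() on the Ganglia side.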
> > > > > >> > > >> > > > > > > > > > > > >> > > >> > > > > > > On 1 February 2012 04:04, Merto Mertek < > > > > > >> > masmer...@gmail.com > > > > > >> > > > > > > > > >> > > >> > > wrote: > > > > > >> > > >> > > > > > > > > > > > >> > > >> > > > > > > > I would be glad to hear that too.. I've > setup > > > the > > > > > >> > > following: > > > > > >> > > >> > > > > > > > > > > > > >> > > >> > > > > > > > Hadoop 0.20.205 > > > > > >> > > >> > > > > > > > Ganglia Front 3.1.7 > > > > > >> > > >> > > > > > > > Ganglia Back *(gmetad)* 3.1.7 > > > > > >> > > >> > > > > > > > RRDTool <http://www.rrdtool.org/> 1.4.5. > -> > > i > > > > had > > > > > >> some > > > > > >> > > >> > troubles > > > > > >> > > >> > > > > > > > installing 1.4.4 > > > > > >> > > >> > > > > > > > > > > > > >> > > >> > > > > > > > Ganglia works just in case hadoop is not > > > running, > > > > > so > > > > > >> > > metrics > > > > > >> > > >> > are > > > > > >> > > >> > > > not > > > > > >> > > >> > > > > > > > publshed to gmetad node (conf with new > > > > > >> > > >> > > > hadoop-metrics2.proprieties). > > > > > >> > > >> > > > > > When > > > > > >> > > >> > > > > > > > hadoop is started, a segmentation fault > > appears > > > > in > > > > > >> > gmetad > > > > > >> > > >> > deamon: > > > > > >> > > >> > > > > > > > > > > > > >> > > >> > > > > > > > sudo gmetad -d 2 > > > > > >> > > >> > > > > > > > ....... 
> > > > > >> > > >> > > > > > > > Updating host xxx, metric > > > > > >> dfs.FSNamesystem.BlocksTotal > > > > > >> > > >> > > > > > > > Updating host xxx, metric bytes_in > > > > > >> > > >> > > > > > > > Updating host xxx, metric bytes_out > > > > > >> > > >> > > > > > > > Updating host xxx, metric > > > > > >> > > >> > > > > metricssystem.MetricsSystem.publish_max_time > > > > > >> > > >> > > > > > > > Created rrd > > > > > >> > > >> > > > > > > > > > > > > >> > > >> > > > > > > > > > > > >> > > >> > > > > > > > > > > >> > > >> > > > > > > > > > >> > > >> > > > > > > > > >> > > >> > > > > > > > >> > > >> > > > > > > >> > > >> > > > > > >> > > > > > > > >> > > > > > > >> > > > > > > > > > > > > > > > /var/lib/ganglia/rrds/hdcluster/hadoopmaster/metricssystem.MetricsSystem.publish_max_time.rrd > > > > > >> > > >> > > > > > > > Segmentation fault > > > > > >> > > >> > > > > > > > > > > > > >> > > >> > > > > > > > And some info from the apache log < > > > > > >> > > >> > http://pastebin.com/nrqKRtKJ > > > > > >> > > >> > > >.. > > > > > >> > > >> > > > > > > > > > > > > >> > > >> > > > > > > > Can someone suggest a ganglia version that > is > > > > > tested > > > > > >> > with > > > > > >> > > >> > hadoop > > > > > >> > > >> > > > > > > 0.20.205? > > > > > >> > > >> > > > > > > > I will try to sort it out however it seems > a > > > not > > > > so > > > > > >> > > tribial > > > > > >> > > >> > > > problem.. > > > > > >> > > >> > > > > > > > > > > > > >> > > >> > > > > > > > Thank you > > > > > >> > > >> > > > > > > > > > > > > >> > > >> > > > > > > > > > > > > >> > > >> > > > > > > > > > > > > >> > > >> > > > > > > > > > > > > >> > > >> > > > > > > > > > > > > >> > > >> > > > > > > > On 2 December 2011 12:32, praveenesh kumar > < > > > > > >> > > >> > praveen...@gmail.com > > > > > >> > > >> > > > > > > > > >> > > >> > > > > > wrote: > > > > > >> > > >> > > > > > > > > > > > > >> > > >> > > > > > > >> or Do I have to apply some hadoop patch > for > > > > this ? 
> > > > > >> > > >> > > > > > > >>
> > > > > >> > > >> > > > > > > >> Thanks,
> > > > > >> > > >> > > > > > > >> Praveenesh
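For the deployment problem discussed in this thread (the patched jar in $HD_HOME/lib/ being ignored in favour of the one under $HD_HOME/share/), one way to check which jar a class was really loaded from is to ask the JVM for the class's code source. A minimal Java sketch, assuming it is run with the same classpath Hadoop uses; WhichJar and locationOf are hypothetical names, and the class name org.apache.hadoop.metrics2.util.SampleStat is inferred from the SampleStat.java path quoted above.

```java
import java.security.CodeSource;

// Sketch only: reports the jar or directory a class was loaded from.
// WhichJar and locationOf are hypothetical names, not part of Hadoop.
class WhichJar {

    // Returns the code source location of the class, or a marker string when
    // the JVM does not expose one (for example for core JDK classes).
    static String locationOf(Class<?> c) {
        CodeSource src = c.getProtectionDomain().getCodeSource();
        if (src == null || src.getLocation() == null) {
            return "(bootstrap or unknown)";
        }
        return src.getLocation().toString();
    }

    public static void main(String[] args) throws ClassNotFoundException {
        // Pass a fully qualified class name; defaults to a JDK class.
        String name = args.length > 0 ? args[0] : "java.lang.String";
        System.out.println(name + " <- " + locationOf(Class.forName(name)));
    }
}
```

Passing org.apache.hadoop.metrics2.util.SampleStat as the argument should then print the jar that actually supplied the class, which shows whether the patched build or the stock one was picked up.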