A colleague of mine (Ryan Greenhall) and I set up Ganglia on our Hadoop cluster. He has written a summary of what we did to get it working, which you might find useful:
http://forwardtechnology.co.uk/blog/4cc841609f4e6a021100004f

Regards,
Abhinay Mehta

On 8 November 2010 15:31, Jonathan Creasy <[email protected]> wrote:

> This is the correct configuration, and there should be nothing more needed.
> I don't think that these configuration changes will take effect on the fly,
> so you would need to restart the datanode and namenode processes, if I
> understand correctly.
>
> When you browse your Ganglia web front-end you will see some more metrics:
>
> dfs.FSDirectory.files_deleted
> dfs.FSNamesystem.BlockCapacity
> dfs.FSNamesystem.BlocksTotal
> dfs.FSNamesystem.CapacityRemainingGB
> dfs.FSNamesystem.CapacityTotalGB
> dfs.FSNamesystem.CapacityUsedGB
> dfs.FSNamesystem.CorruptBlocks
> dfs.FSNamesystem.ExcessBlocks
> dfs.FSNamesystem.FilesTotal
> dfs.FSNamesystem.MissingBlocks
> dfs.FSNamesystem.PendingDeletionBlocks
> dfs.FSNamesystem.PendingReplicationBlocks
> dfs.FSNamesystem.ScheduledReplicationBlocks
> dfs.FSNamesystem.TotalLoad
> dfs.FSNamesystem.UnderReplicatedBlocks
> dfs.datanode.blockChecksumOp_avg_time
> dfs.datanode.blockChecksumOp_num_ops
> dfs.datanode.blockReports_avg_time
> dfs.datanode.blockReports_num_ops
> dfs.datanode.block_verification_failures
> dfs.datanode.blocks_read
> dfs.datanode.blocks_removed
> dfs.datanode.blocks_replicated
> dfs.datanode.blocks_verified
> dfs.datanode.blocks_written
> dfs.datanode.bytes_read
> dfs.datanode.bytes_written
> dfs.datanode.copyBlockOp_avg_time
> dfs.datanode.copyBlockOp_num_ops
> dfs.datanode.heartBeats_avg_time
> dfs.datanode.heartBeats_num_ops
> dfs.datanode.readBlockOp_avg_time
> dfs.datanode.readBlockOp_num_ops
> dfs.datanode.readMetadataOp_avg_time
> dfs.datanode.readMetadataOp_num_ops
> dfs.datanode.reads_from_local_client
> dfs.datanode.reads_from_remote_client
> dfs.datanode.replaceBlockOp_avg_time
> dfs.datanode.replaceBlockOp_num_ops
> dfs.datanode.writeBlockOp_avg_time
> dfs.datanode.writeBlockOp_num_ops
> dfs.datanode.writes_from_local_client
> dfs.datanode.writes_from_remote_client
> dfs.namenode.AddBlockOps
> dfs.namenode.CreateFileOps
> dfs.namenode.DeleteFileOps
> dfs.namenode.FileInfoOps
> dfs.namenode.FilesAppended
> dfs.namenode.FilesCreated
> dfs.namenode.FilesRenamed
> dfs.namenode.GetBlockLocations
> dfs.namenode.GetListingOps
> dfs.namenode.JournalTransactionsBatchedInSync
> dfs.namenode.SafemodeTime
> dfs.namenode.Syncs_avg_time
> dfs.namenode.Syncs_num_ops
> dfs.namenode.Transactions_avg_time
> dfs.namenode.Transactions_num_ops
> dfs.namenode.blockReport_avg_time
> dfs.namenode.blockReport_num_ops
> dfs.namenode.fsImageLoadTime
> jvm.metrics.gcCount
> jvm.metrics.gcTimeMillis
> jvm.metrics.logError
> jvm.metrics.logFatal
> jvm.metrics.logInfo
> jvm.metrics.logWarn
> jvm.metrics.maxMemoryM
> jvm.metrics.memHeapCommittedM
> jvm.metrics.memHeapUsedM
> jvm.metrics.memNonHeapCommittedM
> jvm.metrics.memNonHeapUsedM
> jvm.metrics.threadsBlocked
> jvm.metrics.threadsNew
> jvm.metrics.threadsRunnable
> jvm.metrics.threadsTerminated
> jvm.metrics.threadsTimedWaiting
> jvm.metrics.threadsWaiting
> rpc.metrics.NumOpenConnections
> rpc.metrics.RpcProcessingTime_avg_time
> rpc.metrics.RpcProcessingTime_num_ops
> rpc.metrics.RpcQueueTime_avg_time
> rpc.metrics.RpcQueueTime_num_ops
> rpc.metrics.abandonBlock_avg_time
> rpc.metrics.abandonBlock_num_ops
> rpc.metrics.addBlock_avg_time
> rpc.metrics.addBlock_num_ops
> rpc.metrics.blockReceived_avg_time
> rpc.metrics.blockReceived_num_ops
> rpc.metrics.blockReport_avg_time
> rpc.metrics.blockReport_num_ops
> rpc.metrics.callQueueLen
> rpc.metrics.complete_avg_time
> rpc.metrics.complete_num_ops
> rpc.metrics.create_avg_time
> rpc.metrics.create_num_ops
> rpc.metrics.getEditLogSize_avg_time
> rpc.metrics.getEditLogSize_num_ops
> rpc.metrics.getProtocolVersion_avg_time
> rpc.metrics.getProtocolVersion_num_ops
> rpc.metrics.register_avg_time
> rpc.metrics.register_num_ops
> rpc.metrics.rename_avg_time
> rpc.metrics.rename_num_ops
> rpc.metrics.renewLease_avg_time
> rpc.metrics.renewLease_num_ops
> rpc.metrics.rollEditLog_avg_time
> rpc.metrics.rollEditLog_num_ops
> rpc.metrics.rollFsImage_avg_time
> rpc.metrics.rollFsImage_num_ops
> rpc.metrics.sendHeartbeat_avg_time
> rpc.metrics.sendHeartbeat_num_ops
> rpc.metrics.versionRequest_avg_time
> rpc.metrics.versionRequest_num_ops
>
> -Jonathan
>
> On Nov 8, 2010, at 8:34 AM, Shuja Rehman wrote:
>
> > Hi,
> > I have a cluster of 4 machines and want to configure Ganglia for
> > monitoring purposes. I have read the wiki and added the following lines
> > to hadoop-metrics.properties on each machine:
> >
> > dfs.class=org.apache.hadoop.metrics.ganglia.GangliaContext
> > dfs.period=10
> > dfs.servers=10.10.10.2:8649
> >
> > mapred.class=org.apache.hadoop.metrics.ganglia.GangliaContext
> > mapred.period=10
> > mapred.servers=10.10.10.2:8649
> >
> > jvm.class=org.apache.hadoop.metrics.ganglia.GangliaContext
> > jvm.period=10
> > jvm.servers=10.10.10.2:8649
> >
> > rpc.class=org.apache.hadoop.metrics.ganglia.GangliaContext
> > rpc.period=10
> > rpc.servers=10.10.10.2:8649
> >
> > where 10.10.10.2 is the machine where I am running gmetad and the web
> > front-end. Will I need to use the same IP on all machines, as I do here,
> > or give each machine its own IP in its file? And is there anything more
> > to do to set this up with Hadoop?
> >
> > --
> > Regards
> > Shuja-ur-Rehman Baig
> > <http://pk.linkedin.com/in/shujamughal>
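Jonathan's reply confirms that the same collector address goes in every node's hadoop-metrics.properties. Since the four stanzas differ only in the context name, one way to avoid hand-editing each file is to generate them; the helper below is a hypothetical sketch, not part of the thread, using the addresses from Shuja's question.

```python
# Hypothetical helper (not from the thread): emit the four
# hadoop-metrics.properties stanzas, all pointing at one Ganglia collector.
CONTEXTS = ["dfs", "mapred", "jvm", "rpc"]
COLLECTOR = "10.10.10.2:8649"  # the gmetad/gmond host from the thread

def metrics_properties(collector=COLLECTOR):
    """Build the properties text for all Hadoop metrics contexts."""
    stanzas = []
    for ctx in CONTEXTS:
        stanzas.append(
            f"{ctx}.class=org.apache.hadoop.metrics.ganglia.GangliaContext\n"
            f"{ctx}.period=10\n"
            f"{ctx}.servers={collector}\n"
        )
    return "\n".join(stanzas)

print(metrics_properties())
```

The same output would be written to every node, which is the point of the answer: the `*.servers` value names the collector, not the local machine.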
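After restarting the daemons, one way to confirm that the dfs.*, jvm.* and rpc.* metrics in Jonathan's list are actually reaching the collector is to read gmond's XML dump (served on its TCP port, 8649 by default, e.g. via `nc 10.10.10.2 8649`) and look for Hadoop metric names. The sketch below parses a made-up sample dump in place of a live socket; the XML structure and values are illustrative assumptions, not output captured from this cluster.

```python
# Sketch: scan a gmond XML dump for Hadoop metrics. The SAMPLE text is
# made up for illustration; in practice the XML would come from the
# collector's TCP port.
import xml.etree.ElementTree as ET

SAMPLE = """<GANGLIA_XML VERSION="3.1.7" SOURCE="gmond">
 <CLUSTER NAME="hadoop" OWNER="unspecified">
  <HOST NAME="10.10.10.3" IP="10.10.10.3">
   <METRIC NAME="dfs.datanode.bytes_written" VAL="1048576" TYPE="float"/>
   <METRIC NAME="jvm.metrics.gcCount" VAL="42" TYPE="float"/>
  </HOST>
 </CLUSTER>
</GANGLIA_XML>"""

def hadoop_metrics(xml_text, prefixes=("dfs.", "mapred.", "jvm.", "rpc.")):
    """Return {host: [metric names]} for metrics in the Hadoop contexts."""
    root = ET.fromstring(xml_text)
    found = {}
    for host in root.iter("HOST"):
        names = [m.get("NAME") for m in host.iter("METRIC")
                 if m.get("NAME", "").startswith(prefixes)]
        if names:
            found[host.get("NAME")] = names
    return found

print(hadoop_metrics(SAMPLE))
```

An empty result for a host would suggest its daemons were not restarted or its `*.servers` entries point somewhere other than the collector.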
