Hi Folks,

I'm looking for tips, tricks, and tools to get at node utilization so we
can optimize our cluster.  I want to answer questions like:
- what nodes ran a particular job?
- how long did it take for those nodes to run the tasks for that job?
- how/why did Hadoop pick those nodes to begin with?
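
For the first two, a minimal sketch against the old MR1 JobClient API
(assuming a classic JobTracker setup; the class name and job ID below are
placeholders) would look something like this.  Task completion events
record which TaskTracker ran each attempt and how long it took:

import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.JobID;
import org.apache.hadoop.mapred.RunningJob;
import org.apache.hadoop.mapred.TaskCompletionEvent;

public class JobNodeReport {
  public static void main(String[] args) throws Exception {
    JobClient client = new JobClient(new JobConf());
    RunningJob job = client.getJob(JobID.forName("job_201101010000_0001"));
    if (job == null) {
      System.err.println("job not found (may have aged out of the JobTracker)");
      return;
    }
    // Page through completion events; each one records the TaskTracker
    // that ran the attempt and the attempt's wall-clock run time.
    int from = 0;
    TaskCompletionEvent[] events;
    while ((events = job.getTaskCompletionEvents(from)).length > 0) {
      for (TaskCompletionEvent e : events) {
        System.out.printf("%s  status=%s  node=%s  runtime=%dms%n",
            e.getTaskAttemptId(), e.getTaskStatus(),
            e.getTaskTrackerHttp(), e.getTaskRunTime());
      }
      from += events.length;
    }
  }
}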

More detailed questions, like:
- how much memory did the job's tasks use on that node?
- what was the average CPU load on that node while the tasks ran?
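
Per-task counters seem to get partway there (task CPU time and memory
snapshots, rather than node-wide load averages).  In Hadoop 1.x the task
counter group is "org.apache.hadoop.mapred.Task$Counter" with names like
CPU_MILLISECONDS and PHYSICAL_MEMORY_BYTES, though the exact strings vary
by version (check your JobTracker UI).  A minimal sketch, assuming the
same MR1 setup as above (class name is a placeholder):

import org.apache.hadoop.mapred.Counters;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.JobID;
import org.apache.hadoop.mapred.TaskReport;

public class TaskResourceReport {
  // Counter group name as of Hadoop 1.x; version-dependent.
  private static final String GROUP = "org.apache.hadoop.mapred.Task$Counter";

  public static void main(String[] args) throws Exception {
    JobClient client = new JobClient(new JobConf());
    // Job ID passed on the command line, e.g. job_201101010000_0001.
    // Map side only here; getReduceTaskReports() works the same way.
    for (TaskReport r : client.getMapTaskReports(JobID.forName(args[0]))) {
      Counters c = r.getCounters();
      long cpuMs  = c.findCounter(GROUP, "CPU_MILLISECONDS").getCounter();
      long physMB = c.findCounter(GROUP, "PHYSICAL_MEMORY_BYTES").getCounter() >> 20;
      System.out.printf("%s  cpu=%dms  mem=%dMB  wall=%dms%n",
          r.getTaskID(), cpuMs, physMB, r.getFinishTime() - r.getStartTime());
    }
  }
}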

And more aggregate questions, like:
- are some nodes favored more than others?
- what are typical utilization averages (e.g., how many cores on each node are in use)?
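
For rough cluster-wide numbers, ClusterStatus from the same MR1 API
exposes occupied vs. configured map/reduce slots (slots only approximate
cores if your slots-per-node config mirrors the hardware; the class name
is again a placeholder):

import org.apache.hadoop.mapred.ClusterStatus;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;

public class SlotUtilization {
  public static void main(String[] args) throws Exception {
    JobClient client = new JobClient(new JobConf());
    ClusterStatus s = client.getClusterStatus();
    // Occupied vs. configured slots across all live TaskTrackers.
    System.out.printf("trackers=%d  map slots %d/%d  reduce slots %d/%d%n",
        s.getTaskTrackers(),
        s.getMapTasks(), s.getMaxMapTasks(),
        s.getReduceTasks(), s.getMaxReduceTasks());
  }
}

Polled from cron, something like that would at least give slot-utilization
averages over time, but it says nothing about per-node skew.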

There are plenty more I could ask, but you get the point.  So,
what are you guys using for this?

I see some mentions of Ganglia, so I'll definitely look into that.
Anything else?  Anything you're using to monitor in real time (like a
'top' across the nodes or something like that)?

Any info or war stories greatly appreciated.

Thanks,

Tom
