I appreciate the discussion. Seth has given this some thought, and I wanted to reply below:

Seth Graham wrote:
> Jesse Becker wrote:
>> Bernard Li wrote:
>>> While not exactly what you have in mind, but have you taken a look at
>>> the JobMonarch project?
>>>
>>> https://subtrac.sara.nl/oss/jobmonarch/
>>>
>>> AFAIK it does also work with SGE.
>>
>> Meh...not really. It's under development, and doesn't work so well
>> with the 6.x versions. I think it works with the old 5.3 series though.
>
> job monarch reveals a flaw with PBS (what we use, I imagine this isn't a
> unique trait) in the sense that the worker nodes do not have the
> capacity to report job information. Job monarch can only run on the
> central server.. which makes ganglia, due to its distributed nature, a
> poor partner.
>
> If I'm understanding the goal of the Thebes project, they would try to
> get the authors of batch system software to adopt a more "ganglia like"
> approach to reporting statistics.. which I'd be happy to see.

The more I discuss the place Ganglia fits in with Thebes, the more I'm convinced that the only real change would be to add a few more metrics. I would want to create a grid metascheduler that reads XML metrics from the gmetad's, and I would want to encourage people running gmetad's to connect to at least one other gmetad, to create a distributed pool of resources.

> The concern I have is overloading the metrics held in memory by a gmond
> process to the point it starts consuming noticeable amounts of
> resources. Ganglia's xml output is by far my favorite feature of the
> program, the xml is easy to parse and use for homegrown monitoring
> tools. I worry that if ganglia became a default dumping ground for
> service information the xml would become inconvenient to work with.

Given the above, I would want to limit how much service information gets added. Frankly, I can only think of a handful of metrics (I think Jess and I came up with 5 in a one-hour conversation) I would want to add. The metascheduler would only do grid match-making.
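
To make the match-making idea a bit more concrete, here is a rough sketch of what reading the existing XML could look like. It is only an illustration under assumptions: the sample document is hand-written and simplified (a real gmetad dump carries more attributes), the values are made up, and the function names `host_metrics` and `match_hosts` are hypothetical, not part of Ganglia or Thebes. It assumes the usual `load_one` and `cpu_num` metrics are present.

```python
import xml.etree.ElementTree as ET

# Hand-written, simplified sample in the general shape of Ganglia's XML dump.
# All names and values here are invented for illustration.
SAMPLE_XML = """<GANGLIA_XML>
 <GRID NAME="example-grid">
  <CLUSTER NAME="cluster-a">
   <HOST NAME="node1">
    <METRIC NAME="load_one" VAL="0.10" TYPE="float"/>
    <METRIC NAME="cpu_num" VAL="8" TYPE="uint16"/>
   </HOST>
   <HOST NAME="node2">
    <METRIC NAME="load_one" VAL="7.50" TYPE="float"/>
    <METRIC NAME="cpu_num" VAL="8" TYPE="uint16"/>
   </HOST>
  </CLUSTER>
  <CLUSTER NAME="cluster-b">
   <HOST NAME="node3">
    <METRIC NAME="load_one" VAL="0.20" TYPE="float"/>
    <METRIC NAME="cpu_num" VAL="2" TYPE="uint16"/>
   </HOST>
   <HOST NAME="node4">
    <METRIC NAME="load_one" VAL="0.05" TYPE="float"/>
   </HOST>
  </CLUSTER>
 </GRID>
</GANGLIA_XML>"""


def host_metrics(xml_text):
    """Return {(cluster_name, host_name): {metric_name: value_string}}."""
    root = ET.fromstring(xml_text)
    result = {}
    for cluster in root.iter("CLUSTER"):
        for host in cluster.iter("HOST"):
            metrics = {m.get("NAME"): m.get("VAL") for m in host.iter("METRIC")}
            result[(cluster.get("NAME"), host.get("NAME"))] = metrics
    return result


def match_hosts(xml_text, min_cpus, max_load):
    """Return sorted (cluster, host) pairs that satisfy a simple request."""
    matches = []
    for key, metrics in host_metrics(xml_text).items():
        try:
            cpus = int(metrics["cpu_num"])
            load = float(metrics["load_one"])
        except (KeyError, ValueError):
            # Hosts missing a needed metric are skipped, not guessed at.
            continue
        if cpus >= min_cpus and load <= max_load:
            matches.append(key)
    return sorted(matches)


if __name__ == "__main__":
    for cluster, host in match_hosts(SAMPLE_XML, min_cpus=4, max_load=1.0):
        print(cluster, host)
```

The point of the sketch is that the matcher only reads what the XML already carries; anything after the match would happen outside ganglia.
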
Also, if the trend towards using VM's in the grid continues, there may really only be one or two metrics to add.

> The other downside I see is the nature of the data itself. Job
> information from a batch system is not something you can stuff into an
> RRD.. you'd have to develop some other way to store job history
> information, a task that would put a greater load on gmetad and
> introduce additional scalability concerns. My second favorite feature of
> ganglia is how simple it is, and I don't know if I'd appreciate it the
> same way if an installation had a dependency tree as long as my arm.

I don't really think I'd necessarily want to add any additional RRD's. All I really want is to use the existing XML and transport system in ganglia. Any changes or additions to the web interface or the RRD's would be considered out of scope. Once the metascheduler uses the ganglia XML to make a match, all further negotiations and reporting should be done between the resource and the user; ganglia shouldn't be involved. At least, that's what I think right now.

By the way, all this opens up authn and authz issues, which are out of scope of this mailing list, but information is available on the website for those who are interested.

Thanks again for the discussion.

-------------------------------------------------------------------------
Check out the new SourceForge.net Marketplace.
It's the best place to buy or sell services for
just about anything Open Source.
http://sourceforge.net/services/buy/index.php
_______________________________________________
Ganglia-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/ganglia-general

