Greetings!

On Thu, Oct 11, 2012 at 9:07 AM, Alain Roy <alain....@pobox.com> wrote:
>> Instead, I propose we have a new tool - "condor_pool_summary" - which
>> addresses the needs of sysadmins
>
> I agree! When I was doing work for the OSG summer school, I was frustrated
> by the difficulty in getting information about our pool. And I have similar
> frustrations with condor_q.
>
> I'll add to the list of things I'd like to see: I'd like some information
> about negotiation as well. For instance, we had a long cycle (about 10
> minutes), where 50% of it was spent timing out against a bunch of
> unresponsive schedds. The only insight I had into that was the
> NegotiatorLog, which isn't user friendly. I'm not quite sure how to best
> present useful information about negotiation, but I'm sure that propagating
> this information through your proposed tool (or perhaps another similar
> tool?) would help people understand what's going on in their pools better.
So, I agree with everything stated as well. But if you're introducing a new tool, I think you should consider starting with a better submitter-facing tool. Within that tool, you could then give administrators and submitters a better shared diagnostic view.

What we've found matters most to the users of our system is "how quickly are my jobs getting picked up and completed?", i.e. throughput. Our workflows also tend to be deadline-oriented. Our users want to submit a set of work and see the work enter the system. After that, they want one of two things. Either they see a few jobs get picked up and extrapolate a throughput so they can estimate a completion time. Or they are at least assured of some level of fairness and given an operational metric that supports that confidence (e.g. "I'm using 95% of my expected allocation of resources").

In a many-schedd environment, there is no way today to get either of those views without extensive development of additional services and client-facing tools. We had a number of those custom tools before we started using Condor, and we've built many more as part of using and scaling Condor. We'd be happy to discuss what we've needed to do around all that, and we have already done so one-on-one with many folks. Previously this seemed to imply a particular operational practice that we weren't sure was general, but it certainly feels relevant to this discussion.

This is why, during CondorWeek, I said I'd like to see some of the new negotiator and schedd ad stats also introduced into the submitter ad. Additional information (such as priority, fair-share allocations, and concurrency limits) should also be added to the ad, so that there's enough there to infer expected throughput rates. You could then have a better client-facing tool that reports to users and operators how effective their throughput is. If their throughput is lower than expected, people will then ask "is the overall pool utilization and throughput where it should be?" There should be a single source of truth on that question as well, shared between submitters and administrators.

So by all means, let's get a common utilization script in. But could we also get something to help create a shared understanding between submitters and administrators?

--
Lans Carstensen
_______________________________________________
Condor-devel mailing list
Condor-devel@cs.wisc.edu
https://lists.cs.wisc.edu/mailman/listinfo/condor-devel