Ralph,

Looking at PTP, the only thing we need is to query the process information (PID, rank, node) when the job is created. Perhaps if only queries are allowed from callbacks then recursion would be eliminated?
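
The query-only idea can be sketched roughly as follows. This is a minimal illustration in Python, not ORTE or PTP code: `JobRegistry`, `ProcInfo`, `subscribe`, `query_procs`, and `create_job` are hypothetical names invented here, and the real tool library may look nothing like this. The sketch assumes callbacks may read state but that any attempt to change state from inside a callback is rejected, so one trigger can never fire another.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProcInfo:
    """Hypothetical per-process record: the three fields named in the message."""
    pid: int
    rank: int
    node: str


class JobRegistry:
    """Hypothetical stand-in for runtime job state; not an ORTE interface."""

    def __init__(self):
        self._procs = {}           # job id -> tuple of ProcInfo
        self._callbacks = []       # user functions called on job creation
        self._dispatching = False  # True while callbacks are running

    def subscribe(self, callback):
        """Register callback(registry, job_id) for job-creation events."""
        self._callbacks.append(callback)

    def query_procs(self, job_id):
        """Read-only query; allowed from inside a callback."""
        return self._procs.get(job_id, ())

    def create_job(self, job_id, procs):
        """State change; refused while callbacks are running."""
        if self._dispatching:
            raise RuntimeError("state changes are not allowed from callbacks")
        self._procs[job_id] = tuple(procs)
        self._dispatching = True
        try:
            for callback in self._callbacks:
                callback(self, job_id)
        finally:
            self._dispatching = False
```

A tool in this sketch would subscribe once and call only `query_procs` from its callback. A callback that tried to call `create_job` would get a `RuntimeError` instead of starting a nested round of callbacks.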

If you can get this functionality into your new interface and back in the trunk, I'll take a look at porting PTP to use it.

Thanks,
Greg

On Mar 4, 2008, at 6:14 PM, Ralph Castain wrote:

Yeah, the problem we had in the past was:

1. something would trigger in the system - e.g., a particular job state was reached. This would cause us to execute a callback function via the GPR

2. the callback function would take some action. Typically, this involved sending out a message or calling another function. Either way, the eventual result of that action would be to cause another GPR trigger to fire - either the job or a process changing state

This loop would continue ad infinitum. Sometimes, I would see stack traces hundreds of calls deep. Debugging and maintaining something that intertwined was impossible.

People tried to impose order by establishing rules about what could and could not be called from various situations, but that also proved intractable. Problem was that we could get it to work for a "normal" code path, but all the variety of failure modes, combined with all the flexibility built into the code base, created so many code paths that you inevitably wound up deadlocked under some corner case conditions. Which we generally agreed was unacceptable.

It -is- possible to have callback functions that avoid this situation. However, it is very easy to make a mistake and "hang" the whole system. Just seemed easier to avoid the entire problem. (I don't get that option!)

The ability to get an allocation without launching is easy to add.

I/O forwarding is currently an issue. Our IOF doesn't seem to like it when I try to create an "alternate" tap (the default always goes back through the persistent orted, so the tool looks like a second "tap" on the flow). This is noted as a "bug" on our tracker, and I expect it will be addressed prior to releasing 1.3. I will ask that it be raised in priority.

I'll review what I had done and see about bringing it into the trunk by the end of the week.

Ralph

On 3/4/08 4:00 PM, "Greg Watson" <g.wat...@computer.org> wrote:

I don't have a problem using a different interface, assuming it's adequately supported and provides the functionality we need. I presume the recursive behavior you're referring to is calling OMPI interfaces from the callback functions. Any event-based system has this issue, and it is usually solved by clearly specifying the allowable interfaces that can be called (possibly none). Since PTP doesn't call OMPI functions from callbacks, it's not a problem for us if no interfaces can be called.

The major missing features appear to be:

- Ability to request a process allocation without launching the job
- I/O forwarding callbacks

Without these, PTP support will be so limited that I'd be reluctant to say we support OMPI.

Greg

On Mar 4, 2008, at 4:50 PM, Ralph H Castain wrote:

It is buried deep-down in the thread, but I'll just reiterate it here. I have "restored" the ability to "subscribe" to changes in job, proc, and node state via OMPI's tool interface library. I have -not- checked this into the trunk yet, though, until the community has a chance to consider whether or not it wants it.

Restoring the ability to have such changes "callback" to user functions raises the concern again about recursive behavior. We worked hard to remove recursion from the code base, and it would be a concern to see it potentially re-enter.

I realize there is some difference between ORTE calling back into itself vs calling back into a user-specified function. However, unless that user truly understands ORTE/OMPI and takes considerable precautions, it is very easy to recreate the recursive behavior without intending to do so.

The tool interface library was built to accomplish two things: 1. help reduce the impact on external tools of changes to ORTE/OMPI interfaces, and 2.
provide a degree of separation to prevent the tool from inadvertently causing OMPI to "behave badly"

I think we accomplished that - I would encourage you to at least consider using the library. If there is something missing, we can always add it.

Ralph

On 3/4/08 2:37 PM, "Jeff Squyres" <jsquy...@cisco.com> wrote:

Greg -- I admit to being a bit puzzled here. Ralph sent around RFCs about these changes many months ago. Everyone said they didn't want this functionality -- it was seen as excess functionality that Open MPI didn't want or need -- so it was all removed.

As such, I have to agree with Ralph that it is an "enhancement" to re-add the functionality. That being said, patches are always welcome! IBM has signed the OMPI 3rd party contribution agreement, so it could be contributed directly.

Sidenote: I was also under the impression that PTP was being re-geared towards STCI and moving away from ORTE anyway. Is this incorrect?

On Mar 4, 2008, at 3:24 PM, Greg Watson wrote:

Hi all,

Ralph informs me that significant functionality has been removed from ORTE in 1.3. Unfortunately this functionality was being used by PTP to provide support for OMPI, and without it, it seems unlikely that PTP will be able to work with 1.3. Apparently restoring this lost functionality is an "enhancement" of 1.3, and so is something that will not necessarily be done. Having worked with OMPI from a very early stage to ensure that we were able to provide robust support, I must say it is a bit disappointing that this approach is being taken. I hope that the community will view this "enhancement" as worthwhile.

Regards,
Greg

Begin forwarded message:

On 2/29/08 7:13 AM, "Gregory R Watson" <g...@us.ibm.com> wrote:

Ralph Castain <r...@lanl.gov> wrote on 02/29/2008 12:18:39 AM:

Ralph Castain <r...@lanl.gov>
02/29/08 12:18 AM
To: Gregory R Watson/Watson/IBM@IBMUS
cc:
Subject: Re: OpenMPI changes

Hi Greg

All of the prior options (and some new ones) for spawning a job are fully supported in the new interface.
Instead of setting them with "attributes", you create an orte_job_t object and just fill them in. This is precisely how mpirun does it - you can look at that code if you want an example, though it is somewhat complex. Alternatively, you can look at the way it is done for comm_spawn, which may be more analogous to your situation - that code is in ompi/mca/dpm/orte.

All the tools library does is communicate the job object to the target persistent daemon so it can do the work. This way, you don't have to open all the frameworks, deal directly with the plm interface, etc.

Alternatively, you are welcome to do a full orte_init and use the frameworks yourself - there is no requirement to use the library. I only offer it as an alternative.

As far as I can tell, neither API provides the same functionality as that available in 1.2. While this might be beneficial for OMPI-specific activities, the changes appear to severely limit the interaction of tools with the runtime. At this point, I can't see either interface supporting PTP.

I went ahead and added a notification capability to the system - took about 30 minutes. I can provide notice of job and process state changes since I see those. Node state changes, however, are different - I can notify on them, but we have no way of seeing them. None of the environments we support tell us when a node fails.

I know that the tool library works because it uses the identical APIs as comm_spawn and mpirun. I have also tested them by building my own tools.

There's a big difference being on a code path that *must* work because it is used by core components, to one that is provided as an add-on for external tools. I may be worrying needlessly if this new interface becomes an "officially supported" API. Is that planned? At a minimum, it seems like it's going to complicate your testing process, since you're going to need to provide a separate set of tests that exercise this interface independent of the rest of OMPI.

It is an officially supported API.
Testing is not as big a problem as you might expect since the library exercises the same code paths as mpirun and comm_spawn. Like I said, I have written my own tools that exercise the library - no problem using them as tests.

We do not launch an orted for any tool-library query. All we do is communicate the query to the target persistent daemon or mpirun. Those entities have recv's posted to catch any incoming messages and execute the request.

You are correct that we no longer have event driven notification in the system. I repeatedly asked the community (on both devel and core lists) for input on that question, and received no indications that anyone wanted it supported. It can be added back into the system, but would require the approval of the OMPI community. I don't know how problematic that would be - there is a lot of concern over the amount of memory, overhead, and potential reliability issues that surround event notification. If you want that capability, I suggest we discuss it, come up with a plan that deals with those issues, and then take a proposal to the devel list for discussion.

As for reliability, the objectives of the last year's effort were precisely scalability and reliability. We did a lot of work to eliminate recursive deadlocks and improve the reliability of the code. Our current testing indicates we had considerable success in that regard, particularly with the recursion elimination commit earlier today.

I would be happy to work with you to meet the PTP's needs - we'll just need to work with the OMPI community to ensure everyone buys into the plan. If it would help, I could come and review the new arch with the team (I already gave a presentation on it to IBM Rochester MN) and discuss required enhancements.

PTP's needs have not changed since 1.0. From our perspective, the 1.3 branch simply removes functionality that is required for PTP to support OMPI. It seems strange that we need "approval of the OMPI community" to continue to use functionality that has been available since 1.0.
In any case, there are unfortunately no resources to work on the kind of re-engineering that appears to be required to support 1.3, even if it did provide the functionality we need.

Afraid I have to be driven by the OMPI community's requirements since they pay my salary :-) What they need is a "lean, mean, OMPI machine" as they say, and (for some reason) they view the debugger community as consisting of folks like totalview, vampirtrace, etc. - all of whom get involved (either directly or via one of the OMPI members) in the requirements discussions.

Can't argue with business decisions, though. I gather there was some mention of PTP at the recent LANL/IBM RR meeting, so I'll let people know that PTP won't be an option on RR.

And I'll see if there is any interest here in adding 1.3 support to PTP ourselves - from looking at your code, I think it would take about a day, assuming someone more familiar with PTP will work with me.

Take care
Ralph

Greg

_______________________________________________
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel