On Mar 8, 2010, at 6:43 AM, Jeff Squyres wrote:

> On Mar 7, 2010, at 8:13 PM, Ralph Castain wrote:
>
>> > How about calling it --enable-opal-event-progress-thread, or even
>> > --enable-open-libevent-progress-thread?
>>
>> Why not add another 100+ characters to the name while we are at it? :-/
>
> :-)
>
> I didn't really think the length mattered here, since it's a configure
> argument. There has been a *lot* of confusion over the name of this
> particular switch over the past few years, so I'm suggesting that a longer,
> more descriptive name might be a little better. Just my $0.02...
I honestly don't think that is the source of the confusion. The revised name tells you exactly what that configure option does - it enables a thread at the opal layer that calls opal_progress. Period.

The confusion is over how that is used within the code, given that opal doesn't have any communication system (as George pointed out). So having an opal progress thread running will cause the event library to tick over, but that does... what? It isn't directly tied to any existing subsystem, but rather cuts across any of them that are sitting on sockets/file descriptors etc. using the event library.

If you look at the other progress threads in the system (e.g., openib), you'll find that they don't use the event library to monitor their fd's - they poll them directly. So enabling the opal progress thread doesn't directly affect them.

So I would say let's leave the name alone, and change it if/when someone figures out how to utilize that capability.

>> > The openib BTL can have up to 2 progress threads (!) -- the async verbs
>> > event notifier and the RDMA CM agent. They really should be consolidated.
>> > If there's infrastructure to consolidate them via opal or something else,
>> > then so much the better...
>>
>> Agreed, though I think that is best done as a separate effort from this RFC.
>
> Agreed -- sorry, I wasn't clear. I wasn't trying to propose that that work
> be added to this RFC; I was just trying to mention that there could be a good
> use for the work from this RFC if such infrastructure was provided.
>
>> I believe there was a concern over latency if all the BTLs are driven by one
>> progress thread that sequentially runs across their respective file
>> descriptors, but I may be remembering it incorrectly...
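
To make the latency concern concrete, here is a rough sketch of one pass of a shared progress loop that services several file descriptors in sequence. It is plain POSIX poll(), not Open MPI or event library code, and every name in it (fd_watch, progress_once, drain_one_byte, MAX_FDS) is made up for illustration.

```c
#include <assert.h>
#include <poll.h>
#include <stddef.h>
#include <unistd.h>

#define MAX_FDS 8   /* arbitrary cap for the sketch */

/* Hypothetical per-descriptor callback; not an Open MPI type. */
typedef void (*fd_handler_fn)(int fd, void *ctx);

struct fd_watch {
    int fd;
    fd_handler_fn handler;
    void *ctx;
};

/*
 * One pass of a shared progress loop: poll every descriptor, then run the
 * handlers of the ready ones one after another, in array order.
 * Returns the number of handlers that ran, or -1 if poll() failed.
 */
int progress_once(struct fd_watch *watches, size_t n, int timeout_ms)
{
    struct pollfd pfds[MAX_FDS];
    int serviced = 0;

    assert(n <= MAX_FDS);
    for (size_t i = 0; i < n; i++) {
        pfds[i].fd = watches[i].fd;
        pfds[i].events = POLLIN;
        pfds[i].revents = 0;
    }
    if (poll(pfds, (nfds_t)n, timeout_ms) < 0) {
        return -1;
    }
    for (size_t i = 0; i < n; i++) {
        if (pfds[i].revents & POLLIN) {
            /* A slow handler here delays every descriptor after it. */
            watches[i].handler(watches[i].fd, watches[i].ctx);
            serviced++;
        }
    }
    return serviced;
}

/* Example handler state: records the order in which descriptors were served. */
struct order_log {
    char tags[MAX_FDS];
    size_t len;
};

struct tagged {
    char tag;
    struct order_log *log;
};

/* Example handler: consume one byte and append this descriptor's tag. */
void drain_one_byte(int fd, void *ctx)
{
    struct tagged *t = ctx;
    char byte;

    if (read(fd, &byte, 1) == 1 && t->log->len < MAX_FDS) {
        t->log->tags[t->log->len++] = t->tag;
    }
}
```

Since the handlers run back to back in array order, a descriptor late in the array waits for every ready handler ahead of it, regardless of which one had data first. That is the kind of added latency I understood the concern to be about.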
> FWIW, I believe the openib progress threads were written the way they
> were (i.e., without any opal progress thread support) because, at least in
> the current setup, to get the opal progress thread support, you have to turn
> on all the heavyweight locks, etc. These two progress threads now simply
> pthread_create() and do minimal locking between the main thread and
> themselves, without affecting the rest of the locking machinery in the code
> base.
>
> I'm not saying that's perfect (or even good); I'm just saying that that's the
> way it currently is. Indeed, at a minimum, Pasha and I have long talked
> about merging these two progress threads into 1. It would be even better if
> we could merge these two progress threads into some other infrastructure. But
> it's always been somewhat of a low priority; we've never gotten a round
> tuit...
>
> --
> Jeff Squyres
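
For anyone who hasn't looked at that style of helper thread, the pattern described above looks roughly like the sketch below: a thread started with pthread_create() that polls its own fd directly, with one small mutex shared with the main thread and nothing else. This is not the openib code; fd_progress and the fd_progress_* functions are hypothetical names, and the error handling is deliberately thin.

```c
#include <assert.h>
#include <poll.h>
#include <pthread.h>
#include <unistd.h>

/* Hypothetical state shared between the main thread and one helper thread. */
struct fd_progress {
    int fd;                     /* descriptor the helper polls directly */
    pthread_t thread;
    pthread_mutex_t lock;       /* protects only events_seen */
    unsigned long events_seen;  /* bytes consumed so far */
};

/* Helper thread body: block in poll(), drain the fd, update the counter. */
static void *fd_progress_main(void *arg)
{
    struct fd_progress *p = arg;
    struct pollfd pfd = { .fd = p->fd, .events = POLLIN, .revents = 0 };
    char buf[64];

    for (;;) {
        if (poll(&pfd, 1, -1) < 0) {
            break;              /* sketch: treat any poll() error as fatal */
        }
        ssize_t n = read(p->fd, buf, sizeof(buf));
        if (n <= 0) {
            break;              /* EOF or error: the writer went away */
        }
        /* The only locking: one short critical section on one counter. */
        pthread_mutex_lock(&p->lock);
        p->events_seen += (unsigned long)n;
        pthread_mutex_unlock(&p->lock);
    }
    return NULL;
}

/* Start the helper thread on fd. Returns 0 on success, -1 on failure. */
int fd_progress_start(struct fd_progress *p, int fd)
{
    assert(p != NULL);
    p->fd = fd;
    p->events_seen = 0;
    if (pthread_mutex_init(&p->lock, NULL) != 0) {
        return -1;
    }
    if (pthread_create(&p->thread, NULL, fd_progress_main, p) != 0) {
        pthread_mutex_destroy(&p->lock);
        return -1;
    }
    return 0;
}

/* Read the counter from the main thread while the helper is running. */
unsigned long fd_progress_count(struct fd_progress *p)
{
    unsigned long seen;

    pthread_mutex_lock(&p->lock);
    seen = p->events_seen;
    pthread_mutex_unlock(&p->lock);
    return seen;
}

/* Wait for the helper to exit (it exits once the fd reaches EOF). */
int fd_progress_join(struct fd_progress *p)
{
    int rc = pthread_join(p->thread, NULL);

    pthread_mutex_destroy(&p->lock);
    return rc == 0 ? 0 : -1;
}
```

Nothing outside this struct has to know the thread exists, which is why this approach does not need the heavyweight locks turned on. It is also why each such thread ends up with its own private loop instead of sharing one.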