If avoiding epoll() makes Josh's problems go away, PLEASE let me know because that might indicate a deficiency in BLCR that I would want to address.
-Paul

Brian W. Barrett wrote:
> Jeff / George -
>
> Did you add a way to specify which event modules are used? Because epoll
> pushes the socket list into the kernel, I can see how it would screw up
> BLCR. I bet everything would work if we forced the use of poll / select.
>
> Brian
>
> On Tue, 18 Mar 2008, Jeff Squyres wrote:
>
>> Crud, ok. Keep us posted.
>>
>> On Mar 18, 2008, at 4:16 PM, Josh Hursey wrote:
>>
>>> I'm testing with checkpoint/restart and the new libevent seems to be
>>> messing up the checkpoints generated by BLCR. I'll be taking a look
>>> at it over the next couple of days, but just thought I'd let people
>>> know. Unfortunately I don't have any more details at the moment.
>>>
>>> -- Josh
>>>
>>> On Mar 17, 2008, at 2:50 PM, Jeff Squyres wrote:
>>>
>>>> WHAT: Bring new version of libevent to the trunk.
>>>>
>>>> WHY: Newer version, slightly better performance (lower overheads /
>>>> lighter weight), properly integrate the use of epoll and other
>>>> scalable fd monitoring mechanisms.
>>>>
>>>> WHERE: 98% of the changes are in opal/event; there are a few changes
>>>> to configury and one change to the orted.
>>>>
>>>> TIMEOUT: COB, Friday, 21 March 2008
>>>>
>>>> DESCRIPTION:
>>>>
>>>> George/UTK has done the bulk of the work to integrate a new version
>>>> of libevent on the following tmp branch:
>>>>
>>>> https://svn.open-mpi.org/svn/ompi/tmp-public/libevent-merge
>>>>
>>>> ** WE WOULD VERY MUCH APPRECIATE IT IF PEOPLE COULD MTT TEST THIS
>>>> BRANCH! **
>>>>
>>>> Cisco ran MTT on this branch on Friday and everything checked out
>>>> (i.e., no more failures than on the trunk). We just made a few more
>>>> minor changes today and I'm running MTT again now, but I'm not
>>>> expecting any new failures (MTT will take several hours). We would
>>>> like to bring the new libevent in over this upcoming weekend, but
>>>> would very much appreciate it if others could test on their platforms
>>>> (Cisco tests mainly 64-bit RHEL4U4).
>>>> This new libevent *should* be a fairly side-effect free change, but
>>>> it is possible that since we're now using epoll and other scalable fd
>>>> monitoring tools, we'll run into some unanticipated issues on some
>>>> platforms.
>>>>
>>>> Here's a consolidated diff if you want to see the changes:
>>>>
>>>> https://svn.open-mpi.org/trac/ompi/changeset?old_path=tmp-public%2Flibevent-merge&old=17846&new_path=trunk&new=17842
>>>>
>>>> Thanks.
>>>>
>>>> --
>>>> Jeff Squyres
>>>> Cisco Systems
>>>>
>>>> _______________________________________________
>>>> devel mailing list
>>>> de...@open-mpi.org
>>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel

--
Paul H. Hargrove                          phhargr...@lbl.gov
Future Technologies Group
HPC Research Department                   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory     Fax: +1-510-486-6900