If avoiding epoll() makes Josh's problems go away, PLEASE let me know
because that might indicate a deficiency in BLCR that I would want to
address.

-Paul
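
For readers unfamiliar with the point raised in the quoted message below (that epoll keeps its socket list in the kernel while poll / select do not), here is a minimal sketch of the difference. It is an illustration only, not Open MPI, libevent, or BLCR code; it uses Python's standard `select` module on Linux, and the helper names `wait_with_poll`, `wait_with_epoll`, and `demo` are hypothetical. Whether this extra kernel-side state is what actually affects the checkpoints is not established in this thread.

```python
import os
import select


def wait_with_poll(fd, timeout_ms=1000):
    """poll(): the interest list is user-space data handed to the kernel on each call."""
    p = select.poll()
    p.register(fd, select.POLLIN)
    return [ready_fd for ready_fd, _events in p.poll(timeout_ms)]


def wait_with_epoll(fd, timeout_s=1.0):
    """epoll (Linux only): the interest list is stored in the kernel,
    behind an extra file descriptor owned by the process."""
    ep = select.epoll()
    try:
        ep.register(fd, select.EPOLLIN)   # epoll_ctl(): kernel now holds the list
        epoll_fd = ep.fileno()            # the descriptor that names that kernel state
        ready = [ready_fd for ready_fd, _events in ep.poll(timeout_s)]
        return ready, epoll_fd
    finally:
        ep.close()


def demo():
    """Make a pipe readable, then wait on it with both mechanisms."""
    r, w = os.pipe()
    try:
        os.write(w, b"x")
        via_poll = wait_with_poll(r)
        via_epoll, epoll_fd = wait_with_epoll(r)
        return r, via_poll, via_epoll, epoll_fd
    finally:
        os.close(r)
        os.close(w)
```

Both calls report the same readable descriptor; the only difference shown is that the epoll variant creates an additional descriptor whose registered set lives in the kernel, which is state a checkpointer would presumably have to capture or rebuild.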

Brian W. Barrett wrote:
> Jeff / George -
> 
> Did you add a way to specify which event modules are used?  Because epoll 
> pushes the socket list into the kernel, I can see how it would screw up
> BLCR.  I bet everything would work if we forced the use of poll / select.
> 
> Brian
> 
> On Tue, 18 Mar 2008, Jeff Squyres wrote:
> 
>> Crud, ok.  Keep us posted.
>>
>> On Mar 18, 2008, at 4:16 PM, Josh Hursey wrote:
>>
>>> I'm testing with checkpoint/restart and the new libevent seems to be
>>> messing up the checkpoints generated by BLCR. I'll be taking a look
>>> at it over the next couple of days, but just thought I'd let people
>>> know. Unfortunately I don't have any more details at the moment.
>>>
>>> -- Josh
>>>
>>> On Mar 17, 2008, at 2:50 PM, Jeff Squyres wrote:
>>>
>>>> WHAT: Bring new version of libevent to the trunk.
>>>>
>>>> WHY: Newer version, slightly better performance (lower overheads /
>>>> lighter weight), properly integrate the use of epoll and other
>>>> scalable fd monitoring mechanisms.
>>>>
>>>> WHERE: 98% of the changes are in opal/event; there are a few changes
>>>> to configury and one change to the orted.
>>>>
>>>> TIMEOUT: COB, Friday, 21 March 2008
>>>>
>>>> DESCRIPTION:
>>>>
>>>> George/UTK has done the bulk of the work to integrate a new version of
>>>> libevent on the following tmp branch:
>>>>
>>>>     https://svn.open-mpi.org/svn/ompi/tmp-public/libevent-merge
>>>>
>>>> ** WE WOULD VERY MUCH APPRECIATE IF PEOPLE COULD MTT TEST THIS BRANCH! **
>>>>
>>>> Cisco ran MTT on this branch on Friday and everything checked out
>>>> (i.e., no more failures than on the trunk).  We just made a few more
>>>> minor changes today and I'm running MTT again now, but I'm not
>>>> expecting any new failures (MTT will take several hours).  We would
>>>> like to bring the new libevent in over this upcoming weekend, but
>>>> would very much appreciate if others could test on their platforms
>>>> (Cisco tests mainly 64 bit RHEL4U4).  This new libevent *should* be a
>>>> fairly side-effect free change, but it is possible that since we're
>>>> now using epoll and other scalable fd monitoring tools, we'll run into
>>>> some unanticipated issues on some platforms.
>>>>
>>>> Here's a consolidated diff if you want to see the changes:
>>>>
>>>> https://svn.open-mpi.org/trac/ompi/changeset?old_path=tmp-public%2Flibevent-merge&old=17846&new_path=trunk&new=17842
>>>>
>>>> Thanks.
>>>>
>>>> --
>>>> Jeff Squyres
>>>> Cisco Systems
>>>>
>>>> _______________________________________________
>>>> devel mailing list
>>>> de...@open-mpi.org
>>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>
>>


-- 
Paul H. Hargrove                          phhargr...@lbl.gov
Future Technologies Group
HPC Research Department                   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory     Fax: +1-510-486-6900
