Brian,
Can you be a bit more specific about the work you did at LANL?
Thanks,
Rich

On 2/1/09 2:20 PM, "Brian Barrett" <brbar...@open-mpi.org> wrote:

> While I would love to be involved in this change, as I believe it's critical that it get done right and I have some reservations based on the work we did while a bunch of us were still at LANL, I just don't have time for yet another weekly telecon (particularly since 2:00 MST is the same as an existing weekly telecon).
>
> I still think my objections stand, however. A weekly telecon to discuss the issues is no replacement for a detailed explanation of how things are going to work, as well as some proof-of-concept code. We should hold this change to the same standard we hold all major changes to, which means a working temp branch with negligible performance impact.
>
> Brian
>
> On Feb 1, 2009, at 12:14 PM, Graham, Richard L. wrote:
>
>> Brian,
>> Just FYI, there is a weekly call, Thursdays at 4 EST, where we have been discussing these issues.
>> Let's touch base at the Forum.
>>
>> Rich
>>
>> ----- Original Message -----
>> From: devel-boun...@open-mpi.org <devel-boun...@open-mpi.org>
>> To: Open MPI Developers <de...@open-mpi.org>
>> Sent: Sun Feb 01 10:36:33 2009
>> Subject: Re: [OMPI devel] RFC: Move of ompi_bitmap_t
>>
>> In that case, I remove my objection to this particular RFC. My objection remains for all other RFCs related to moving any of the BTL move code to the trunk before the critical issues with the BTL move have been sorted out in a temporary branch. This includes renaming functions and such. Perhaps we should have a discussion about those issues during the Forum in a couple weeks?
>>
>> Brian
>>
>> On Feb 1, 2009, at 5:37 AM, Jeff Squyres wrote:
>>
>>> I just looked through both opal_bitmap_t and ompi_bitmap_t, and I think that the only real difference is that in the ompi version, we check (in various places) that the size of the bitmap never grows beyond OMPI_FORTRAN_HANDLE_MAX; the opal version doesn't do these kinds of size checks.
>>>
>>> I think it would be fairly straightforward to:
>>>
>>> - add generic checks into the opal version, perhaps by adding a new API call (opal_bitmap_set_max_size())
>>> - if the max size has been set, ensure that the bitmap never grows beyond that size; otherwise, keep the same behavior as today (grow without bound, presumably until malloc() fails)
>>>
>>> It'll take a little care to merge the functionality correctly, but it is possible. Once that is done, you can:
>>>
>>> - remove the ompi_bitmap_t class
>>> - s/ompi_bitmap/opal_bitmap/g in the OMPI layer
>>> - add new calls to opal_bitmap_set_max_size(&bitmap, OMPI_FORTRAN_HANDLE_MAX) in the OMPI layer (these should only be in a few places, probably one for each MPI handle type...? It's been so long since I've looked at that code that I don't remember offhand)
>>>
>>> I'd generally be in favor of this because, although this is not a lot of repeated code, it *is* repeated code, so cleaning it up and consolidating the non-Fortran stuff down in opal is not a Bad Thing.
>>>
>>>
>>> On Jan 30, 2009, at 4:59 PM, Ralph Castain wrote:
>>>
>>>> The history is simple. Originally, there was one bitmap_t in orte that was also used in ompi. Then the folks working on Fortran found that they had to put a limit in the bitmap code to avoid getting values outside of Fortran's range.
>>>> However, this introduced a problem: if we had the limit in the orte version, then we limited ourselves unnecessarily and introduced some abstraction questions, since orte knows nothing about Fortran.
>>>>
>>>> So two were created. Then the orte_bitmap_t was blown away at a later time when we removed the GPR, as George felt it wasn't necessary (which was true). It was later reborn when we needed it in the routed system, but this time it was done in opal, as others indicated a potentially more general use for that capability.
>>>>
>>>> The problem with uniting the two is that you either have to introduce Fortran-based limits into opal (which messes up the non-ompi uses) or deal with the Fortran limits in some other fashion. Neither is particularly pleasant, though it could be done.
>>>>
>>>> I think it is primarily a question for the Fortran folks to address: can they deal with Fortran limits in some other manner without making the code unmanageable and/or taking a performance hit?
>>>>
>>>> Ralph
>>>>
>>>>
>>>> On Jan 30, 2009, at 2:40 PM, Richard Graham wrote:
>>>>
>>>>> This should really be viewed as a code maintenance RFC. The reason this came up in the first place is that we are investigating the btl move, but these are really two very distinct issues. There are two bits of code that have virtually the same functionality; they do have the same interface, I am told. The question is: is there a good reason to keep two different versions in the repository? Not knowing the history of why a second version was created, I am raising this as an inquiry. Is there some performance advantage, or some other advantage, to having these two versions?
>>>>>
>>>>> Rich
>>>>>
>>>>>
>>>>> On 1/30/09 3:23 PM, "Terry D. Dontje" <terry.don...@sun.com> wrote:
>>>>>
>>>>>> I second Brian's concern. So unless this is just an announcement that this is being done on a tmp branch only until everything is in order, I think we need further discussions.
>>>>>>
>>>>>> --td
>>>>>>
>>>>>> Brian Barrett wrote:
>>>>>>> So once again, I bring up my objection to this entire line of moving until such time as the entire process is properly mapped out. I believe it's premature to begin moving around code in preparation for a move that hasn't been proven viable yet. Until there is concrete evidence that such a move is possible, won't degrade application performance, and does not make the code totally unmaintainable, I believe that any related code changes should not be brought into the trunk.
>>>>>>>
>>>>>>> Brian
>>>>>>>
>>>>>>>
>>>>>>> On Jan 30, 2009, at 12:30 PM, Rainer Keller wrote:
>>>>>>>
>>>>>>>> On behalf of Laurent Broto
>>>>>>>>
>>>>>>>> RFC: Move of ompi_bitmap_t
>>>>>>>>
>>>>>>>> WHAT: Move ompi_bitmap_t into opal or onet-layer
>>>>>>>>
>>>>>>>> WHY: Remove dependency on ompi-layer.
>>>>>>>>
>>>>>>>> WHERE: ompi/class
>>>>>>>>
>>>>>>>> WHEN: Open MPI-1.4
>>>>>>>>
>>>>>>>> TIMEOUT: February 3, 2009.
>>>>>>>>
>>>>>>>> -------------------------------------
>>>>>>>> Details:
>>>>>>>> WHY:
>>>>>>>> The ompi_bitmap_t is being used in various places within opal/orte/ompi.
>>>>>>>> With the proposed splitting of BTLs into a separate library, we are currently investigating several of the differences between ompi/class/* and opal/class/*.
>>>>>>>>
>>>>>>>> One of the items is the ompi_bitmap_t, which is quite similar to the opal_bitmap_t.
>>>>>>>> The question is whether we can remove it in favor of a solution just in opal.
>>>>>>>>
>>>>>>>> WHAT:
>>>>>>>> The data structures in the opal version are the same, as is the interface; the implementation is *almost* the same....
>>>>>>>>
>>>>>>>> The difference is the Fortran handles ;-]!
>>>>>>>>
>>>>>>>> Maybe we're missing something, but could we have a discussion on why Fortran sizes are playing a role here and, if this is a hard requirement, how we could fit that into the current interface (possibly without a notion of Fortran, but rather by setting some upper limit that the bitmap may grow to)?
>>>>>>>>
>>>>>>>> With best regards,
>>>>>>>> Laurent and Rainer
>>>>>>>> --
>>>>>>>> ------------------------------------------------------------------------
>>>>>>>> Rainer Keller, PhD Tel: (865) 241-6293
>>>>>>>> Oak Ridge National Lab Fax: (865) 241-4811
>>>>>>>> PO Box 2008 MS 6164 Email: kel...@ornl.gov
>>>>>>>> Oak Ridge, TN 37831-2008 AIM/Skype: rusraink
>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> devel mailing list
>>>>>>>> de...@open-mpi.org
>>>>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>>
>>> --
>>> Jeff Squyres
>>> Cisco Systems
>
> --
> Brian Barrett
> Open MPI developer
> http://www.open-mpi.org/
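
Jeff's proposal in the thread, a generic bitmap that enforces an optional maximum size only when one has been set and otherwise grows without bound, can be sketched in C roughly as follows. This is a minimal, hypothetical illustration and not Open MPI code: the sketch_bitmap_* names, the SKETCH_* return codes, the 64-bit word layout, and the doubling growth policy are all assumptions made for the example. It only shows the idea that the cap is a plain integer supplied by the caller, so the class itself needs no notion of Fortran.

```c
#include <limits.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical sketch, NOT Open MPI's opal_bitmap_t: a growable bitmap with
 * an optional upper bound on its size, as proposed in the thread. */
#define SKETCH_SUCCESS 0
#define SKETCH_ERR_BAD_PARAM (-1)
#define SKETCH_ERR_OUT_OF_RESOURCE (-2)

typedef struct {
    uint64_t *words; /* bit storage, 64 bits per word */
    int array_size;  /* number of allocated words */
    int max_size;    /* cap on the number of usable bits; INT_MAX = no cap */
} sketch_bitmap_t;

void sketch_bitmap_init(sketch_bitmap_t *bm)
{
    bm->words = NULL;
    bm->array_size = 0;
    bm->max_size = INT_MAX; /* default: no cap, so no Fortran notion here */
}

void sketch_bitmap_destroy(sketch_bitmap_t *bm)
{
    free(bm->words);
    bm->words = NULL;
    bm->array_size = 0;
}

/* Hypothetical analogue of the proposed opal_bitmap_set_max_size(): the OMPI
 * layer would pass its Fortran limit here; other layers never call it. */
int sketch_bitmap_set_max_size(sketch_bitmap_t *bm, int max_bits)
{
    if (max_bits <= 0) {
        return SKETCH_ERR_BAD_PARAM;
    }
    bm->max_size = max_bits;
    return SKETCH_SUCCESS;
}

bool sketch_bitmap_is_set(const sketch_bitmap_t *bm, int bit)
{
    if (bit < 0 || bit / 64 >= bm->array_size) {
        return false; /* bits beyond the allocated words are unset */
    }
    return (bm->words[bit / 64] >> (bit % 64)) & 1u;
}

int sketch_bitmap_set_bit(sketch_bitmap_t *bm, int bit)
{
    /* The only place the cap matters: refuse to go past max_size. */
    if (bit < 0 || bit >= bm->max_size) {
        return SKETCH_ERR_BAD_PARAM;
    }
    int index = bit / 64;
    if (index >= bm->array_size) {
        /* Grow by doubling, or to exactly index + 1 words if that is larger. */
        int new_size = bm->array_size > 0 ? bm->array_size * 2 : 1;
        if (new_size <= index) {
            new_size = index + 1;
        }
        uint64_t *grown = realloc(bm->words, (size_t)new_size * sizeof(uint64_t));
        if (grown == NULL) {
            return SKETCH_ERR_OUT_OF_RESOURCE; /* without a cap, growth stops only here */
        }
        memset(grown + bm->array_size, 0,
               (size_t)(new_size - bm->array_size) * sizeof(uint64_t));
        bm->words = grown;
        bm->array_size = new_size;
    }
    bm->words[index] |= (uint64_t)1 << (bit % 64);
    return SKETCH_SUCCESS;
}

/* Handle-style allocation: claim the lowest clear bit below the cap. */
int sketch_bitmap_find_and_set_first_unset_bit(sketch_bitmap_t *bm, int *position)
{
    for (int i = 0; i < bm->max_size; ++i) {
        if (!sketch_bitmap_is_set(bm, i)) {
            int rc = sketch_bitmap_set_bit(bm, i);
            if (rc == SKETCH_SUCCESS) {
                *position = i;
            }
            return rc;
        }
    }
    return SKETCH_ERR_OUT_OF_RESOURCE; /* every bit below the cap is in use */
}
```

In this sketch the OMPI layer would make one set-max-size call with OMPI_FORTRAN_HANDLE_MAX per handle table, while ORTE and OPAL users simply never set a cap. Keeping the limit as a caller-supplied integer rather than a Fortran-specific check is what would let a single class live in opal, which addresses Ralph's abstraction concern.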