Just an update. Jeff and I have completed and checked in a fix to this problem (see the trunk, r18873). Please note that this fix has only been lightly tested, and we don't know for certain that it hasn't opened another hole somewhere else in the dyke.
We would appreciate it if people could test this to the extent possible over the next few days (a minimal sketch of the affected stdin-reading pattern is included at the bottom of this message). Please let us know (good or bad) so we can decide whether or not to move it to the 1.3 release branch.

Thanks
Ralph


On 7/10/08 9:29 AM, "Jeff Squyres" <jsquy...@cisco.com> wrote:

> Ya, no worries -- we're working on a fix. We're just debating exactly
> *how* to fix it. See https://svn.open-mpi.org/trac/ompi/ticket/1135
> if you want to keep up with the conversation.
>
>
> On Jul 10, 2008, at 11:20 AM, Bogdan Costescu wrote:
>
>> On Wed, 9 Jul 2008, Ralph Castain wrote:
>>
>>> stdin is read twice if rank=0 shares the node with mpirun
>>
>> I consider this to be a very serious regression. Many Fortran
>> scientific programs (at least many that I know) read their input
>> from stdin. This is a consequence of their being written (or at
>> least started) many years ago in Fortran 77, which AFAIK has no
>> defined way of handling command-line parameters; reading from stdin
>> is therefore a convenient and portable way to feed data to the
>> program, since stdin is known to be open already at a well-known
>> I/O unit.
>>
>> I just spent two days trying to understand why one such program
>> (CHARMM), which worked fine with many MPI implementations on many
>> platforms, including the stable 1.2 series on this very cluster,
>> suddenly stops in a step related to processing input. After reading
>> your message, everything makes sense...
>>
>>> Alternatively, we could ship 1.3 as-is, and warn users (similar to
>>> 1.2) that they should avoid reading from stdin if there is any
>>> chance that rank=0 could be co-located with mpirun. Note that most
>>> of our clusters do not allow such co-location - but it is permitted
>>> by default by OMPI.
>>
>> I don't know what setup your clusters have, but most that I have
>> seen, including all those that I admin, do run mpirun/mpiexec and
>> rank=0 on the same node. I really think that this will bite a lot of
>> people.
>>
>> --
>> Bogdan Costescu
>>
>> IWR, University of Heidelberg, INF 368, D-69120 Heidelberg, Germany
>> Phone: +49 6221 54 8869/8240, Fax: +49 6221 54 8868/8850
>> E-mail: bogdan.coste...@iwr.uni-heidelberg.de
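
For reference, here is a minimal sketch of the usage pattern under discussion: rank 0 reads a line from stdin (which mpirun forwards to it) and broadcasts it to the other ranks. This is only an illustration of the pattern, not code from CHARMM or from the ticket, and the program name (stdin_test) and launch line are assumptions.

/* stdin_test.c - minimal sketch (illustrative only): rank 0 consumes
 * stdin and broadcasts the first line to every rank. */
#include <mpi.h>
#include <stdio.h>
#include <string.h>

int main(int argc, char **argv)
{
    char line[256] = "";
    int rank;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        /* Only rank 0 reads stdin; with the bug described above, the
         * forwarded input could be consumed incorrectly when rank 0
         * shares the node with mpirun. */
        if (fgets(line, sizeof(line), stdin) == NULL)
            strcpy(line, "(no input)\n");
    }

    /* Make the input line available to all other ranks. */
    MPI_Bcast(line, (int)sizeof(line), MPI_CHAR, 0, MPI_COMM_WORLD);
    printf("rank %d got: %s", rank, line);

    MPI_Finalize();
    return 0;
}

Compiled with mpicc and launched as, say, "mpirun -np 4 ./stdin_test < input.txt", run both with and without rank 0 on the same node as mpirun, this should show whether the redirected input is delivered intact in the co-located case.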