Yo Ralph --

Is the "bad" grpcomm component both new and the default? Further, is the old "basic" grpcomm component now the non-default / testing component?

If so, I wonder if Pasha did an "svn up" without re-running autogen/configure. In that case he wouldn't have seen the new "bad" component and would have fallen back on the old "basic" component, which is now the non-default / testing component...?


On Jun 19, 2008, at 4:21 PM, Pavel Shamis (Pasha) wrote:

I did a fresh checkout and everything works well.
So it looks like some "svn up" screwed up my checkout.
Ralph, thanks for the help!

Ralph H Castain wrote:
Hmmm...something isn't right, Pasha. There is simply no way you should be
encountering this error. You are picking up the wrong grpcomm module.

I went ahead and fixed the grpcomm/basic module, but as I note in the commit message, that is now an experimental area. The grpcomm/bad module is the default for that reason.

Check to ensure you have the orte/mca/grpcomm/bad directory, and that it is getting built. My guess is that you have a corrupted checkout or build and that the component is either missing or not getting built.


On 6/19/08 1:37 PM, "Pavel Shamis (Pasha)" <pa...@dev.mellanox.co.il> wrote:


Ralph H Castain wrote:

I can't find anything wrong so far. I'm waiting in a queue on Odin to try there since Jeff indicated you are using rsh as a launcher, and that's the only access I have to such an environment. Guess Odin is being pounded because the queue isn't going anywhere.

I use ssh. Here is the command line:
./bin/mpirun -np 2 -H sw214,sw214 -mca btl openib,sm,self ./osu_benchmarks-3.0/osu_latency

Meantime, I'm building on RoadRunner and will test there (TM enviro).


On 6/19/08 1:18 PM, "Pavel Shamis (Pasha)" <pa...@dev.mellanox.co.il> wrote:


You'll have to tell us something more than that, Pasha. What kind of
environment, what rev level were you at, etc.

Ahh, sorry :) I run on Linux x86_64 (SLES 10 SP1), Open MPI 1.3a1r18682M, OFED 1.3.1.
Pasha.

So far as I know, the trunk is fine.


On 6/19/08 12:01 PM, "Pavel Shamis (Pasha)" <pa...@dev.mellanox.co.il> wrote:


I tried to run the trunk on my machines and got the following error:

[sw214:04367] [[16563,1],1] ORTE_ERROR_LOG: Data unpack would read past end of buffer in file base/grpcomm_base_modex.c at line 451
[sw214:04367] [[16563,1],1] ORTE_ERROR_LOG: Data unpack would read past end of buffer in file grpcomm_basic_module.c at line 560
[sw214:04365] --------------------------------------------------------------------------
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or environment
problems. This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):

 orte_grpcomm_modex failed
--> Returned "Data unpack would read past end of buffer" (-26) instead of "Success" (0)

_______________________________________________
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel

--
Jeff Squyres
Cisco Systems
