Yo Ralph --
Is the "bad" grpcomm component both new and the default? Further, is
the old "basic" grpcomm component now the non-default / testing
component?
If so, I wonder if what happened was that Pasha did an "svn up" but
did not re-run autogen/configure. In that case he wouldn't have seen the
new "bad" component, and was therefore falling back on the old "basic"
component, which is now the non-default / testing component...?
On Jun 19, 2008, at 4:21 PM, Pavel Shamis (Pasha) wrote:
I did a fresh checkout and everything works well.
So it looks like some svn up screwed up my checkout.
Ralph, thanks for the help!
Ralph H Castain wrote:
Hmmm...something isn't right, Pasha. There is simply no way you should be
encountering this error. You are picking up the wrong grpcomm module.
I went ahead and fixed the grpcomm/basic module, but as I note in the commit
message, that is now an experimental area. The grpcomm/bad module is the
default for that reason.
Check to ensure you have the orte/mca/grpcomm/bad directory, and that it is
getting built. My guess is that you have a corrupted checkout or build and
that the component is either missing or not getting built.
On 6/19/08 1:37 PM, "Pavel Shamis (Pasha)" <pa...@dev.mellanox.co.il> wrote:
Ralph H Castain wrote:
I can't find anything wrong so far. I'm waiting in a queue on Odin to try
there since Jeff indicated you are using rsh as a launcher, and that's the
only access I have to such an environment. Guess Odin is being pounded
because the queue isn't going anywhere.
I use ssh. Here is the command line:
./bin/mpirun -np 2 -H sw214,sw214 -mca btl openib,sm,self ./osu_benchmarks-3.0/osu_latency
Meantime, I'm building on RoadRunner and will test there (TM environment).
On 6/19/08 1:18 PM, "Pavel Shamis (Pasha)" <pa...@dev.mellanox.co.il> wrote:
You'll have to tell us something more than that, Pasha. What kind of
environment, what rev level were you at, etc.
Ahh, sorry :) I run on Linux x86_64, SLES10 SP1, Open MPI 1.3a1r18682M,
OFED 1.3.1.
Pasha.
So far as I know, the trunk is fine.
On 6/19/08 12:01 PM, "Pavel Shamis (Pasha)" <pa...@dev.mellanox.co.il> wrote:
I tried to run the trunk on my machines and got the following error:
[sw214:04367] [[16563,1],1] ORTE_ERROR_LOG: Data unpack would read past end of buffer in file base/grpcomm_base_modex.c at line 451
[sw214:04367] [[16563,1],1] ORTE_ERROR_LOG: Data unpack would read past end of buffer in file grpcomm_basic_module.c at line 560
[sw214:04365]
--------------------------------------------------------------------------
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or environment
problems. This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):
  orte_grpcomm_modex failed
  --> Returned "Data unpack would read past end of buffer" (-26) instead of "Success" (0)
_______________________________________________
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel
--
Jeff Squyres
Cisco Systems