Hmmm...something isn't right, Pasha. There is simply no way you should be
encountering this error. You are picking up the wrong grpcomm module.

I went ahead and fixed the grpcomm/basic module, but as I note in the commit
message, that is now an experimental area. The grpcomm/bad module is the
default for that reason.

Check to ensure you have the orte/mca/grpcomm/bad directory, and that it is
getting built. My guess is that you have a corrupted checkout or build and
that the component is either missing or not getting built.
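
As a quick check (just a sketch, reusing the install prefix, hosts, and benchmark
path from your command line below; adjust as needed for your setup):

  # list the grpcomm components that were actually built and installed
  ./bin/ompi_info | grep grpcomm

  # force the default component explicitly to confirm it can be selected
  ./bin/mpirun -np 2 -H sw214,sw214 -mca grpcomm bad \
      -mca btl openib,sm,self ./osu_benchmarks-3.0/osu_latency

If "bad" doesn't show up in the ompi_info output, the component isn't being
built, which would explain why you're falling back to the basic module.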


On 6/19/08 1:37 PM, "Pavel Shamis (Pasha)" <pa...@dev.mellanox.co.il> wrote:

> Ralph H Castain wrote:
>> I can't find anything wrong so far. I'm waiting in a queue on Odin to try
>> there since Jeff indicated you are using rsh as a launcher, and that's the
>> only access I have to such an environment. Guess Odin is being pounded
>> because the queue isn't going anywhere.
>>   
>  I use ssh; here is the command line:
> ./bin/mpirun -np 2 -H sw214,sw214 -mca btl openib,sm,self
> ./osu_benchmarks-3.0/osu_latency
>> In the meantime, I'm building on RoadRunner and will test there (TM environment).
>> 
>> 
>> On 6/19/08 1:18 PM, "Pavel Shamis (Pasha)" <pa...@dev.mellanox.co.il> wrote:
>> 
>>   
>>>> You'll have to tell us something more than that, Pasha. What kind of
>>>> environment, what rev level were you at, etc.
>>> Ahh, sorry :) I'm running on Linux x86_64, SLES 10 SP1, Open MPI
>>> 1.3a1r18682M, OFED 1.3.1.
>>> Pasha.
>>>     
>>>> So far as I know, the trunk is fine.
>>>> 
>>>> 
>>>> On 6/19/08 12:01 PM, "Pavel Shamis (Pasha)" <pa...@dev.mellanox.co.il>
>>>> wrote:
>>>> 
>>>>> I tried to run the trunk on my machines and I got the following error:
>>>>> 
>>>>> [sw214:04367] [[16563,1],1] ORTE_ERROR_LOG: Data unpack would read past
>>>>> end of buffer in file base/grpcomm_base_modex.c at line 451
>>>>> [sw214:04367] [[16563,1],1] ORTE_ERROR_LOG: Data unpack would read past
>>>>> end of buffer in file grpcomm_basic_module.c at line 560
>>>>> [sw214:04365]
>>>>> --------------------------------------------------------------------------
>>>>> It looks like MPI_INIT failed for some reason; your parallel process is
>>>>> likely to abort.  There are many reasons that a parallel process can
>>>>> fail during MPI_INIT; some of which are due to configuration or
>>>>> environment
>>>>> problems.  This failure appears to be an internal failure; here's some
>>>>> additional information (which may only be relevant to an Open MPI
>>>>> developer):
>>>>> 
>>>>>   orte_grpcomm_modex failed
>>>>>   --> Returned "Data unpack would read past end of buffer" (-26) instead
>>>>> of "Success" (0)
>>>>> 