Dear Yuki and Takahiro,

Thanks for the bug report and for the patch. I pushed a [nearly identical]
patch to the trunk in https://svn.open-mpi.org/trac/ompi/changeset/25488. A
special version for the 1.4 branch has been prepared and attached to
ticket #2916 (https://svn.open-mpi.org/trac/ompi/ticket/2916).

  Thanks,
  george.


On Nov 14, 2011, at 02:27 , Y.MATSUMOTO wrote:

> Dear Open MPI community,
> 
> I'm a member of the MPI library development team at Fujitsu;
> Takahiro Kawashima, who sent mail before, is my colleague.
> We are starting to feed back our fixes.
> 
> First, we fixed a problem with MPI_LB/MPI_UB and data packing.
> 
> The program crashes when all of the following conditions are met:
> a: The type of the sending data is a contiguous, derived type.
> b: MPI_LB, MPI_UB, or both are used in the data type.
> c: The size of the sending data is smaller than its extent (the data type has a gap).
> d: The send count is bigger than 1.
> e: The total size of the data is bigger than the "eager limit".
> 
> This problem occurs in the attached C program; a minimal sketch of such a program follows.
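> 
> (The attached program itself is not reproduced here; the following is a
> minimal sketch of a reproducer matching conditions a-e above. The type
> layout, counts, and buffer sizes are our illustrative choices, not the
> attached code, and whether 16000 bytes exceeds the eager limit depends
> on the BTL in use.)
> 
>   #include <string.h>
>   #include <mpi.h>
> 
>   int main(int argc, char **argv)
>   {
>       /* Contiguous block of 4 ints (size 16 bytes) padded to extent 32
>        * with MPI_UB, so size < extent (conditions a, b, c). */
>       int          blens[2] = { 4, 1 };
>       MPI_Aint     disps[2] = { 0, 32 };
>       MPI_Datatype types[2] = { MPI_INT, MPI_UB };
>       MPI_Datatype gaptype;
>       static int   buf[8192];  /* 32 KB: holds 1000 elements of extent 32 */
>       int          rank;
> 
>       MPI_Init(&argc, &argv);
>       MPI_Comm_rank(MPI_COMM_WORLD, &rank);
> 
>       MPI_Type_struct(2, blens, disps, types, &gaptype);
>       MPI_Type_commit(&gaptype);
>       memset(buf, 0, sizeof(buf));
> 
>       /* Send count > 1 (condition d); 1000 * 16 = 16000 payload bytes,
>        * intended to be above the eager limit (condition e). */
>       if (rank == 0) {
>           MPI_Send(buf, 1000, gaptype, 1, 0, MPI_COMM_WORLD);
>       } else if (rank == 1) {
>           MPI_Recv(buf, 1000, gaptype, 0, 0, MPI_COMM_WORLD,
>                    MPI_STATUS_IGNORE);
>       }
> 
>       MPI_Type_free(&gaptype);
>       MPI_Finalize();
>       return 0;
>   }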
> 
> An incorrect address access occurs because "done" takes an unintended
> value and "max_allowed" becomes negative at the following place in
> "ompi/datatype/datatype_pack.c" (in version 1.4.3).
> 
> 
> (ompi/datatype/datatype_pack.c, lines 188-197)
> 
>     packed_buffer = (unsigned char *) iov[iov_count].iov_base;
>     done = pConv->bConverted - i * pData->size;  /* partial data from last pack */
>     if( done != 0 ) {  /* still some data to copy from the last time */
>         done = pData->size - done;
>         OMPI_DDT_SAFEGUARD_POINTER( user_memory, done, pConv->pBaseBuf, pData, pConv->count );
>         MEMCPY_CSUM( packed_buffer, user_memory, done, pConv );
>         packed_buffer += done;
>         max_allowed -= done;
>         total_bytes_converted += done;
>         user_memory += (extent - pData->size + done);
>     }
> 
> This code assumes "done" holds the size of the partial data left over
> from the last pack. However, when the program crashes, "done" equals
> the total size of all data transmitted so far.
> That makes "max_allowed" a negative value.
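> 
> (An illustration with invented numbers, purely to show the arithmetic;
> real values depend on the datatype and the BTL. Suppose pData->size is
> 16000, max_allowed is 4000, and "done" comes back as 100 instead of its
> intended value: the branch is entered because done != 0, it copies
> 16000 - 100 = 15900 bytes, and max_allowed drops to
> 4000 - 15900 = -11900.)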
> 
> We modified the code as follows and it passed our test suite.
> But we are not sure this fix is correct. Can anyone review it?
> A patch (against the Open MPI 1.4 branch) is attached to this mail.
> 
> -            if( done != 0 ) {  /* still some data to copy from the last time */
> +            if( (done + max_allowed) >= pData->size ) {  /* still some data to copy from the last time */
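> 
> For clarity, here is how the patched block reads in context (a sketch
> against the 1.4.3 source quoted above; per the diff, only the condition
> changes). The branch now fires only when the remaining iovec room can
> absorb the rest of the element, so the subtraction can no longer drive
> "max_allowed" below zero:
> 
>     if( (done + max_allowed) >= pData->size ) {
>         done = pData->size - done;  /* bytes needed to finish the element */
>         OMPI_DDT_SAFEGUARD_POINTER( user_memory, done, pConv->pBaseBuf, pData, pConv->count );
>         MEMCPY_CSUM( packed_buffer, user_memory, done, pConv );
>         packed_buffer += done;
>         max_allowed -= done;        /* the new guard keeps this >= 0 */
>         total_bytes_converted += done;
>         user_memory += (extent - pData->size + done);
>     }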
> 
> Best regards,
> 
> Yuki MATSUMOTO
> MPI development team,
> Fujitsu
> 
> (2011/06/28 10:58), Takahiro Kawashima wrote:
>> Dear Open MPI community,
>> 
>> I'm a member of the MPI library development team at Fujitsu. Shinji
>> Sumimoto, whose name appears in Jeff's blog, is one of our bosses.
>> 
>> As Rayson and Jeff noted, the K computer, the world's most powerful HPC
>> system, developed by RIKEN and Fujitsu, utilizes Open MPI as the base of
>> its MPI library. We, Fujitsu, are pleased to announce that, and we also
>> owe special thanks to the Open MPI community.
>> We are sorry for the late announcement!
>> 
>> Our MPI library is based on the Open MPI 1.4 series, and has a new
>> point-to-point component (BTL) and new topology-aware collective
>> communication algorithms (COLL). It is also adapted to our runtime
>> environment (ESS, PLM, GRPCOMM, etc.).
>> 
>> The K computer connects 68,544 nodes with our custom interconnect.
>> Its runtime environment is proprietary, so we don't use orted.
>> We cannot disclose start-up times yet because of disclosure
>> restrictions, sorry.
>> 
>> We are impressed by the extensibility of Open MPI, and have proved
>> that Open MPI scales to the 68,000-process level! It is a pleasure to
>> use such great open-source software.
>> 
>> We cannot disclose details of our technology yet because of our
>> contract with RIKEN AICS; however, we plan to feed back our
>> improvements and bug fixes. We can contribute some bug fixes soon,
>> while the contribution of our improvements will follow next year,
>> with the Open MPI community's agreement.
>> 
>> Best regards,
>> 
>> MPI development team,
>> Fujitsu
>> 
>> 
>>> I got more information:
>>> 
>>>    http://blogs.cisco.com/performance/open-mpi-powers-8-petaflops/
>>> 
>>> Short version: yes, Open MPI is used on K and was used to power the 8PF 
>>> runs.
>>> 
>>> w00t!
>>> 
>>> 
>>> 
>>> On Jun 24, 2011, at 7:16 PM, Jeff Squyres wrote:
>>> 
>>>> w00t!
>>>> 
>>>> OMPI powers 8 petaflops!
>>>> (at least I'm guessing that -- does anyone know if that's true?)
>>>> 
>>>> 
>>>> On Jun 24, 2011, at 7:03 PM, Rayson Ho wrote:
>>>> 
>>>>> Interesting... page 11:
>>>>> 
>>>>> http://www.fujitsu.com/downloads/TC/sc10/programming-on-k-computer.pdf
>>>>> 
>>>>> Open MPI based:
>>>>> 
>>>>> * Open Standard, Open Source, Multi-Platform including PC Cluster.
>>>>> * Adding extension to Open MPI for "Tofu" interconnect
>>>>> 
>>>>> Rayson
> 
> (Attachments: ub_lb.patch, tp_lb_ub_ng.c)

