Thanks for the patch; we will review it next week.

Also, you can select different shmem collectives at runtime:

-mca scoll_mpi_enable 1 (to select MPI collectives for shmem)



On Sat, May 10, 2014 at 7:08 PM, Bert Wesarg <bert.wes...@tu-dresden.de> wrote:

> On 05/10/2014 02:46 PM, Bert Wesarg wrote:
>
>> Hi,
>>
>> I get a deadlock when using the shmem_collect32() routine if any of the
>> non-root PEs passes 0 as the number of elements. It looks like the
>> algorithm in _algorithm_central_collector() uses 0 as a special
>> value, and thus never breaks out of the loop.
>>
>
> This seems to fix it for me:
>
> diff --git i/oshmem/mca/scoll/basic/scoll_basic_collect.c w/oshmem/mca/scoll/basic/scoll_basic_collect.c
> index aa81fac..6bba7d1 100644 oshmem/mca/scoll/basic/scoll_basic_collect.c
> --- i/oshmem/mca/scoll/basic/scoll_basic_collect.c
> +++ w/oshmem/mca/scoll/basic/scoll_basic_collect.c
> @@ -553,7 +553,7 @@ static int _algorithm_central_collector(struct oshmem_group_t *group,
>          wait_pe_array = malloc(sizeof(*wait_pe_array) * wait_pe_count);
>          if (wait_pe_array) {
>              memset((void*) wait_pe_array,
> -                   0,
> +                   0xff,
>                     sizeof(*wait_pe_array) * wait_pe_count);
>              wait_pe_array[0] = nlong;
>              wait_pe_count--;
> @@ -564,13 +564,13 @@ static int _algorithm_central_collector(struct oshmem_group_t *group,
>                                group->my_pe);
>                  for (i = 1; (i < group->proc_count) && (rc == OSHMEM_SUCCESS);
>                          i++) {
> -                    if (wait_pe_array[i] == 0) {
> +                    if (wait_pe_array[i] == (size_t)-1) {
>                          pe_cur = oshmem_proc_pe(group->proc_array[i]);
>                          value = 0;
>                          rc = MCA_SPML_CALL(get((void*)pSync, sizeof(value), (void*)&value, pe_cur));
>                          if ((rc == OSHMEM_SUCCESS)
>                                  && (value != _SHMEM_SYNC_VALUE)
> -                                && (value > 0)) {
> +                                && (value >= 0)) {
>                              wait_pe_array[i] = (size_t) value;
>                              wait_pe_count--;
>                              SCOLL_VERBOSE(14,
> @@ -588,6 +588,9 @@ static int _algorithm_central_collector(struct oshmem_group_t *group,
>
>              for (i = 1; (i < group->proc_count) && (rc == OSHMEM_SUCCESS);
>                      i++) {
> +                if (!wait_pe_array[i])
> +                    continue;
> +
>                  /* Get PE ID of a peer from the group */
>                  pe_cur = oshmem_proc_pe(group->proc_array[i]);
>
>
>> Kind regards,
>> Bert Wesarg
>>
>>
> --
> Dipl.-Inf. Bert Wesarg
> wiss. Mitarbeiter
>
> Technische Universität Dresden
> Zentrum für Informationsdienste und Hochleistungsrechnen (ZIH)
> 01062 Dresden
> Tel.: +49 (351) 463-42451
> Fax: +49 (351) 463-37773
> E-Mail: bert.wes...@tu-dresden.de
>
>
> _______________________________________________
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post:
> http://www.open-mpi.org/community/lists/devel/2014/05/14768.php
>
