On 05/10/2014 02:46 PM, Bert Wesarg wrote:
Hi,

I get a deadlock when calling shmem_collect32() if any non-root PE passes
0 as the number of elements. It looks like _algorithm_central_collector()
uses 0 as the special value for "length not yet received", so the root
never accepts a legitimate zero length and never breaks out of its
polling loop.
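
To illustrate the problem in isolation, here is a minimal sketch of the
root's polling logic. It is a simplified model, not the Open MPI code:
poll_lengths(), MAX_PES and the max_rounds bound are hypothetical names
introduced only for this example (the real loop has no round limit and
simply spins).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_PES 16  /* hypothetical upper bound, for this sketch only */

/*
 * Simplified model of the root's polling loop.
 * reported[i] is the element count PE i has published; the root copies it
 * into wait[i] once it accepts it. `sentinel` marks "nothing received yet".
 * Returns how many PEs are still pending after `max_rounds` rounds.
 */
static size_t poll_lengths(const size_t *reported, size_t npes,
                           size_t sentinel, int max_rounds)
{
    size_t wait[MAX_PES];
    size_t pending;

    assert(npes >= 1 && npes <= MAX_PES);
    pending = npes - 1;                 /* the root does not wait on itself */

    for (size_t i = 0; i < npes; i++)
        wait[i] = sentinel;
    wait[0] = reported[0];

    for (int round = 0; round < max_rounds && pending > 0; round++) {
        for (size_t i = 1; i < npes; i++) {
            if (wait[i] != sentinel)
                continue;               /* length already accepted */
            /* A value equal to the sentinel looks like "not ready yet". */
            if (reported[i] != sentinel) {
                wait[i] = reported[i];
                pending--;
            }
        }
    }
    return pending;
}
```

With lengths {4, 0, 2} and a sentinel of 0, one PE stays pending no matter
how many rounds are allowed; with a sentinel of SIZE_MAX, all lengths are
accepted in the first round.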

This seems to fix it for me:

diff --git i/oshmem/mca/scoll/basic/scoll_basic_collect.c w/oshmem/mca/scoll/basic/scoll_basic_collect.c
index aa81fac..6bba7d1 100644
--- i/oshmem/mca/scoll/basic/scoll_basic_collect.c
+++ w/oshmem/mca/scoll/basic/scoll_basic_collect.c
@@ -553,7 +553,7 @@ static int _algorithm_central_collector(struct oshmem_group_t *group,
         wait_pe_array = malloc(sizeof(*wait_pe_array) * wait_pe_count);
         if (wait_pe_array) {
             memset((void*) wait_pe_array,
-                   0,
+                   0xff,
                    sizeof(*wait_pe_array) * wait_pe_count);
             wait_pe_array[0] = nlong;
             wait_pe_count--;
@@ -564,13 +564,13 @@ static int _algorithm_central_collector(struct oshmem_group_t *group,
                               group->my_pe);
                 for (i = 1; (i < group->proc_count) && (rc == OSHMEM_SUCCESS);
                         i++) {
-                    if (wait_pe_array[i] == 0) {
+                    if (wait_pe_array[i] == (size_t)-1) {
                         pe_cur = oshmem_proc_pe(group->proc_array[i]);
                         value = 0;
                         rc = MCA_SPML_CALL(get((void*)pSync, sizeof(value), (void*)&value, pe_cur));
                         if ((rc == OSHMEM_SUCCESS)
                                 && (value != _SHMEM_SYNC_VALUE)
-                                && (value > 0)) {
+                                && (value >= 0)) {
                             wait_pe_array[i] = (size_t) value;
                             wait_pe_count--;
                             SCOLL_VERBOSE(14,
@@ -588,6 +588,9 @@ static int _algorithm_central_collector(struct oshmem_group_t *group,

             for (i = 1; (i < group->proc_count) && (rc == OSHMEM_SUCCESS);
                     i++) {
+                if (!wait_pe_array[i])
+                    continue;
+
                 /* Get PE ID of a peer from the group */
                 pe_cur = oshmem_proc_pe(group->proc_array[i]);
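
The patch relies on two small points, sketched below as standalone C. First,
filling a size_t array with 0xff bytes makes every element compare equal to
(size_t)-1, assuming size_t has no padding bits, which holds on the usual
platforms. Second, once 0 is a valid length, the data-fetch loop has to skip
PEs that contribute nothing. The helper names all_unset_after_memset() and
count_fetches() are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Fill a size_t array with 0xff bytes, as the patch does, and report
 * whether every element then compares equal to (size_t)-1.
 */
static int all_unset_after_memset(size_t *lengths, size_t count)
{
    memset(lengths, 0xff, sizeof(*lengths) * count);
    for (size_t i = 0; i < count; i++) {
        if (lengths[i] != (size_t)-1)
            return 0;
    }
    return 1;
}

/*
 * Model of the second stage: add up all lengths and count how many peers
 * actually need a data fetch. Index 0 is the root's own contribution.
 */
static size_t count_fetches(const size_t *lengths, size_t count, size_t *total)
{
    size_t fetches = 0;

    assert(count >= 1 && total != NULL);
    *total = lengths[0];
    for (size_t i = 1; i < count; i++) {
        if (!lengths[i])
            continue;                   /* zero elements: nothing to fetch */
        *total += lengths[i];
        fetches++;
    }
    return fetches;
}
```

For lengths {3, 0, 5, 0}, count_fetches() reports a single fetch and a total
of 8 elements.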


Kind regards,
Bert Wesarg


--
Dipl.-Inf. Bert Wesarg
wiss. Mitarbeiter

Technische Universität Dresden
Zentrum für Informationsdienste und Hochleistungsrechnen (ZIH)
01062 Dresden
Tel.: +49 (351) 463-42451
Fax: +49 (351) 463-37773
E-Mail: bert.wes...@tu-dresden.de
