On 2/1/2012 5:20 AM, Brice Goglin wrote:
On 01/02/2012 03:49, Christopher Samuel wrote:
With XLC and 1.3.1 and 1.4, I get plenty of warnings (compile logs for
both attached) whilst compiling, and then 4 failures in make check
(accompanied by segmentation faults):
samuel@tambo:~/HWLOC/hwloc-1.3.1> grep -B1 FAIL: log
/bin/sh: line 1: 5267 Segmentation fault ${dir}$tst
FAIL: hwloc_bind
/bin/sh: line 1: 5285 Segmentation fault ${dir}$tst
FAIL: hwloc_get_last_cpu_location
/bin/sh: line 1: 5335 Segmentation fault ${dir}$tst
FAIL: hwloc_is_thissystem
/bin/sh: line 1: 5481 Segmentation fault ${dir}$tst
FAIL: glibc-sched
All these tests involved binding, which is likely broken (see below).
"/vlsci/VLSCI/samuel/HWLOC/hwloc-1.3.1/include/hwloc.h", line 1203.28:
1506-1385 (W) The attribute "pure" is not a valid type attribute.
CC traversal.lo
Attribute pure is before the function name; I'll move it after. XLC
doesn't seem to warn in that case.
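A minimal sketch of the two placements, assuming GCC-style attribute syntax. The function count_nonzero is hypothetical and is not the declaration at hwloc.h line 1203. Placing the attribute after the declarator attaches it to the function itself rather than leaving the compiler to parse it as part of the return type:

```c
/* Hypothetical declaration; not the actual hwloc.h prototype.
 * Before the name (the placement XLC warns about as a type attribute):
 *   __attribute__((pure)) int count_nonzero(const int *v, int n);
 * After the declarator, the attribute clearly applies to the function: */
int count_nonzero(const int *v, int n) __attribute__((pure));

/* Pure: the result depends only on the arguments and the memory they
 * point to, and the function has no side effects. */
int count_nonzero(const int *v, int n)
{
    int c = 0;
    for (int i = 0; i < n; i++)
        if (v[i] != 0)
            c++;
    return c;
}
```

For example, count_nonzero on {1, 0, 3, 0, 5} with n = 5 returns 3.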
"distances.c", line 62.42: 1506-404 (W) restrict can only qualify a
pointer type.
"distances.c", line 84.50: 1506-404 (W) restrict can only qualify a
pointer type.
"distances.c", line 226.40: 1506-404 (W) restrict can only qualify a
pointer type.
XLC may be wrong here: topology_t is typedef'ed to a pointer.
I've seen this sort of thing before, where the "pointerness" was ignored
when it was "inside" the typedef.
Since this is only a warning, and a missing "restrict" should not impact
correctness, I vote to ignore this.
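A small sketch of why the warning looks spurious. The struct topo and topo_t names are hypothetical stand-ins for the topology typedef discussed above. Applying restrict to a typedef that names a pointer type is valid C99, because the qualified type is still a pointer; a compiler that only inspects the spelled-out type name may miss that:

```c
#include <stddef.h>

/* Hypothetical opaque handle, analogous to the topology typedef above. */
struct topo { int nbobjs; };
typedef struct topo *topo_t;

/* "topo_t restrict" restrict-qualifies the pointer type named by the
 * typedef, which C99 permits. A compiler that does not look through the
 * typedef may wrongly claim restrict can only qualify a pointer. */
int topo_count(topo_t restrict t)
{
    return t != NULL ? t->nbobjs : 0;
}
```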
"topology-linux.c", line 303.33: 1506-280 (W) Function argument
assignment between types "unsigned int" and "struct {...}*" is not allowed.
"topology-linux.c", line 303.27: 1506-098 (E) Missing argument(s).
"topology-linux.c", line 391.32: 1506-280 (W) Function argument
assignment between types "unsigned int" and "struct {...}*" is not allowed.
"topology-linux.c", line 391.26: 1506-098 (E) Missing argument(s).
"topology-linux.c", line 715.40: 1506-280 (W) Function argument
assignment between types "unsigned int" and "struct {...}*" is not allowed.
"topology-linux.c", line 715.34: 1506-098 (E) Missing argument(s).
"topology-linux.c", line 807.40: 1506-280 (W) Function argument
assignment between types "unsigned int" and "struct {...}*" is not allowed.
"topology-linux.c", line 807.34: 1506-098 (E) Missing argument(s).
This looks very bad. It means something broke the already very complex
sched_setaffinity detection code.
Does XLC redefine its own sched_setaffinity functions? Can you find the
relevant header file and send it?
PGI had similar problems at some point. That's very annoying.
This explains why binding tests broke.
I cannot find any instances within the /opt/apps/ibm tree on this machine:
$ find /opt/apps/ibm -name \*.h|xargs grep affi
find: `/opt/apps/ibm/vac/11.1/lap/license': Permission denied
find: `/opt/apps/ibm/essl/5.1/lap/license': Permission denied
find: `/opt/apps/ibm/xlf/13.1/lap/license': Permission denied
/opt/apps/ibm/xlsmp/2.1/include/omp.h: ibm_sched_affinity= 1000/*
AFFINITY scheduling type. This is an IBM extension. */
$ find /opt/apps/ibm -name \*.h|xargs grep cpu_set_t
find: `/opt/apps/ibm/vac/11.1/lap/license': Permission denied
find: `/opt/apps/ibm/essl/5.1/lap/license': Permission denied
find: `/opt/apps/ibm/xlf/13.1/lap/license': Permission denied
The generated config.h contains:
#define HWLOC_HAVE_OLD_SCHED_SETAFFINITY 1
#define HWLOC_HAVE_SCHED_SETAFFINITY 1
The "OLD" sched_setaffinity is the 2-argument version, but
/usr/include/sched.h contains the 3-argument version:
extern int sched_setaffinity (__pid_t __pid, size_t __cpusetsize,
__const cpu_set_t *__cpuset) __THROW;
So, it would appear that configure has wrongly set
"HWLOC_HAVE_OLD_SCHED_SETAFFINITY".
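To illustrate why the bogus define produces "Missing argument(s)", here is a minimal sketch of how code can choose between the two prototypes at compile time. The wrapper set_affinity is hypothetical and is not hwloc's actual topology-linux.c code; only the macro name comes from the config.h above. It assumes Linux with glibc's 3-argument <sched.h>:

```c
#define _GNU_SOURCE
#include <sched.h>
#include <sys/types.h>

/* Hypothetical wrapper, not hwloc's code. If configure wrongly defines
 * HWLOC_HAVE_OLD_SCHED_SETAFFINITY on a system whose <sched.h> declares
 * the 3-argument prototype, the first branch is compiled and the call is
 * one argument short, matching the topology-linux.c errors. */
int set_affinity(pid_t pid, cpu_set_t *set)
{
#ifdef HWLOC_HAVE_OLD_SCHED_SETAFFINITY
    return sched_setaffinity(pid, set);               /* old 2-argument form */
#else
    return sched_setaffinity(pid, sizeof(*set), set); /* current glibc form */
#endif
}
```

Compiled without the macro, as here, it calls the 3-argument form; re-applying the mask returned by sched_getaffinity() for pid 0 should succeed.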
Examining config.log, I find:
configure:9046: checking for old prototype of sched_setaffinity
configure:9064: xlc -c conftest.c >&5
"conftest.c", line 82.19: 1506-236 (W) Macro name _GNU_SOURCE has been
redefined.
"conftest.c", line 82.19: 1506-358 (I) "_GNU_SOURCE" is defined on
line 25 of conftest.c.
"conftest.c", line 89.23: 1506-280 (W) Function argument assignment
between types "unsigned long" and "void*" is not allowed.
"conftest.c", line 89.19: 1506-098 (E) Missing argument(s).
configure:9064: $? = 0
configure:9068: result: yes
This is WRONG.
The compiler has reported an error, "(E) Missing argument(s)", and yet
exited with $? = 0.
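One possible workaround, sketched here under assumptions and not something autoconf or hwloc currently does, is to treat XLC's severity letter as the real signal instead of the exit status. The helper xlc_line_is_error is hypothetical; it assumes the "1506-NNN (X)" message format seen in the logs above, where (I) is informational, (W) is a warning, and (E), (S), (U) are errors of increasing severity:

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical helper, not part of hwloc or autoconf. Inspects one line
 * of XL C diagnostic output such as
 *   "conftest.c", line 89.19: 1506-098 (E) Missing argument(s).
 * and returns 1 when the severity letter is E, S or U, and 0 for I, W,
 * or a line that does not match the assumed format. */
int xlc_line_is_error(const char *line)
{
    const char *p = strstr(line, "1506-");
    if (p == NULL)
        return 0;
    p += 5;                              /* skip the "1506-" prefix */
    if (!isdigit((unsigned char)*p))
        return 0;
    while (isdigit((unsigned char)*p))
        p++;                             /* skip the message number */
    /* Expect " (X)"; short-circuiting keeps every read in bounds. */
    if (p[0] != ' ' || p[1] != '(' || p[2] == '\0' || p[3] != ')')
        return 0;
    return strchr("ESU", p[2]) != NULL;
}
```

A configure-style probe could run this over the compiler's stderr and fail the check if any line matches, even when $? is 0.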
I am looking at the xlc docs to see if there is some compiler flag that
needs to be set.
-Paul
--
Paul H. Hargrove phhargr...@lbl.gov
Future Technologies Group
HPC Research Department Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900