https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80822

--- Comment #3 from Nathan Weeks <weeks at iastate dot edu> ---
Setting OMP_DISPLAY_ENV=verbose results in the following output with Intel
17.0.2:

================================================================================
OPENMP DISPLAY ENVIRONMENT BEGIN
   _OPENMP='201511'
  [host] KMP_ABORT_DELAY='0'
  [host] KMP_ADAPTIVE_LOCK_PROPS='1,1024'
  [host] KMP_ALIGN_ALLOC='64'
  [host] KMP_ALL_THREADPRIVATE='256'
  [host] KMP_ALL_THREADS='2147483647'
  [host] KMP_ATOMIC_MODE='2'
  [host] KMP_BLOCKTIME='200'
  [host] KMP_CPUINFO_FILE: value is not defined
  [host] KMP_DETERMINISTIC_REDUCTION='FALSE'
  [host] KMP_DISP_NUM_BUFFERS='7'
  [host] KMP_DUPLICATE_LIB_OK='FALSE'
  [host] KMP_FORCE_REDUCTION: value is not defined
  [host] KMP_FOREIGN_THREADS_THREADPRIVATE='TRUE'
  [host] KMP_FORKJOIN_BARRIER='2,2'
  [host] KMP_FORKJOIN_BARRIER_PATTERN='hyper,hyper'
  [host] KMP_FORKJOIN_FRAMES='TRUE'
  [host] KMP_FORKJOIN_FRAMES_MODE='3'
  [host] KMP_GTID_MODE='3'
  [host] KMP_HANDLE_SIGNALS='FALSE'
  [host] KMP_HOT_TEAMS_MAX_LEVEL='1'
  [host] KMP_HOT_TEAMS_MODE='0'
  [host] KMP_INIT_AT_FORK='TRUE'
  [host] KMP_INIT_WAIT='2048'
  [host] KMP_ITT_PREPARE_DELAY='0'
  [host] KMP_LIBRARY='throughput'
  [host] KMP_LOCK_KIND='queuing'
  [host] KMP_MALLOC_POOL_INCR='1M'
  [host] KMP_NEXT_WAIT='1024'
  [host] KMP_NUM_LOCKS_IN_BLOCK='1'
  [host] KMP_PLAIN_BARRIER='2,2'
  [host] KMP_PLAIN_BARRIER_PATTERN='hyper,hyper'
  [host] KMP_REDUCTION_BARRIER='1,1'
  [host] KMP_REDUCTION_BARRIER_PATTERN='hyper,hyper'
  [host] KMP_SCHEDULE='static,balanced;guided,iterative'
  [host] KMP_SETTINGS='FALSE'
  [host] KMP_SPIN_BACKOFF_PARAMS='4096,100'
  [host] KMP_STACKOFFSET='64'
  [host] KMP_STACKPAD='0'
  [host] KMP_STACKSIZE='4M'
  [host] KMP_STORAGE_MAP='FALSE'
  [host] KMP_TASKING='2'
  [host] KMP_TASK_STEALING_CONSTRAINT='1'
  [host] KMP_USER_LEVEL_MWAIT='FALSE'
  [host] KMP_VERSION='FALSE'
  [host] KMP_WARNINGS='TRUE'
  [host] OMP_CANCELLATION='FALSE'
  [host] OMP_DEFAULT_DEVICE='0'
  [host] OMP_DISPLAY_ENV='VERBOSE'
  [host] OMP_DYNAMIC='FALSE'
  [host] OMP_MAX_ACTIVE_LEVELS='2147483647'
  [host] OMP_MAX_TASK_PRIORITY='0'
  [host] OMP_NESTED='FALSE'
  [host] OMP_NUM_THREADS='32'
  [host] OMP_PLACES='threads'
  [host] OMP_PROC_BIND='spread'
  [host] OMP_SCHEDULE='static'
  [host] OMP_STACKSIZE='4M'
  [host] OMP_THREAD_LIMIT='2147483647'
  [host] OMP_WAIT_POLICY='PASSIVE'
  [host] KMP_AFFINITY='noverbose,warnings,respect,granularity=thread,noduplicates,compact,0,0'
OPENMP DISPLAY ENVIRONMENT END
================================================================================

For comparison, the Cray 8.5.4 OpenMP runtime (which produces the same thread
affinity as the Intel 17.0.2 OpenMP runtime in the aforementioned example)
outputs the following when OMP_DISPLAY_ENV=verbose:

================================================================================
OPENMP DISPLAY ENVIRONMENT BEGIN
  _OPENMP='201307'
  OMP_SCHEDULE='static,0'
  OMP_NUM_THREADS='32'
  OMP_DYNAMIC='TRUE'
  OMP_NESTED='FALSE'
  OMP_STACKSIZE='128MB'
  OMP_WAIT_POLICY='ACTIVE'
  OMP_MAX_ACTIVE_LEVELS='1023'
  OMP_THREAD_LIMIT='256'
  CRAY_OMP_CHECK_AFFINITY='FALSE'
  OMP_PROC_BIND='spread'
  OMP_PLACES='threads'
  OMP_CANCELLATION='FALSE'
  OMP_DISPLAY_ENV='VERBOSE'
  OMP_DEFAULT_DEVICE='0'
  CRAY_OMP_GUARD_SIZE='0B'
  CRAY_OMP_TASK_Q_LIMIT='256'
  CRAY_OMP_CONTENTION_POLICY='Automatic'
OPENMP DISPLAY ENVIRONMENT END
================================================================================

Also, in this environment, with OMP_NUM_THREADS=2 OMP_PLACES=threads
OMP_PROC_BIND=close, libgomp pins the two threads to different physical
cores:

================================================================================
$ OMP_NUM_THREADS=2 OMP_PLACES=threads OMP_PROC_BIND=close ./xthi-omp.gnu |
sort -k 4n,4n
Hello from thread 0, on nid00015. (core affinity = 0)
Hello from thread 1, on nid00015. (core affinity = 1)
================================================================================

Both the Intel and Cray OpenMP runtimes pin the threads to the same physical
core:

================================================================================
$ OMP_NUM_THREADS=2 OMP_PLACES=threads OMP_PROC_BIND=close ./xthi-omp.intel |
sort -k 4n,4n
Hello from thread 0, on nid00015. (core affinity = 0)
Hello from thread 1, on nid00015. (core affinity = 32)
================================================================================
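
To double-check which CPU numbers are hardware threads of the same physical
core, something like the following could be run on the compute node. This is
only a sketch: it assumes a Linux node that exposes the sysfs file
/sys/devices/system/cpu/cpuN/topology/thread_siblings_list, and the "0,32"
sibling string used in the example is a hypothetical value chosen to match the
affinities shown above, not output captured from nid00015. The helper names
are mine.

```python
def parse_cpu_list(text):
    """Parse a Linux CPU list such as '0,32' or '0-3,8' into a set of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return cpus


def share_physical_core(cpu_a, cpu_b, siblings_text):
    """True if both CPUs appear in the given thread_siblings_list contents."""
    siblings = parse_cpu_list(siblings_text)
    return cpu_a in siblings and cpu_b in siblings


def read_siblings(cpu):
    """Read the sibling list for one CPU (assumes Linux sysfs is available)."""
    path = f"/sys/devices/system/cpu/cpu{cpu}/topology/thread_siblings_list"
    with open(path) as f:
        return f.read()


# Hypothetical sibling string for CPU 0; on a real node use read_siblings(0).
example = "0,32\n"
print(share_physical_core(0, 32, example))  # the Intel/Cray placement
print(share_physical_core(0, 1, example))   # the libgomp placement
```

With that assumed sibling string, the first check is true and the second is
false, which is the difference between the two placements.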

It does seem that the OpenMP 4.5 specification can be interpreted to support
the libgomp behavior (e.g., p. 52 lines 33-38), though it at least seems
counterintuitive.
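
Since the ambiguity is in how a runtime orders the places behind the abstract
name "threads", one possible way to sidestep it is to give OMP_PLACES an
explicit place list in which sibling hardware threads are adjacent. The sketch
below builds such a string. The topology (CPU n and CPU n+32 sharing a core)
is an assumption inferred from the Intel/Cray output above, and
explicit_places is a made-up helper name. I have not verified the resulting
placement with libgomp on this system.

```python
def explicit_places(sibling_groups):
    """Build an OMP_PLACES value with one place per hardware thread,
    listing the hardware threads of each core next to each other."""
    return ",".join("{%d}" % cpu for group in sibling_groups for cpu in group)


# Assumed topology for illustration: CPU n and CPU n+32 share a core.
groups = [(core, core + 32) for core in range(2)]
print(explicit_places(groups))  # prints {0},{32},{1},{33}
```

The printed value could then be passed as OMP_PLACES in place of "threads",
together with OMP_PROC_BIND=close.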
