Hi everyone,

I am trying to gradually port the molecular dynamics code Espresso++ from its 
current pure-MPI form to one that uses HPX for the critical parts of the code. 
Espresso++ consists of a C++ and MPI-based shared library that can be imported 
into Python via Boost.Python, a collection of Python modules, and an 
mpi4py-based layer for communication among the Python processes.

I was able to properly initialize and terminate the HPX runtime from Python 
using the approach shown in hpx/examples/quickstart/init_globally.cpp and 
phylanx/python/src/init_hpx.cpp. However, when I use mpi4py to perform 
MPI-based communication from within a Python script that also runs HPX, I 
encounter a segmentation fault with the following trace:

---------------------------------
{stack-trace}: 21 frames:
0x2abc616b08f2  : ??? + 0x2abc616b08f2 in 
/lustre/miifs01/project/m2_zdvresearch/vance/hpx/builds/gcc-openmpi-bench/install/lib/libhpx.so.1
0x2abc616ad06c  : hpx::termination_handler(int) + 0x15c in 
/lustre/miifs01/project/m2_zdvresearch/vance/hpx/builds/gcc-openmpi-bench/install/lib/libhpx.so.1
0x2abc5979b370  : ??? + 0x2abc5979b370 in /lib64/libpthread.so.0
0x2abc62755a76  : mca_pml_cm_recv_request_completion + 0xb6 in 
/cluster/easybuild/broadwell/software/mpi/OpenMPI/2.0.2-GCC-6.3.0/lib/libmpi.so.20
0x2abc626f4ac9  : ompi_mtl_psm2_progress + 0x59 in 
/cluster/easybuild/broadwell/software/mpi/OpenMPI/2.0.2-GCC-6.3.0/lib/libmpi.so.20
0x2abc63383eec  : opal_progress + 0x3c in 
/cluster/easybuild/broadwell/software/mpi/OpenMPI/2.0.2-GCC-6.3.0/lib/libopen-pal.so.20
0x2abc62630a75  : ompi_request_default_wait + 0x105 in 
/cluster/easybuild/broadwell/software/mpi/OpenMPI/2.0.2-GCC-6.3.0/lib/libmpi.so.20
0x2abc6267be92  : ompi_coll_base_bcast_intra_generic + 0x5b2 in 
/cluster/easybuild/broadwell/software/mpi/OpenMPI/2.0.2-GCC-6.3.0/lib/libmpi.so.20
0x2abc6267c262  : ompi_coll_base_bcast_intra_binomial + 0xb2 in 
/cluster/easybuild/broadwell/software/mpi/OpenMPI/2.0.2-GCC-6.3.0/lib/libmpi.so.20
0x2abc6268803b  : ompi_coll_tuned_bcast_intra_dec_fixed + 0xcb in 
/cluster/easybuild/broadwell/software/mpi/OpenMPI/2.0.2-GCC-6.3.0/lib/libmpi.so.20
0x2abc62642bc0  : PMPI_Bcast + 0x1a0 in 
/cluster/easybuild/broadwell/software/mpi/OpenMPI/2.0.2-GCC-6.3.0/lib/libmpi.so.20
0x2abc64cea17f  : ??? + 0x2abc64cea17f in 
/cluster/easybuild/broadwell/software/lang/Python/2.7.13-foss-2017a/lib/python2.7/site-packages/mpi4py/MPI.so
0x2abc59176f9b  : PyEval_EvalFrameEx + 0x923b in 
/cluster/easybuild/broadwell/software/lang/Python/2.7.13-foss-2017a/lib/libpython2.7.so.1.0
0x2abc5917879a  : PyEval_EvalCodeEx + 0x87a in 
/cluster/easybuild/broadwell/software/lang/Python/2.7.13-foss-2017a/lib/libpython2.7.so.1.0
0x2abc59178ba9  : PyEval_EvalCode + 0x19 in 
/cluster/easybuild/broadwell/software/lang/Python/2.7.13-foss-2017a/lib/libpython2.7.so.1.0
0x2abc5919cb4a  : PyRun_FileExFlags + 0x8a in 
/cluster/easybuild/broadwell/software/lang/Python/2.7.13-foss-2017a/lib/libpython2.7.so.1.0
0x2abc5919df25  : PyRun_SimpleFileExFlags + 0xd5 in 
/cluster/easybuild/broadwell/software/lang/Python/2.7.13-foss-2017a/lib/libpython2.7.so.1.0
0x2abc591b44e1  : Py_Main + 0xc61 in 
/cluster/easybuild/broadwell/software/lang/Python/2.7.13-foss-2017a/lib/libpython2.7.so.1.0
0x2abc59bccb35  : __libc_start_main + 0xf5 in /lib64/libc.so.6
0x40071e        : ??? + 0x40071e in python
{what}: Segmentation fault
{config}:
  HPX_WITH_AGAS_DUMP_REFCNT_ENTRIES=OFF
  HPX_WITH_APEX=OFF
  HPX_WITH_ATTACH_DEBUGGER_ON_TEST_FAILURE=OFF
  HPX_WITH_AUTOMATIC_SERIALIZATION_REGISTRATION=ON
  HPX_WITH_CXX14_RETURN_TYPE_DEDUCTION=TRUE
  HPX_WITH_DEPRECATION_WARNINGS=ON
  HPX_WITH_GOOGLE_PERFTOOLS=OFF
  HPX_WITH_INCLUSIVE_SCAN_COMPATIBILITY=ON
  HPX_WITH_IO_COUNTERS=ON
  HPX_WITH_IO_POOL=ON
  HPX_WITH_ITTNOTIFY=OFF
  HPX_WITH_LOGGING=ON
  HPX_WITH_MORE_THAN_64_THREADS=OFF
  HPX_WITH_NATIVE_TLS=ON
  HPX_WITH_NETWORKING=ON
  HPX_WITH_PAPI=OFF
  HPX_WITH_PARCELPORT_ACTION_COUNTERS=OFF
  HPX_WITH_PARCELPORT_LIBFABRIC=OFF
  HPX_WITH_PARCELPORT_MPI=ON
  HPX_WITH_PARCELPORT_MPI_MULTITHREADED=ON
  HPX_WITH_PARCELPORT_TCP=ON
  HPX_WITH_PARCELPORT_VERBS=OFF
  HPX_WITH_PARCEL_COALESCING=ON
  HPX_WITH_PARCEL_PROFILING=OFF
  HPX_WITH_SCHEDULER_LOCAL_STORAGE=OFF
  HPX_WITH_SPINLOCK_DEADLOCK_DETECTION=OFF
  HPX_WITH_STACKTRACES=ON
  HPX_WITH_SWAP_CONTEXT_EMULATION=OFF
  HPX_WITH_THREAD_BACKTRACE_ON_SUSPENSION=OFF
  HPX_WITH_THREAD_CREATION_AND_CLEANUP_RATES=OFF
  HPX_WITH_THREAD_CUMULATIVE_COUNTS=ON
  HPX_WITH_THREAD_DEBUG_INFO=OFF
  HPX_WITH_THREAD_DESCRIPTION_FULL=OFF
  HPX_WITH_THREAD_GUARD_PAGE=ON
  HPX_WITH_THREAD_IDLE_RATES=ON
  HPX_WITH_THREAD_LOCAL_STORAGE=OFF
  HPX_WITH_THREAD_MANAGER_IDLE_BACKOFF=ON
  HPX_WITH_THREAD_QUEUE_WAITTIME=OFF
  HPX_WITH_THREAD_STACK_MMAP=ON
  HPX_WITH_THREAD_STEALING_COUNTS=ON
  HPX_WITH_THREAD_TARGET_ADDRESS=OFF
  HPX_WITH_TIMER_POOL=ON
  HPX_WITH_TUPLE_RVALUE_SWAP=ON
  HPX_WITH_UNWRAPPED_COMPATIBILITY=ON
  HPX_WITH_VALGRIND=OFF
  HPX_WITH_VERIFY_LOCKS=OFF
  HPX_WITH_VERIFY_LOCKS_BACKTRACE=OFF
  HPX_WITH_VERIFY_LOCKS_GLOBALLY=OFF

  HPX_PARCEL_MAX_CONNECTIONS=512
  HPX_PARCEL_MAX_CONNECTIONS_PER_LOCALITY=4
  HPX_AGAS_LOCAL_CACHE_SIZE=4096
  HPX_HAVE_MALLOC=JEMALLOC
  HPX_PREFIX 
(configured)=/lustre/miifs01/project/m2_zdvresearch/vance/hpx/builds/gcc-openmpi-bench/install
  
HPX_PREFIX=/lustre/miifs01/project/m2_zdvresearch/vance/hpx/builds/gcc-openmpi-bench/install
{version}: V1.1.0-rc1 (AGAS: V3.0), Git: unknown
{boost}: V1.65.1
{build-type}: release
{date}: Sep 25 2018 11:01:34
{platform}: linux
{compiler}: GNU C++ version 6.3.0
{stdlib}: GNU libstdc++ version 20161221
[login21:18535] *** Process received signal ***
[login21:18535] Signal: Aborted (6)
[login21:18535] Signal code:  (-6)
[login21:18535] [ 0] /lib64/libpthread.so.0(+0xf370)[0x2abc5979b370]
[login21:18535] [ 1] /lib64/libc.so.6(gsignal+0x37)[0x2abc59be01d7]
[login21:18535] [ 2] /lib64/libc.so.6(abort+0x148)[0x2abc59be18c8]
[login21:18535] [ 3] 
/lustre/miifs01/project/m2_zdvresearch/vance/hpx/builds/gcc-openmpi-bench/install/lib/libhpx.so.1(_ZN3hpx19termination_handlerEi+0x213)[0x2abc616ad123]
[login21:18535] [ 4] /lib64/libpthread.so.0(+0xf370)[0x2abc5979b370]
[login21:18535] [ 5] 
/cluster/easybuild/broadwell/software/mpi/OpenMPI/2.0.2-GCC-6.3.0/lib/libmpi.so.20(mca_pml_cm_recv_request_completion+0xb6)[0x2abc62755a76]
[login21:18535] [ 6] 
/cluster/easybuild/broadwell/software/mpi/OpenMPI/2.0.2-GCC-6.3.0/lib/libmpi.so.20(ompi_mtl_psm2_progress+0x59)[0x2abc626f4ac9]
[login21:18535] [ 7] 
/cluster/easybuild/broadwell/software/mpi/OpenMPI/2.0.2-GCC-6.3.0/lib/libopen-pal.so.20(opal_progress+0x3c)[0x2abc63383eec]
[login21:18535] [ 8] 
/cluster/easybuild/broadwell/software/mpi/OpenMPI/2.0.2-GCC-6.3.0/lib/libmpi.so.20(ompi_request_default_wait+0x105)[0x2abc62630a75]
[login21:18535] [ 9] 
/cluster/easybuild/broadwell/software/mpi/OpenMPI/2.0.2-GCC-6.3.0/lib/libmpi.so.20(ompi_coll_base_bcast_intra_generic+0x5b2)[0x2abc6267be92]
[login21:18535] [10] 
/cluster/easybuild/broadwell/software/mpi/OpenMPI/2.0.2-GCC-6.3.0/lib/libmpi.so.20(ompi_coll_base_bcast_intra_binomial+0xb2)[0x2abc6267c262]
[login21:18535] [11] 
/cluster/easybuild/broadwell/software/mpi/OpenMPI/2.0.2-GCC-6.3.0/lib/libmpi.so.20(ompi_coll_tuned_bcast_intra_dec_fixed+0xcb)[0x2abc6268803b]
[login21:18535] [12] 
/cluster/easybuild/broadwell/software/mpi/OpenMPI/2.0.2-GCC-6.3.0/lib/libmpi.so.20(PMPI_Bcast+0x1a0)[0x2abc62642bc0]
[login21:18535] [13] 
/cluster/easybuild/broadwell/software/lang/Python/2.7.13-foss-2017a/lib/python2.7/site-packages/mpi4py/MPI.so(+0xa517f)[0x2abc64cea17f]
[login21:18535] [14] 
/cluster/easybuild/broadwell/software/lang/Python/2.7.13-foss-2017a/lib/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x923b)[0x2abc59176f9b]
[login21:18535] [15] 
/cluster/easybuild/broadwell/software/lang/Python/2.7.13-foss-2017a/lib/libpython2.7.so.1.0(PyEval_EvalCodeEx+0x87a)[0x2abc5917879a]
[login21:18535] [16] 
/cluster/easybuild/broadwell/software/lang/Python/2.7.13-foss-2017a/lib/libpython2.7.so.1.0(PyEval_EvalCode+0x19)[0x2abc59178ba9]
[login21:18535] [17] 
/cluster/easybuild/broadwell/software/lang/Python/2.7.13-foss-2017a/lib/libpython2.7.so.1.0(PyRun_FileExFlags+0x8a)[0x2abc5919cb4a]
[login21:18535] [18] 
/cluster/easybuild/broadwell/software/lang/Python/2.7.13-foss-2017a/lib/libpython2.7.so.1.0(PyRun_SimpleFileExFlags+0xd5)[0x2abc5919df25]
[login21:18535] [19] 
/cluster/easybuild/broadwell/software/lang/Python/2.7.13-foss-2017a/lib/libpython2.7.so.1.0(Py_Main+0xc61)[0x2abc591b44e1]
[login21:18535] [20] /lib64/libc.so.6(__libc_start_main+0xf5)[0x2abc59bccb35]
[login21:18535] [21] python[0x40071e]
[login21:18535] *** End of error message ***
---------------------------------
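
For reference, the failing pattern boils down to something like the sketch 
below. The init_hpx module name and its import-time behaviour are only 
placeholders for the Boost.Python wrapper I built following the two files 
mentioned above; the rest is plain mpi4py. The broadcast near the end is what 
produces the trace.

    from mpi4py import MPI   # MPI gets initialized when mpi4py is imported
    import init_hpx          # placeholder: starts the HPX runtime globally

    comm = MPI.COMM_WORLD

    # only the root rank has the payload; everyone else receives it
    data = {'step': 0} if comm.rank == 0 else None

    # the segmentation fault is raised inside this call
    # (PMPI_Bcast -> ompi_mtl_psm2_progress in the trace above)
    data = comm.bcast(data, root=0)
    print("rank %d received %r" % (comm.rank, data))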

I think this error is related to 
https://github.com/STEllAR-GROUP/hpx/issues/949 and 
https://github.com/STEllAR-GROUP/hpx/pull/3129, so perhaps the suspend and 
resume functions could be used here. However, the documentation says that 
suspending the runtime is only supported when running on a single locality.
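
Just to make the question concrete: if that route were viable, I imagine the 
Python side would bracket the MPI calls roughly like the sketch below. The 
suspend_runtime/resume_runtime wrappers are purely hypothetical names for 
bindings around hpx::suspend()/hpx::resume(); nothing like them exists in my 
current wrapper module yet.

    import init_hpx          # placeholder module from the sketch above
    from mpi4py import MPI

    comm = MPI.COMM_WORLD

    init_hpx.suspend_runtime()   # hypothetical binding for hpx::suspend()
    data = comm.bcast({'step': 0} if comm.rank == 0 else None, root=0)
    init_hpx.resume_runtime()    # hypothetical binding for hpx::resume()

The idea would be to keep HPX suspended for the duration of every MPI call 
made from Python and to resume it afterwards.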

Does anyone know of a way to still perform interprocess communication from 
within Python, separately from the communication layer provided by HPX? Thanks!

Best Regards,

James Vance

