Hi everyone,

I am trying to gradually port the molecular dynamics code Espresso++ from its current pure-MPI form to one that uses HPX for the performance-critical parts of the code. Espresso++ consists of a C++ and MPI-based shared library that can be imported into Python using Boost.Python, a collection of Python modules, and an mpi4py-based layer for communication among the Python processes.
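For context, the C++ core is exposed to Python roughly like this (a minimal Boost.Python sketch; the module and function names here are placeholders, not the actual Espresso++ API):

    // Minimal sketch of how the C++/MPI core is made importable from
    // Python via Boost.Python. Names are placeholders, not the real API.
    #include <boost/python.hpp>
    #include <mpi.h>

    // Hypothetical entry point into the MPI-parallel MD engine; assumes
    // MPI has already been initialized (e.g. by importing mpi4py first).
    void run_md_step()
    {
        int rank = 0;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        // ... integrate the particles owned by this rank ...
    }

    // Compiled into a shared library and loaded as "import _espressopp_core".
    BOOST_PYTHON_MODULE(_espressopp_core)
    {
        boost::python::def("run_md_step", &run_md_step);
    }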
I was able to properly initialize and terminate the HPX runtime environment from Python using the methods in hpx/examples/quickstart/init_globally.cpp and phylanx/python/src/init_hpx.cpp (a simplified sketch of this setup is included after the traces below). However, when I use mpi4py to perform MPI-based communication from within a Python script that also runs HPX, I encounter a segmentation fault with the following trace:

---------------------------------
{stack-trace}: 21 frames:
0x2abc616b08f2  : ??? + 0x2abc616b08f2 in /lustre/miifs01/project/m2_zdvresearch/vance/hpx/builds/gcc-openmpi-bench/install/lib/libhpx.so.1
0x2abc616ad06c  : hpx::termination_handler(int) + 0x15c in /lustre/miifs01/project/m2_zdvresearch/vance/hpx/builds/gcc-openmpi-bench/install/lib/libhpx.so.1
0x2abc5979b370  : ??? + 0x2abc5979b370 in /lib64/libpthread.so.0
0x2abc62755a76  : mca_pml_cm_recv_request_completion + 0xb6 in /cluster/easybuild/broadwell/software/mpi/OpenMPI/2.0.2-GCC-6.3.0/lib/libmpi.so.20
0x2abc626f4ac9  : ompi_mtl_psm2_progress + 0x59 in /cluster/easybuild/broadwell/software/mpi/OpenMPI/2.0.2-GCC-6.3.0/lib/libmpi.so.20
0x2abc63383eec  : opal_progress + 0x3c in /cluster/easybuild/broadwell/software/mpi/OpenMPI/2.0.2-GCC-6.3.0/lib/libopen-pal.so.20
0x2abc62630a75  : ompi_request_default_wait + 0x105 in /cluster/easybuild/broadwell/software/mpi/OpenMPI/2.0.2-GCC-6.3.0/lib/libmpi.so.20
0x2abc6267be92  : ompi_coll_base_bcast_intra_generic + 0x5b2 in /cluster/easybuild/broadwell/software/mpi/OpenMPI/2.0.2-GCC-6.3.0/lib/libmpi.so.20
0x2abc6267c262  : ompi_coll_base_bcast_intra_binomial + 0xb2 in /cluster/easybuild/broadwell/software/mpi/OpenMPI/2.0.2-GCC-6.3.0/lib/libmpi.so.20
0x2abc6268803b  : ompi_coll_tuned_bcast_intra_dec_fixed + 0xcb in /cluster/easybuild/broadwell/software/mpi/OpenMPI/2.0.2-GCC-6.3.0/lib/libmpi.so.20
0x2abc62642bc0  : PMPI_Bcast + 0x1a0 in /cluster/easybuild/broadwell/software/mpi/OpenMPI/2.0.2-GCC-6.3.0/lib/libmpi.so.20
0x2abc64cea17f  : ??? + 0x2abc64cea17f in /cluster/easybuild/broadwell/software/lang/Python/2.7.13-foss-2017a/lib/python2.7/site-packages/mpi4py/MPI.so
0x2abc59176f9b  : PyEval_EvalFrameEx + 0x923b in /cluster/easybuild/broadwell/software/lang/Python/2.7.13-foss-2017a/lib/libpython2.7.so.1.0
0x2abc5917879a  : PyEval_EvalCodeEx + 0x87a in /cluster/easybuild/broadwell/software/lang/Python/2.7.13-foss-2017a/lib/libpython2.7.so.1.0
0x2abc59178ba9  : PyEval_EvalCode + 0x19 in /cluster/easybuild/broadwell/software/lang/Python/2.7.13-foss-2017a/lib/libpython2.7.so.1.0
0x2abc5919cb4a  : PyRun_FileExFlags + 0x8a in /cluster/easybuild/broadwell/software/lang/Python/2.7.13-foss-2017a/lib/libpython2.7.so.1.0
0x2abc5919df25  : PyRun_SimpleFileExFlags + 0xd5 in /cluster/easybuild/broadwell/software/lang/Python/2.7.13-foss-2017a/lib/libpython2.7.so.1.0
0x2abc591b44e1  : Py_Main + 0xc61 in /cluster/easybuild/broadwell/software/lang/Python/2.7.13-foss-2017a/lib/libpython2.7.so.1.0
0x2abc59bccb35  : __libc_start_main + 0xf5 in /lib64/libc.so.6
0x40071e        : ??? + 0x40071e in python
{what}: Segmentation fault

{config}:
  HPX_WITH_AGAS_DUMP_REFCNT_ENTRIES=OFF
  HPX_WITH_APEX=OFF
  HPX_WITH_ATTACH_DEBUGGER_ON_TEST_FAILURE=OFF
  HPX_WITH_AUTOMATIC_SERIALIZATION_REGISTRATION=ON
  HPX_WITH_CXX14_RETURN_TYPE_DEDUCTION=TRUE
  HPX_WITH_DEPRECATION_WARNINGS=ON
  HPX_WITH_GOOGLE_PERFTOOLS=OFF
  HPX_WITH_INCLUSIVE_SCAN_COMPATIBILITY=ON
  HPX_WITH_IO_COUNTERS=ON
  HPX_WITH_IO_POOL=ON
  HPX_WITH_ITTNOTIFY=OFF
  HPX_WITH_LOGGING=ON
  HPX_WITH_MORE_THAN_64_THREADS=OFF
  HPX_WITH_NATIVE_TLS=ON
  HPX_WITH_NETWORKING=ON
  HPX_WITH_PAPI=OFF
  HPX_WITH_PARCELPORT_ACTION_COUNTERS=OFF
  HPX_WITH_PARCELPORT_LIBFABRIC=OFF
  HPX_WITH_PARCELPORT_MPI=ON
  HPX_WITH_PARCELPORT_MPI_MULTITHREADED=ON
  HPX_WITH_PARCELPORT_TCP=ON
  HPX_WITH_PARCELPORT_VERBS=OFF
  HPX_WITH_PARCEL_COALESCING=ON
  HPX_WITH_PARCEL_PROFILING=OFF
  HPX_WITH_SCHEDULER_LOCAL_STORAGE=OFF
  HPX_WITH_SPINLOCK_DEADLOCK_DETECTION=OFF
  HPX_WITH_STACKTRACES=ON
  HPX_WITH_SWAP_CONTEXT_EMULATION=OFF
  HPX_WITH_THREAD_BACKTRACE_ON_SUSPENSION=OFF
  HPX_WITH_THREAD_CREATION_AND_CLEANUP_RATES=OFF
  HPX_WITH_THREAD_CUMULATIVE_COUNTS=ON
  HPX_WITH_THREAD_DEBUG_INFO=OFF
  HPX_WITH_THREAD_DESCRIPTION_FULL=OFF
  HPX_WITH_THREAD_GUARD_PAGE=ON
  HPX_WITH_THREAD_IDLE_RATES=ON
  HPX_WITH_THREAD_LOCAL_STORAGE=OFF
  HPX_WITH_THREAD_MANAGER_IDLE_BACKOFF=ON
  HPX_WITH_THREAD_QUEUE_WAITTIME=OFF
  HPX_WITH_THREAD_STACK_MMAP=ON
  HPX_WITH_THREAD_STEALING_COUNTS=ON
  HPX_WITH_THREAD_TARGET_ADDRESS=OFF
  HPX_WITH_TIMER_POOL=ON
  HPX_WITH_TUPLE_RVALUE_SWAP=ON
  HPX_WITH_UNWRAPPED_COMPATIBILITY=ON
  HPX_WITH_VALGRIND=OFF
  HPX_WITH_VERIFY_LOCKS=OFF
  HPX_WITH_VERIFY_LOCKS_BACKTRACE=OFF
  HPX_WITH_VERIFY_LOCKS_GLOBALLY=OFF
  HPX_PARCEL_MAX_CONNECTIONS=512
  HPX_PARCEL_MAX_CONNECTIONS_PER_LOCALITY=4
  HPX_AGAS_LOCAL_CACHE_SIZE=4096
  HPX_HAVE_MALLOC=JEMALLOC
  HPX_PREFIX (configured)=/lustre/miifs01/project/m2_zdvresearch/vance/hpx/builds/gcc-openmpi-bench/install
  HPX_PREFIX=/lustre/miifs01/project/m2_zdvresearch/vance/hpx/builds/gcc-openmpi-bench/install

{version}: V1.1.0-rc1 (AGAS: V3.0), Git: unknown
{boost}: V1.65.1
{build-type}: release
{date}: Sep 25 2018 11:01:34
{platform}: linux
{compiler}: GNU C++ version 6.3.0
{stdlib}: GNU libstdc++ version 20161221

[login21:18535] *** Process received signal ***
[login21:18535] Signal: Aborted (6)
[login21:18535] Signal code: (-6)
[login21:18535] [ 0] /lib64/libpthread.so.0(+0xf370)[0x2abc5979b370]
[login21:18535] [ 1] /lib64/libc.so.6(gsignal+0x37)[0x2abc59be01d7]
[login21:18535] [ 2] /lib64/libc.so.6(abort+0x148)[0x2abc59be18c8]
[login21:18535] [ 3] /lustre/miifs01/project/m2_zdvresearch/vance/hpx/builds/gcc-openmpi-bench/install/lib/libhpx.so.1(_ZN3hpx19termination_handlerEi+0x213)[0x2abc616ad123]
[login21:18535] [ 4] /lib64/libpthread.so.0(+0xf370)[0x2abc5979b370]
[login21:18535] [ 5] /cluster/easybuild/broadwell/software/mpi/OpenMPI/2.0.2-GCC-6.3.0/lib/libmpi.so.20(mca_pml_cm_recv_request_completion+0xb6)[0x2abc62755a76]
[login21:18535] [ 6] /cluster/easybuild/broadwell/software/mpi/OpenMPI/2.0.2-GCC-6.3.0/lib/libmpi.so.20(ompi_mtl_psm2_progress+0x59)[0x2abc626f4ac9]
[login21:18535] [ 7] /cluster/easybuild/broadwell/software/mpi/OpenMPI/2.0.2-GCC-6.3.0/lib/libopen-pal.so.20(opal_progress+0x3c)[0x2abc63383eec]
[login21:18535] [ 8] /cluster/easybuild/broadwell/software/mpi/OpenMPI/2.0.2-GCC-6.3.0/lib/libmpi.so.20(ompi_request_default_wait+0x105)[0x2abc62630a75]
[login21:18535] [ 9] /cluster/easybuild/broadwell/software/mpi/OpenMPI/2.0.2-GCC-6.3.0/lib/libmpi.so.20(ompi_coll_base_bcast_intra_generic+0x5b2)[0x2abc6267be92]
[login21:18535] [10] /cluster/easybuild/broadwell/software/mpi/OpenMPI/2.0.2-GCC-6.3.0/lib/libmpi.so.20(ompi_coll_base_bcast_intra_binomial+0xb2)[0x2abc6267c262]
[login21:18535] [11] /cluster/easybuild/broadwell/software/mpi/OpenMPI/2.0.2-GCC-6.3.0/lib/libmpi.so.20(ompi_coll_tuned_bcast_intra_dec_fixed+0xcb)[0x2abc6268803b]
[login21:18535] [12] /cluster/easybuild/broadwell/software/mpi/OpenMPI/2.0.2-GCC-6.3.0/lib/libmpi.so.20(PMPI_Bcast+0x1a0)[0x2abc62642bc0]
[login21:18535] [13] /cluster/easybuild/broadwell/software/lang/Python/2.7.13-foss-2017a/lib/python2.7/site-packages/mpi4py/MPI.so(+0xa517f)[0x2abc64cea17f]
[login21:18535] [14] /cluster/easybuild/broadwell/software/lang/Python/2.7.13-foss-2017a/lib/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x923b)[0x2abc59176f9b]
[login21:18535] [15] /cluster/easybuild/broadwell/software/lang/Python/2.7.13-foss-2017a/lib/libpython2.7.so.1.0(PyEval_EvalCodeEx+0x87a)[0x2abc5917879a]
[login21:18535] [16] /cluster/easybuild/broadwell/software/lang/Python/2.7.13-foss-2017a/lib/libpython2.7.so.1.0(PyEval_EvalCode+0x19)[0x2abc59178ba9]
[login21:18535] [17] /cluster/easybuild/broadwell/software/lang/Python/2.7.13-foss-2017a/lib/libpython2.7.so.1.0(PyRun_FileExFlags+0x8a)[0x2abc5919cb4a]
[login21:18535] [18] /cluster/easybuild/broadwell/software/lang/Python/2.7.13-foss-2017a/lib/libpython2.7.so.1.0(PyRun_SimpleFileExFlags+0xd5)[0x2abc5919df25]
[login21:18535] [19] /cluster/easybuild/broadwell/software/lang/Python/2.7.13-foss-2017a/lib/libpython2.7.so.1.0(Py_Main+0xc61)[0x2abc591b44e1]
[login21:18535] [20] /lib64/libc.so.6(__libc_start_main+0xf5)[0x2abc59bccb35]
[login21:18535] [21] python[0x40071e]
[login21:18535] *** End of error message ***
---------------------------------

I think this error is related to https://github.com/STEllAR-GROUP/hpx/issues/949 and https://github.com/STEllAR-GROUP/hpx/pull/3129, so maybe the suspend and resume functions could be used to pause HPX while mpi4py performs its MPI calls. However, the documentation says this is only possible with a single locality. Does anyone know of a way to keep MPI-based interprocess communication working within Python, separately from the communication layer provided by HPX?
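For reference, the initialization mentioned at the top follows the init_globally.cpp pattern, roughly like this (a simplified sketch of that example, with error handling and main-thread registration omitted):

    // Simplified from hpx/examples/quickstart/init_globally.cpp: start the
    // HPX runtime when Python loads the shared library, and stop it again
    // when the interpreter exits.
    #include <hpx/hpx.hpp>
    #include <hpx/hpx_start.hpp>

    struct manage_global_runtime
    {
        manage_global_runtime()
        {
            // hpx::start returns immediately; the runtime keeps running
            // on its own worker threads in the background.
            char app[] = "espressopp";           // placeholder program name
            char* argv[] = { app, nullptr };
            hpx::start(nullptr, 1, argv);
        }

        ~manage_global_runtime()
        {
            // Schedule finalize on an HPX thread, then wait for shutdown.
            hpx::apply([]() { hpx::finalize(); });
            hpx::stop();
        }
    };

    // Constructed on dlopen() of the module, destroyed at interpreter exit.
    manage_global_runtime global_runtime;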
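What I had in mind with suspend and resume is something along these lines, i.e. pausing the HPX worker threads around every mpi4py operation (just a sketch of how I understand the API from pull request 3129; the two helpers would additionally have to be exported to Python, e.g. via Boost.Python, and called before and after each collective):

    // Sketch: pause HPX scheduling while mpi4py talks to MPI, then wake
    // the worker threads up again afterwards.
    #include <hpx/hpx_suspend.hpp>

    void suspend_hpx_runtime()
    {
        // Blocks until all HPX worker threads have gone to sleep; no HPX
        // work makes progress until resume is called.
        hpx::suspend();
    }

    void resume_hpx_runtime()
    {
        hpx::resume();
    }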
Thanks!

Best Regards,
James Vance