Hi Michael,

From your segfault, it seems that you're running into one of the problems in the 0.9.11 release. We believe those are fixed on the latest master branch. As Hartmut noted, we plan to do a new release soon. Would you mind trying the latest master branch?
Regards,
Thomas

On 08.01.2016 3:48 PM, "Michael Levine" <[email protected]> wrote:
> Hi,
>
> I've been experimenting with hpx for a hobby project, with a small
> virtualized cluster of 3 Debian 8.2 machines running on an ESXi server. When I
> run any hpx code on a single locality, it appears to be working; however,
> whenever I try to use more than one locality, I invariably get a
> segmentation fault, regardless of which code I am using. I first
> encountered the trouble with my own code, but it also happens when running
> any of the example apps. I am somewhat new to all of this and I
> cannot figure out how to attach a debugger to try and identify the cause of
> these errors.
>
> I'm using hpx 0.9.11 on my small cluster with the latest version of slurm.
> (I chose slurm as it appears to provide support for Intel Phi nodes
> running applications in native mode.) To the best of my knowledge, slurm
> is configured correctly. However, it is certainly possible that I have
> done something wrong configuring slurm.
>
> I have tried using boost 1.58, 1.59, and 1.60. I have tried with clang 3.7
> and with Intel C++ 16 and 16 update 1. In all cases, I get the same
> segmentation fault whenever I try to run on more than a single locality.
> I have played around with single vs. multiple network interfaces, single
> vs. multiple networks, etc.
>
> Lately, I have re-built boost using the Intel compiler to ensure that
> there was no issue caused by hpx and boost having been compiled with
> different compilers. I have been trying to troubleshoot this only based
> on the example code, rather than my own code, so that I can be confident
> that the problems are not caused by my own code errors/bugs.
>
> I know there is a command-line option to attach a debugger but I cannot
> figure out how to use this.
>
> I’ve attached a copy of my slurm.conf for reference, and the output of
> --hpx:dump-config and --hpx:debug-clp.
>
> HPX stack trace / complete error message is copied below.
>
> I’m really stuck here and honestly have no idea how to resolve this
> issue. I would greatly appreciate any help that you can offer. Furthermore, I’d
> really appreciate some guidance as to how to use a debugger to debug my own
> hpx code to identify and resolve issues with that code. Please let me know
> if there’s any additional information that I should provide.
>
> Thank you very much in advance,
>
> Shmuel
>
> shmuel@ssh01:/usr/local/lib
> srun -n1 -N1 1d_stencil_7
> Localities,OS_Threads,Execution_Time_sec,Points_per_Partition,Partitions,Time_Steps
> 1, 1, 0.093138849, 10, 10, 45
>
> shmuel@ssh01:~
> srun -n2 -N1 1d_stencil_7
>
> {stack-trace}: 4 frames:
> 0x7f09a45e9840 : hpx::detail::backtrace(unsigned long) + 0x80 in /usr/local/lib/libhpx.so.0
> 0x7f09a45eeced : boost::exception_ptr hpx::detail::get_exception<hpx::exception>(hpx::exception const&, std::string const&, std::string const&, long, std::string const&) + 0x23d in /usr/local/lib/libhpx.so.0
> 0x7f09a45ee8bc : void hpx::detail::throw_exception<hpx::exception>(hpx::exception const&, std::string const&, std::string const&, long) + 0x10c in /usr/local/lib/libhpx.so.0
> 0x7f09a4a0de3d : hpx::agas::server::primary_namespace::resolve_free_list(boost::unique_lock<hpx::lcos::local::spinlock>&, std::list<std::_Rb_tree_iterator<std::pair<hpx::naming::gid_type const, long> >, std::allocator<std::_Rb_tree_iterator<std::pair<hpx::naming::gid_type const, long> > > > const&, std::list<hpx::agas::server::primary_namespace::free_entry, std::allocator<hpx::agas::server::primary_namespace::free_entry> >&, hpx::naming::gid_type const&, hpx::naming::gid_type const&, hpx::error_code&) + 0x137d in /usr/local/lib/libhpx.so.0
> {env}: 85 entries:
> ALTERNATE_EDITOR=
> CPATH=/opt/intel/compilers_and_libraries_2016.1.150/linux/mkl/include
> CXX=clang++
> DIRHISTORY_SIZE=30
> DISPLAY=localhost:10.0
> EDITOR=/usr/bin/vim
> FPATH=/home/shmuel/.oh-my-zsh/plugins/wd:/home/shmuel/.oh-my-zsh/plugins/tmux:/home/shmuel/.oh-my-zsh/plugins/dirhistory:/home/shmuel/.oh-my-zsh/plugins/colorize:/home/shmuel/.oh-my-zsh/plugins/history:/home/shmuel/.oh-my-zsh/plugins/sudo:/home/shmuel/.oh-my-zsh/plugins/command-not-found:/home/shmuel/.oh-my-zsh/plugins/tmux:/home/shmuel/.oh-my-zsh/plugins/mosh:/home/shmuel/.oh-my-zsh/plugins/git-extras:/home/shmuel/.oh-my-zsh/plugins/battery:/home/shmuel/.oh-my-zsh/plugins/git-flow-avh:/home/shmuel/.oh-my-zsh/plugins/git:/home/shmuel/.oh-my-zsh/functions:/home/shmuel/.oh-my-zsh/completions:/usr/local/share/zsh/site-functions:/usr/share/zsh/vendor-functions:/usr/share/zsh/vendor-completions:/usr/share/zsh/functions/Calendar:/usr/share/zsh/functions/Chpwd:/usr/share/zsh/functions/Completion:/usr/share/zsh/functions/Completion/AIX:/usr/share/zsh/functions/Completion/BSD:/usr/share/zsh/functions/Completion/Base:/usr/share/zsh/functions/Completion/Cygwin:/usr/share/zsh/functions/Completion/Darwin:/usr/share/zsh/functions/Completion/Debian:/usr/share/zsh/functions/Completion/Linux:/usr/share/zsh/functions/Completion/Mandriva:/usr/share/zsh/functions/Completion/Redhat:/usr/share/zsh/functions/Completion/Solaris:/usr/share/zsh/functions/Completion/Unix:/usr/share/zsh/functions/Completion/X:/usr/share/zsh/functions/Completion/Zsh:/usr/share/zsh/functions/Completion/openSUSE:/usr/share/zsh/functions/Exceptions:/usr/share/zsh/functions/MIME:/usr/share/zsh/functions/Misc:/usr/share/zsh/functions/Newuser:/usr/share/zsh/functions/Prompts:/usr/share/zsh/functions/TCP:/usr/share/zsh/functions/VCS_Info:/usr/share/zsh/functions/VCS_Info/Backends:/usr/share/zsh/functions/Zftp:/usr/share/zsh/functions/Zle:/home/shmuel/bin/funcs
> GDBSERVER_MIC=/opt/intel/debugger_2016/gdb/targets/mic/bin/gdbserver
> GDB_CROSS=/opt/intel/debugger_2016/gdb/intel64_mic/bin/gdb-mic
> HOME=/home/shmuel
> INFOPATH=/opt/intel/documentation_2016/en/debugger//gdb-ia/info/:/opt/intel/documentation_2016/en/debugger//gdb-mic/info/:/opt/intel/documentation_2016/en/debugger//gdb-igfx/info/
> INTEL_LICENSE_FILE=/opt/intel/compilers_and_libraries_2016.1.150/linux/licenses:/opt/intel/licenses:/home/shmuel/intel/licenses
> INTEL_PYTHONHOME=/opt/intel/debugger_2016/python/intel64/
> I_MPI_ROOT=/opt/intel/compilers_and_libraries_2016.1.150/linux/mpi
> LANG=en_US.utf8
> LANGUAGE=en_CA:en
> LC_ALL=en_CA.UTF-8
> LC_CTYPE=en_CA.UTF-8
> LD_LIBRARY_PATH=/opt/intel/compilers_and_libraries_2016.1.150/linux/compiler/lib/intel64:/opt/intel/compilers_and_libraries_2016.1.150/linux/mpi/intel64/lib:/opt/intel/compilers_and_libraries_2016.1.150/linux/mpi/mic/lib:/opt/intel/compilers_and_libraries_2016.1.150/linux/tbb/lib/intel64/gcc4.4:/opt/intel/compilers_and_libraries_2016.1.150/linux/compiler/lib/intel64:/opt/intel/compilers_and_libraries_2016.1.150/linux/mkl/lib/intel64:/opt/intel/debugger_2016/libipt/intel64/lib:/home/shmuel/src/fx/lib/:
> LESS=-R
> LIBRARY_PATH=/opt/intel/compilers_and_libraries_2016.1.150/linux/tbb/lib/intel64/gcc4.4:/opt/intel/compilers_and_libraries_2016.1.150/linux/compiler/lib/intel64:/opt/intel/compilers_and_libraries_2016.1.150/linux/mkl/lib/intel64
> LOGNAME=shmuel
> LSCOLORS=Gxfxcxdxbxegedabagacad
> MAIL=/var/mail/shmuel
> MANPATH=/opt/intel/man/common:/opt/intel/compilers_and_libraries_2016.1.150/linux/mpi/man:/opt/intel/compilers_and_libraries_2016.1.150/linux/man/en_US:/opt/intel/documentation_2016/en/debugger//gdb-ia/man/:/opt/intel/documentation_2016/en/debugger//gdb-mic/man/:/opt/intel/documentation_2016/en/debugger//gdb-igfx/man/::/home/shmuel/src/tup/
> MIC_LD_LIBRARY_PATH=/opt/intel/compilers_and_libraries_2016.1.150/linux/compiler/lib/mic:/opt/intel/compilers_and_libraries_2016.1.150/linux/mpi/mic/lib:/opt/intel/compilers_and_libraries_2016.1.150/linux/tbb/lib/mic:/opt/intel/compilers_and_libraries_2016.1.150/linux/compiler/lib/mic:/opt/intel/compilers_and_libraries_2016.1.150/linux/mkl/lib/mic
> MIC_LIBRARY_PATH=/opt/intel/compilers_and_libraries_2016.1.150/linux/compiler/lib/mic:/opt/intel/compilers_and_libraries_2016.1.150/linux/mpi/mic/lib
> MKLROOT=/opt/intel/compilers_and_libraries_2016.1.150/linux/mkl
> MPM_LAUNCHER=/opt/intel/debugger_2016/mpm/mic/bin/start_mpm.sh
> NLSPATH=/opt/intel/compilers_and_libraries_2016.1.150/linux/compiler/lib/intel64/locale/%l_%t/%N:/opt/intel/compilers_and_libraries_2016.1.150/linux/mkl/lib/intel64/locale/%l_%t/%N:/opt/intel/debugger_2016/gdb/intel64_mic/share/locale/%l_%t/%N:/opt/intel/debugger_2016/gdb/intel64/share/locale/%l_%t/%N
> OLDPWD=/home/shmuel
> PAGER=less
> PATH=/opt/intel/compilers_and_libraries_2016.1.150/linux/bin/intel64:/opt/intel/compilers_and_libraries_2016.1.150/linux/mpi/intel64/bin:/opt/intel/debugger_2016/gdb/intel64_mic/bin:/usr/local/texlive/2014/bin/x86_64-linux:/usr/local/bin:/usr/bin:/bin:/usr/games:/usr/local/sbin:/sbin:/usr/local/games:/home/shmuel/src/tup/
> PWD=/home/shmuel
> REPORTTIME=2
> SHELL=/bin/zsh
> SHLVL=1
> SLURMD_NODENAME=hpc02
> SLURM_CHECKPOINT_IMAGE_DIR=/var/lib/slurm-llnl/checkpoint
> SLURM_CLUSTER_NAME=cluster
> SLURM_CPUS_ON_NODE=2
> SLURM_DISTRIBUTION=block
> SLURM_GTIDS=0,1
> SLURM_JOBID=2165
> SLURM_JOB_CPUS_PER_NODE=2
> SLURM_JOB_ID=2165
> SLURM_JOB_NAME=1d_stencil_7
> SLURM_JOB_NODELIST=hpc02
> SLURM_JOB_NUM_NODES=1
> SLURM_JOB_PARTITION=debug
> SLURM_JOB_UID=1000
> SLURM_JOB_USER=shmuel
> SLURM_LAUNCH_NODE_IPADDR=192.168.1.125
> SLURM_LOCALID=1
> SLURM_NNODES=1
> SLURM_NODEID=0
> SLURM_NODELIST=hpc02
> SLURM_NPROCS=2
> SLURM_NTASKS=2
> SLURM_PRIO_PROCESS=0
> SLURM_PROCID=1
> SLURM_SRUN_COMM_HOST=192.168.1.125
> SLURM_SRUN_COMM_PORT=45712
> SLURM_STEPID=0
> SLURM_STEP_ID=0
> SLURM_STEP_LAUNCHER_PORT=45712
> SLURM_STEP_NODELIST=hpc02
> SLURM_STEP_NUM_NODES=1
> SLURM_STEP_NUM_TASKS=2
> SLURM_STEP_TASKS_PER_NODE=2
> SLURM_SUBMIT_DIR=/home/shmuel
> SLURM_SUBMIT_HOST=ssh01.thelevines.ca
> SLURM_TASKS_PER_NODE=2
> SLURM_TASK_PID=18855
> SLURM_TOPOLOGY_ADDR=hpc02
> SLURM_TOPOLOGY_ADDR_PATTERN=node
> SRUN_DEBUG=3
> SSH_CLIENT=193.90.12.86 38280 22
> SSH_CONNECTION=193.90.12.86 38280 192.168.1.125 22
> SSH_TTY=/dev/pts/0
> TERM=xterm
> USER=shmuel
> ZSH_TMUX_TERM=screen
> _=/usr/local/bin/srun
> _ZSH_TMUX_FIXED_CONFIG=/home/shmuel/.oh-my-zsh/plugins/tmux/tmux.only.conf
> {locality-id}: 1
> {hostname}: [ (tcp:192.168.1.72:7911) ]
> {process-id}: 18855
> {function}: primary_namespace::resolve_free_list
> {file}: /usr/src/hpx/src/runtime/agas/server/primary_namespace_server.cpp
> {line}: 1021
> {os-thread}: 0, worker-thread#0
> {thread-id}: 00000000020813c0
> {thread-description}: <unknown>
> {state}: state_running
> {auxinfo}:
> {config}:
> HPX_HAVE_NATIVE_TLS=ON
> HPX_HAVE_STACKTRACES=ON
> HPX_HAVE_COMPRESSION_BZIP2=OFF
> HPX_HAVE_COMPRESSION_SNAPPY=OFF
> HPX_HAVE_COMPRESSION_ZLIB=OFF
> HPX_HAVE_PARCEL_COALESCING=ON
> HPX_HAVE_PARCELPORT_TCP=ON
> HPX_HAVE_PARCELPORT_MPI=OFF
> HPX_HAVE_PARCELPORT_IPC=OFF
> HPX_HAVE_PARCELPORT_IBVERBS=OFF
> HPX_HAVE_VERIFY_LOCKS=OFF
> HPX_HAVE_HWLOC=ON
> HPX_HAVE_ITTNOTIFY=OFF
> HPX_HAVE_RUN_MAIN_EVERYWHERE=OFF
> HPX_LIMIT=5
> HPX_PARCEL_MAX_CONNECTIONS=512
> HPX_PARCEL_MAX_CONNECTIONS_PER_LOCALITY=4
> HPX_INITIAL_AGAS_LOCAL_CACHE_SIZE=256
> HPX_AGAS_LOCAL_CACHE_SIZE_PER_THREAD=32
> HPX_HAVE_MALLOC=tcmalloc
> HPX_PREFIX (configured)=/usr/local
> HPX_PREFIX=/usr/local
> {version}: V0.9.11 (AGAS: V3.0), Git: 4c96a9b3b3
> {boost}: V1.60.0
> {build-type}: release
> {date}: Jan 3 2016 23:53:54
> {platform}: linux
> {compiler}: Intel C++ C++0x mode version 1600
> {stdlib}: GNU libstdc++ version 20141220
> {what}: primary_namespace::resolve_free_list, failed to resolve gid, gid({0000000200000001, 0000000000001002}): HPX(internal_server_error)
>
> {stack-trace}: 2 frames:
> 0x7f09a4670d79 : hpx::termination_handler(int) + 0x159 in /usr/local/lib/libhpx.so.0
> 0x7f09a11e78d0 : ??? + 0x7f09a11e78d0 in /lib/x86_64-linux-gnu/libpthread.so.0
> {what}: Segmentation fault
> {config}:
> HPX_HAVE_NATIVE_TLS=ON
> HPX_HAVE_STACKTRACES=ON
> HPX_HAVE_COMPRESSION_BZIP2=OFF
> HPX_HAVE_COMPRESSION_SNAPPY=OFF
> HPX_HAVE_COMPRESSION_ZLIB=OFF
> HPX_HAVE_PARCEL_COALESCING=ON
> HPX_HAVE_PARCELPORT_TCP=ON
> HPX_HAVE_PARCELPORT_MPI=OFF
> HPX_HAVE_PARCELPORT_IPC=OFF
> HPX_HAVE_PARCELPORT_IBVERBS=OFF
> HPX_HAVE_VERIFY_LOCKS=OFF
> HPX_HAVE_HWLOC=ON
> HPX_HAVE_ITTNOTIFY=OFF
> HPX_HAVE_RUN_MAIN_EVERYWHERE=OFF
> HPX_LIMIT=5
> HPX_PARCEL_MAX_CONNECTIONS=512
> HPX_PARCEL_MAX_CONNECTIONS_PER_LOCALITY=4
> HPX_INITIAL_AGAS_LOCAL_CACHE_SIZE=256
> HPX_AGAS_LOCAL_CACHE_SIZE_PER_THREAD=32
> HPX_HAVE_MALLOC=tcmalloc
> HPX_PREFIX (configured)=/usr/local
> HPX_PREFIX=/usr/local
> {version}: V0.9.11 (AGAS: V3.0), Git: 4c96a9b3b3
> {boost}: V1.60.0
> {build-type}: release
> {date}: Jan 3 2016 23:53:54
> {platform}: linux
> {compiler}: Intel C++ C++0x mode version 1600
> {stdlib}: GNU libstdc++ version 20141220
> srun: error: hpc02: task 1: Aborted
>
> _______________________________________________
> hpx-users mailing list
> [email protected]
> https://mail.cct.lsu.edu/mailman/listinfo/hpx-users
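On the debugger question in the quoted message: a common workaround when processes are launched through srun is to have each one print where it runs and then block until a debugger is attached. The C++ sketch below assumes a Linux/POSIX node with gdb installed; the environment variable `WAIT_FOR_DEBUGGER` and the function names are hypothetical, not part of HPX or Slurm. It reads `SLURM_PROCID`, which srun sets for each task (it appears in the environment dump above), to label the output. As far as I know, HPX's own `--hpx:attach-debugger` option serves the same purpose; check `--hpx:help` for the values your build accepts.

```cpp
#include <cstdio>
#include <cstdlib>
#include <string>
#include <unistd.h>  // POSIX: getpid, gethostname, sleep

// Hypothetical opt-in switch (not an HPX or Slurm setting): the helper only
// blocks when this environment variable is set, so normal runs are unaffected.
constexpr const char* kWaitEnvVar = "WAIT_FOR_DEBUGGER";

// One-line description of this process: Slurm task rank (if any), host, PID.
std::string describe_process() {
    char host[256] = {0};
    if (gethostname(host, sizeof(host) - 1) != 0) {
        std::snprintf(host, sizeof(host), "unknown-host");
    }
    // srun sets SLURM_PROCID to the global rank of each task; "?" otherwise.
    const char* rank = std::getenv("SLURM_PROCID");
    std::string out = "task ";
    out += (rank != nullptr) ? rank : "?";
    out += " on ";
    out += host;
    out += ", pid ";
    out += std::to_string(static_cast<long>(getpid()));
    return out;
}

// Returns false immediately unless kWaitEnvVar is set. Otherwise prints where
// to attach and spins until a debugger sets debugger_attached to non-zero.
// Build with -g so gdb can see the local variable.
bool wait_for_debugger_if_requested() {
    if (std::getenv(kWaitEnvVar) == nullptr) {
        return false;
    }
    volatile int debugger_attached = 0;  // changed from gdb, never by this code
    std::fprintf(stderr,
                 "%s: waiting for debugger (gdb -p <pid>, then "
                 "'set var debugger_attached = 1' and 'continue')\n",
                 describe_process().c_str());
    while (debugger_attached == 0) {
        sleep(1);
    }
    return true;
}
```

Usage, under the same assumptions: call `wait_for_debugger_if_requested()` at the top of `hpx_main` (or `main`), build with `-g`, and launch with the variable exported, e.g. `WAIT_FOR_DEBUGGER=1 srun -n2 -N1 ./my_app` (srun forwards the caller's environment by default). Then log in to the host printed for the task you care about, run `gdb -p <pid>`, use `up` until you reach the `wait_for_debugger_if_requested` frame, `set var debugger_attached = 1`, and `continue`; gdb will stop at the segfault with a full backtrace.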
