Michael,

> Thanks - I'll give the master branch a try. It's a little ironic -- I
> specifically used a release branch to avoid possible bugs in the master
> branch. I seem to recall a discussion on the mailing list where it was
> recommended to use 0.9.10 instead of master.
Yes, the 0.9.10 release turned out to be fairly bug-free and stable - a
feat we didn't manage to achieve with 0.9.11 :/

Regards Hartmut
---------------
http://boost-spirit.com
http://stellar.cct.lsu.edu

> From: Thomas Heller
> Sent: Friday, January 8, 2016 11:49 AM
> To: [email protected]
> Reply To: [email protected]
> Subject: Re: [hpx-users] hpx 0.9.11 segmentation fault running on multiple localities
>
> Hi Michael,
>
> From your segfault, it seems that you're running into one of the problems
> in the 0.9.11 release. We believe that those are fixed on the latest
> master branch. As Hartmut noted, we plan to do a new release soon. Would
> you mind checking with the latest master branch?
>
> Regards,
> Thomas
>
> On 08.01.2016 at 3:48 PM, "Michael Levine" <[email protected]> wrote:
>
> Hi,
>
> I've been experimenting with hpx for a hobby project, with a small
> virtualized cluster of 3 Debian 8.2 machines running on an ESXi server. When
> I run any hpx code on a single locality, it appears to work;
> however, whenever I try to use more than one locality, I invariably get a
> segmentation fault, regardless of which code I am using. I first
> encountered the trouble with my own code, but it also happens when running
> any of the example apps. I am somewhat new to all of this and I
> cannot figure out how to attach a debugger to try and identify the cause
> of these errors.
>
> I'm using hpx 0.9.11 on my small cluster using the latest version of slurm.
> (I chose slurm as it appears to provide support for Intel Phi nodes
> running applications in native mode). To the best of my knowledge, slurm
> is configured correctly. However, it is certainly possible that I have
> done something wrong configuring slurm.
>
> I have tried using boost 1.58, 1.59, and 1.60. I have tried with clang 3.7
> and with Intel C++ 16 and 16 update 1. In all cases, I get the same
> segmentation fault whenever I try and run on more than a single
> locality. I have played around with single vs. multiple network
> interfaces, single vs. multiple networks, etc.
>
> Lately, I have re-built boost using the Intel compiler to ensure that
> there was no issue caused by hpx and boost having been compiled with
> different compilers. I have been trying to troubleshoot this only based
> on the example code, rather than my own code, so that I can be confident
> that the problems are not caused by my own code errors/bugs.
>
> I know there is a command-line option to attach a debugger but I cannot
> figure out how to use this.
>
> I’ve attached a copy of my slurm.conf for reference, and the output of
> --hpx:dump-config and --hpx:debug-clp
>
> HPX stack trace / complete error message is copied below.
>
> I’m really stuck here and honestly have no idea how to resolve this
> issue. I greatly appreciate any help that you can offer. Furthermore,
> I’d really appreciate some guidance as to how to use a debugger to debug
> my own hpx code to identify and resolve issues with that code. Please let
> me know if there’s any additional information that I should provide.
>
> Thank you very much in advance,
> Shmuel
>
> shmuel@ssh01:/usr/local/lib
> > srun -n1 -N1 1d_stencil_7
> Localities,OS_Threads,Execution_Time_sec,Points_per_Partition,Partitions,Time_Steps
> 1, 1, 0.093138849, 10, 10, 45
>
> shmuel@ssh01:~
> > srun -n2 -N1 1d_stencil_7
>
> {stack-trace}: 4 frames:
> 0x7f09a45e9840 : hpx::detail::backtrace(unsigned long) + 0x80 in /usr/local/lib/libhpx.so.0
> 0x7f09a45eeced : boost::exception_ptr hpx::detail::get_exception<hpx::exception>(hpx::exception const&, std::string const&, std::string const&, long, std::string const&) + 0x23d in /usr/local/lib/libhpx.so.0
> 0x7f09a45ee8bc : void hpx::detail::throw_exception<hpx::exception>(hpx::exception const&, std::string const&, std::string const&, long) + 0x10c in /usr/local/lib/libhpx.so.0
> 0x7f09a4a0de3d : hpx::agas::server::primary_namespace::resolve_free_list(boost::unique_lock<hpx::lcos::local::spinlock>&, std::list<std::_Rb_tree_iterator<std::pair<hpx::naming::gid_type const, long> >, std::allocator<std::_Rb_tree_iterator<std::pair<hpx::naming::gid_type const, long> > > > const&, std::list<hpx::agas::server::primary_namespace::free_entry, std::allocator<hpx::agas::server::primary_namespace::free_entry> >&, hpx::naming::gid_type const&, hpx::naming::gid_type const&, hpx::error_code&) + 0x137d in /usr/local/lib/libhpx.so.0
> {env}: 85 entries:
> ALTERNATE_EDITOR=
> CPATH=/opt/intel/compilers_and_libraries_2016.1.150/linux/mkl/include
> CXX=clang++
> DIRHISTORY_SIZE=30
> DISPLAY=localhost:10.0
> EDITOR=/usr/bin/vim
> FPATH=/home/shmuel/.oh-my-zsh/plugins/wd:/home/shmuel/.oh-my-zsh/plugins/tmux:/home/shmuel/.oh-my-zsh/plugins/dirhistory:/home/shmuel/.oh-my-zsh/plugins/colorize:/home/shmuel/.oh-my-zsh/plugins/history:/home/shmuel/.oh-my-zsh/plugins/sudo:/home/shmuel/.oh-my-zsh/plugins/command-not-found:/home/shmuel/.oh-my-zsh/plugins/tmux:/home/shmuel/.oh-my-zsh/plugins/mosh:/home/shmuel/.oh-my-zsh/plugins/git-extras:/home/shmuel/.oh-my-zsh/plugins/battery:/home/shmuel/.oh-my-zsh/plugins/git-flow-avh:/home/shmuel/.oh-my-zsh/plugins/git:/home/shmuel/.oh-my-zsh/functions:/home/shmuel/.oh-my-zsh/completions:/usr/local/share/zsh/site-functions:/usr/share/zsh/vendor-functions:/usr/share/zsh/vendor-completions:/usr/share/zsh/functions/Calendar:/usr/share/zsh/functions/Chpwd:/usr/share/zsh/functions/Completion:/usr/share/zsh/functions/Completion/AIX:/usr/share/zsh/functions/Completion/BSD:/usr/share/zsh/functions/Completion/Base:/usr/share/zsh/functions/Completion/Cygwin:/usr/share/zsh/functions/Completion/Darwin:/usr/share/zsh/functions/Completion/Debian:/usr/share/zsh/functions/Completion/Linux:/usr/share/zsh/functions/Completion/Mandriva:/usr/share/zsh/functions/Completion/Redhat:/usr/share/zsh/functions/Completion/Solaris:/usr/share/zsh/functions/Completion/Unix:/usr/share/zsh/functions/Completion/X:/usr/share/zsh/functions/Completion/Zsh:/usr/share/zsh/functions/Completion/openSUSE:/usr/share/zsh/functions/Exceptions:/usr/share/zsh/functions/MIME:/usr/share/zsh/functions/Misc:/usr/share/zsh/functions/Newuser:/usr/share/zsh/functions/Prompts:/usr/share/zsh/functions/TCP:/usr/share/zsh/functions/VCS_Info:/usr/share/zsh/functions/VCS_Info/Backends:/usr/share/zsh/functions/Zftp:/usr/share/zsh/functions/Zle:/home/shmuel/bin/funcs
> GDBSERVER_MIC=/opt/intel/debugger_2016/gdb/targets/mic/bin/gdbserver
> GDB_CROSS=/opt/intel/debugger_2016/gdb/intel64_mic/bin/gdb-mic
> HOME=/home/shmuel
> INFOPATH=/opt/intel/documentation_2016/en/debugger//gdb-ia/info/:/opt/intel/documentation_2016/en/debugger//gdb-mic/info/:/opt/intel/documentation_2016/en/debugger//gdb-igfx/info/
> INTEL_LICENSE_FILE=/opt/intel/compilers_and_libraries_2016.1.150/linux/licenses:/opt/intel/licenses:/home/shmuel/intel/licenses
> INTEL_PYTHONHOME=/opt/intel/debugger_2016/python/intel64/
> I_MPI_ROOT=/opt/intel/compilers_and_libraries_2016.1.150/linux/mpi
> LANG=en_US.utf8
> LANGUAGE=en_CA:en
> LC_ALL=en_CA.UTF-8
> LC_CTYPE=en_CA.UTF-8
> LD_LIBRARY_PATH=/opt/intel/compilers_and_libraries_2016.1.150/linux/compiler/lib/intel64:/opt/intel/compilers_and_libraries_2016.1.150/linux/mpi/intel64/lib:/opt/intel/compilers_and_libraries_2016.1.150/linux/mpi/mic/lib:/opt/intel/compilers_and_libraries_2016.1.150/linux/tbb/lib/intel64/gcc4.4:/opt/intel/compilers_and_libraries_2016.1.150/linux/compiler/lib/intel64:/opt/intel/compilers_and_libraries_2016.1.150/linux/mkl/lib/intel64:/opt/intel/debugger_2016/libipt/intel64/lib:/home/shmuel/src/fx/lib/:
> LESS=-R
> LIBRARY_PATH=/opt/intel/compilers_and_libraries_2016.1.150/linux/tbb/lib/intel64/gcc4.4:/opt/intel/compilers_and_libraries_2016.1.150/linux/compiler/lib/intel64:/opt/intel/compilers_and_libraries_2016.1.150/linux/mkl/lib/intel64
> LOGNAME=shmuel
> LSCOLORS=Gxfxcxdxbxegedabagacad
> MAIL=/var/mail/shmuel
> MANPATH=/opt/intel/man/common:/opt/intel/compilers_and_libraries_2016.1.150/linux/mpi/man:/opt/intel/compilers_and_libraries_2016.1.150/linux/man/en_US:/opt/intel/documentation_2016/en/debugger//gdb-ia/man/:/opt/intel/documentation_2016/en/debugger//gdb-mic/man/:/opt/intel/documentation_2016/en/debugger//gdb-igfx/man/::/home/shmuel/src/tup/
> MIC_LD_LIBRARY_PATH=/opt/intel/compilers_and_libraries_2016.1.150/linux/compiler/lib/mic:/opt/intel/compilers_and_libraries_2016.1.150/linux/mpi/mic/lib:/opt/intel/compilers_and_libraries_2016.1.150/linux/tbb/lib/mic:/opt/intel/compilers_and_libraries_2016.1.150/linux/compiler/lib/mic:/opt/intel/compilers_and_libraries_2016.1.150/linux/mkl/lib/mic
> MIC_LIBRARY_PATH=/opt/intel/compilers_and_libraries_2016.1.150/linux/compiler/lib/mic:/opt/intel/compilers_and_libraries_2016.1.150/linux/mpi/mic/lib
> MKLROOT=/opt/intel/compilers_and_libraries_2016.1.150/linux/mkl
> MPM_LAUNCHER=/opt/intel/debugger_2016/mpm/mic/bin/start_mpm.sh
> NLSPATH=/opt/intel/compilers_and_libraries_2016.1.150/linux/compiler/lib/intel64/locale/%l_%t/%N:/opt/intel/compilers_and_libraries_2016.1.150/linux/mkl/lib/intel64/locale/%l_%t/%N:/opt/intel/debugger_2016/gdb/intel64_mic/share/locale/%l_%t/%N:/opt/intel/debugger_2016/gdb/intel64/share/locale/%l_%t/%N
> OLDPWD=/home/shmuel
> PAGER=less
> PATH=/opt/intel/compilers_and_libraries_2016.1.150/linux/bin/intel64:/opt/intel/compilers_and_libraries_2016.1.150/linux/mpi/intel64/bin:/opt/intel/debugger_2016/gdb/intel64_mic/bin:/usr/local/texlive/2014/bin/x86_64-linux:/usr/local/bin:/usr/bin:/bin:/usr/games:/usr/local/sbin:/sbin:/usr/local/games:/home/shmuel/src/tup/
> PWD=/home/shmuel
> REPORTTIME=2
> SHELL=/bin/zsh
> SHLVL=1
> SLURMD_NODENAME=hpc02
> SLURM_CHECKPOINT_IMAGE_DIR=/var/lib/slurm-llnl/checkpoint
> SLURM_CLUSTER_NAME=cluster
> SLURM_CPUS_ON_NODE=2
> SLURM_DISTRIBUTION=block
> SLURM_GTIDS=0,1
> SLURM_JOBID=2165
> SLURM_JOB_CPUS_PER_NODE=2
> SLURM_JOB_ID=2165
> SLURM_JOB_NAME=1d_stencil_7
> SLURM_JOB_NODELIST=hpc02
> SLURM_JOB_NUM_NODES=1
> SLURM_JOB_PARTITION=debug
> SLURM_JOB_UID=1000
> SLURM_JOB_USER=shmuel
> SLURM_LAUNCH_NODE_IPADDR=192.168.1.125
> SLURM_LOCALID=1
> SLURM_NNODES=1
> SLURM_NODEID=0
> SLURM_NODELIST=hpc02
> SLURM_NPROCS=2
> SLURM_NTASKS=2
> SLURM_PRIO_PROCESS=0
> SLURM_PROCID=1
> SLURM_SRUN_COMM_HOST=192.168.1.125
> SLURM_SRUN_COMM_PORT=45712
> SLURM_STEPID=0
> SLURM_STEP_ID=0
> SLURM_STEP_LAUNCHER_PORT=45712
> SLURM_STEP_NODELIST=hpc02
> SLURM_STEP_NUM_NODES=1
> SLURM_STEP_NUM_TASKS=2
> SLURM_STEP_TASKS_PER_NODE=2
> SLURM_SUBMIT_DIR=/home/shmuel
> SLURM_SUBMIT_HOST=ssh01.thelevines.ca
> SLURM_TASKS_PER_NODE=2
> SLURM_TASK_PID=18855
> SLURM_TOPOLOGY_ADDR=hpc02
> SLURM_TOPOLOGY_ADDR_PATTERN=node
> SRUN_DEBUG=3
> SSH_CLIENT=193.90.12.86 38280 22
> SSH_CONNECTION=193.90.12.86 38280 192.168.1.125 22
> SSH_TTY=/dev/pts/0
> TERM=xterm
> USER=shmuel
> ZSH_TMUX_TERM=screen
> _=/usr/local/bin/srun
> _ZSH_TMUX_FIXED_CONFIG=/home/shmuel/.oh-my-zsh/plugins/tmux/tmux.only.conf
> {locality-id}: 1
> {hostname}: [ (tcp:192.168.1.72:7911) ]
> {process-id}: 18855
> {function}: primary_namespace::resolve_free_list
> {file}: /usr/src/hpx/src/runtime/agas/server/primary_namespace_server.cpp
> {line}: 1021
> {os-thread}: 0, worker-thread#0
> {thread-id}: 00000000020813c0
> {thread-description}: <unknown>
> {state}: state_running
> {auxinfo}:
> {config}:
> HPX_HAVE_NATIVE_TLS=ON
> HPX_HAVE_STACKTRACES=ON
> HPX_HAVE_COMPRESSION_BZIP2=OFF
> HPX_HAVE_COMPRESSION_SNAPPY=OFF
> HPX_HAVE_COMPRESSION_ZLIB=OFF
> HPX_HAVE_PARCEL_COALESCING=ON
> HPX_HAVE_PARCELPORT_TCP=ON
> HPX_HAVE_PARCELPORT_MPI=OFF
> HPX_HAVE_PARCELPORT_IPC=OFF
> HPX_HAVE_PARCELPORT_IBVERBS=OFF
> HPX_HAVE_VERIFY_LOCKS=OFF
> HPX_HAVE_HWLOC=ON
> HPX_HAVE_ITTNOTIFY=OFF
> HPX_HAVE_RUN_MAIN_EVERYWHERE=OFF
> HPX_LIMIT=5
> HPX_PARCEL_MAX_CONNECTIONS=512
> HPX_PARCEL_MAX_CONNECTIONS_PER_LOCALITY=4
> HPX_INITIAL_AGAS_LOCAL_CACHE_SIZE=256
> HPX_AGAS_LOCAL_CACHE_SIZE_PER_THREAD=32
> HPX_HAVE_MALLOC=tcmalloc
> HPX_PREFIX (configured)=/usr/local
> HPX_PREFIX=/usr/local
> {version}: V0.9.11 (AGAS: V3.0), Git: 4c96a9b3b3
> {boost}: V1.60.0
> {build-type}: release
> {date}: Jan 3 2016 23:53:54
> {platform}: linux
> {compiler}: Intel C++ C++0x mode version 1600
> {stdlib}: GNU libstdc++ version 20141220
> {what}: primary_namespace::resolve_free_list, failed to resolve gid, gid({0000000200000001, 0000000000001002}): HPX(internal_server_error)
>
> {stack-trace}: 2 frames:
> 0x7f09a4670d79 : hpx::termination_handler(int) + 0x159 in /usr/local/lib/libhpx.so.0
> 0x7f09a11e78d0 : ??? + 0x7f09a11e78d0 in /lib/x86_64-linux-gnu/libpthread.so.0
> {what}: Segmentation fault
> {config}:
> HPX_HAVE_NATIVE_TLS=ON
> HPX_HAVE_STACKTRACES=ON
> HPX_HAVE_COMPRESSION_BZIP2=OFF
> HPX_HAVE_COMPRESSION_SNAPPY=OFF
> HPX_HAVE_COMPRESSION_ZLIB=OFF
> HPX_HAVE_PARCEL_COALESCING=ON
> HPX_HAVE_PARCELPORT_TCP=ON
> HPX_HAVE_PARCELPORT_MPI=OFF
> HPX_HAVE_PARCELPORT_IPC=OFF
> HPX_HAVE_PARCELPORT_IBVERBS=OFF
> HPX_HAVE_VERIFY_LOCKS=OFF
> HPX_HAVE_HWLOC=ON
> HPX_HAVE_ITTNOTIFY=OFF
> HPX_HAVE_RUN_MAIN_EVERYWHERE=OFF
> HPX_LIMIT=5
> HPX_PARCEL_MAX_CONNECTIONS=512
> HPX_PARCEL_MAX_CONNECTIONS_PER_LOCALITY=4
> HPX_INITIAL_AGAS_LOCAL_CACHE_SIZE=256
> HPX_AGAS_LOCAL_CACHE_SIZE_PER_THREAD=32
> HPX_HAVE_MALLOC=tcmalloc
> HPX_PREFIX (configured)=/usr/local
> HPX_PREFIX=/usr/local
> {version}: V0.9.11 (AGAS: V3.0), Git: 4c96a9b3b3
> {boost}: V1.60.0
> {build-type}: release
> {date}: Jan 3 2016 23:53:54
> {platform}: linux
> {compiler}: Intel C++ C++0x mode version 1600
> {stdlib}: GNU libstdc++ version 20141220
> srun: error: hpc02: task 1: Aborted
>
> _______________________________________________
> hpx-users mailing list
> [email protected]
> https://mail.cct.lsu.edu/mailman/listinfo/hpx-users
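
The error dump quoted in this thread is long, and the two fields that matter most ({what} and {locality-id}) are easy to lose among the environment entries. The following is a minimal sketch, not part of HPX, of how such a dump could be reduced to its "{key}: value" fields. The function name parse_hpx_diagnostics is hypothetical, and the sketch assumes that each field sits on one line (wrapped lines already re-joined) and that every report begins with a {stack-trace} field, as the two reports above do.

```python
import re

# Matches one diagnostic field such as "{what}: Segmentation fault".
# The "{key}: value" layout is taken from the error dump in this thread.
FIELD = re.compile(r"^\{([^}]+)\}:\s*(.*)$")


def parse_hpx_diagnostics(text):
    """Group "{key}: value" fields into one dict per error report.

    Assumption: a new report starts at each "{stack-trace}" field.
    Lines without a "{key}:" prefix (stack frames, {env} and {config}
    entries) are appended to the value of the most recent field.
    """
    reports = []
    current = None
    last_key = None
    for raw in text.splitlines():
        # Drop leading e-mail quote markers ("> ") and trailing blanks.
        line = raw.lstrip("> ").rstrip()
        match = FIELD.match(line)
        if match:
            key, value = match.groups()
            if key == "stack-trace" or current is None:
                current = {}
                reports.append(current)
            current[key] = value
            last_key = key
        elif current is not None and line:
            current[last_key] = (current[last_key] + "\n" + line).strip()
    return reports


# Shortened excerpt of the dump from this thread, used as sample input.
sample = """\
> {stack-trace}: 4 frames:
> 0x7f09a45e9840 : hpx::detail::backtrace(unsigned long) + 0x80 in /usr/local/lib/libhpx.so.0
> {locality-id}: 1
> {function}: primary_namespace::resolve_free_list
> {line}: 1021
> {what}: primary_namespace::resolve_free_list, failed to resolve gid
>
> {stack-trace}: 2 frames:
> {what}: Segmentation fault
"""

for report in parse_hpx_diagnostics(sample):
    print(report.get("locality-id", "?"), "|", report["what"])
```

Run against the full dump, this would show that the first report (the failed gid resolution on locality 1) precedes the segmentation fault, which is the order a reader would want to see when deciding which error to investigate first.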
