Michael,

> Thanks - I'll give the master branch a try. It's a little ironic -- I
> specifically used a release branch to avoid possible bugs in the master
> branch. I seem to recall a discussion on the mailing list where it was
> recommended to use 0.9.10 instead of master.

Yes, the 0.9.10 release turned out to be fairly bug-free and stable - a feat we 
didn't manage to achieve with 0.9.11 :/

Regards Hartmut
---------------
http://boost-spirit.com
http://stellar.cct.lsu.edu


> 
> From: Thomas Heller
> Sent: Friday, January 8, 2016 11:49 AM
> To: [email protected]
> Reply To: [email protected]
> Subject: Re: [hpx-users] hpx 0.9.11 segmentation fault running on multiple localities
> 
> Hi Michael,
> From your segfault, it seems that you're running into one of the problems
> in the 0.9.11 release. We believe that those are fixed on the latest
> master branch. As Hartmut noted, we plan to do a new release soon. Would
> you mind checking with the latest master branch?
> Regards,
> Thomas
> On 08.01.2016 3:48 PM, "Michael Levine"
> <[email protected]> wrote:
> Hi,
> 
> I've been experimenting with HPX for a hobby project on a small
> virtualized cluster of three Debian 8.2 machines running on an ESXi server.
> When I run any HPX code on a single locality, it appears to work;
> however, whenever I try to use more than one locality, I invariably get a
> segmentation fault, regardless of which code I am running. I first
> encountered the problem with my own code, but it also happens with any
> of the example apps. I am somewhat new to all of this, and I cannot
> figure out how to attach a debugger to try to identify the cause of
> these errors.
> 
> I'm using HPX 0.9.11 on my small cluster with the latest version of Slurm
> (I chose Slurm because it appears to support Intel Xeon Phi nodes
> running applications in native mode). To the best of my knowledge, Slurm
> is configured correctly; however, it is certainly possible that I have
> done something wrong when configuring it.
> 
> I have tried Boost 1.58, 1.59, and 1.60, and I have tried Clang 3.7
> as well as Intel C++ 16 and 16 Update 1. In all cases, I get the same
> segmentation fault whenever I try to run on more than a single
> locality. I have also experimented with single vs. multiple network
> interfaces, single vs. multiple networks, etc.
> 
> Lately, I have rebuilt Boost with the Intel compiler to rule out any
> issue caused by HPX and Boost having been compiled with different
> compilers. I have been troubleshooting with the example code only,
> rather than my own code, so that I can be confident the problems are
> not caused by bugs in my own code.
> 
> I know there is a command-line option to attach a debugger, but I cannot
> figure out how to use it.
> 
> I’ve attached a copy of my slurm.conf for reference, as well as the output
> of --hpx:dump-config and --hpx:debug-clp.
> 
> The HPX stack trace and complete error message are copied below.
> 
> I’m really stuck here and honestly have no idea how to resolve this
> issue. I would greatly appreciate any help you can offer. Furthermore,
> I’d really appreciate some guidance on how to use a debugger with my
> own HPX code so that I can identify and resolve issues in it. Please let
> me know if there’s any additional information I should provide.
> 
> Thank you very much in advance,
> Shmuel
> 
> 
> shmuel@ssh01:/usr/local/lib
> > srun -n1 -N1 1d_stencil_7
> Localities,OS_Threads,Execution_Time_sec,Points_per_Partition,Partitions,Time_Steps
> 1,     1,     0.093138849, 10,                   10,                   45
> 
> shmuel@ssh01:~
> > srun -n2 -N1 1d_stencil_7
> 
> {stack-trace}: 4 frames:
> 0x7f09a45e9840  : hpx::detail::backtrace(unsigned long) + 0x80 in /usr/local/lib/libhpx.so.0
> 0x7f09a45eeced  : boost::exception_ptr hpx::detail::get_exception<hpx::exception>(hpx::exception const&, std::string const&, std::string const&, long, std::string const&) + 0x23d in /usr/local/lib/libhpx.so.0
> 0x7f09a45ee8bc  : void hpx::detail::throw_exception<hpx::exception>(hpx::exception const&, std::string const&, std::string const&, long) + 0x10c in /usr/local/lib/libhpx.so.0
> 0x7f09a4a0de3d  : hpx::agas::server::primary_namespace::resolve_free_list(boost::unique_lock<hpx::lcos::local::spinlock>&, std::list<std::_Rb_tree_iterator<std::pair<hpx::naming::gid_type const, long> >, std::allocator<std::_Rb_tree_iterator<std::pair<hpx::naming::gid_type const, long> > > > const&, std::list<hpx::agas::server::primary_namespace::free_entry, std::allocator<hpx::agas::server::primary_namespace::free_entry> >&, hpx::naming::gid_type const&, hpx::naming::gid_type const&, hpx::error_code&) + 0x137d in /usr/local/lib/libhpx.so.0
> {env}: 85 entries:
>   ALTERNATE_EDITOR=
>   CPATH=/opt/intel/compilers_and_libraries_2016.1.150/linux/mkl/include
>   CXX=clang++
>   DIRHISTORY_SIZE=30
>   DISPLAY=localhost:10.0
>   EDITOR=/usr/bin/vim
>   FPATH=/home/shmuel/.oh-my-zsh/plugins/wd:/home/shmuel/.oh-my-zsh/plugins/tmux:/home/shmuel/.oh-my-zsh/plugins/dirhistory:/home/shmuel/.oh-my-zsh/plugins/colorize:/home/shmuel/.oh-my-zsh/plugins/history:/home/shmuel/.oh-my-zsh/plugins/sudo:/home/shmuel/.oh-my-zsh/plugins/command-not-found:/home/shmuel/.oh-my-zsh/plugins/tmux:/home/shmuel/.oh-my-zsh/plugins/mosh:/home/shmuel/.oh-my-zsh/plugins/git-extras:/home/shmuel/.oh-my-zsh/plugins/battery:/home/shmuel/.oh-my-zsh/plugins/git-flow-avh:/home/shmuel/.oh-my-zsh/plugins/git:/home/shmuel/.oh-my-zsh/functions:/home/shmuel/.oh-my-zsh/completions:/usr/local/share/zsh/site-functions:/usr/share/zsh/vendor-functions:/usr/share/zsh/vendor-completions:/usr/share/zsh/functions/Calendar:/usr/share/zsh/functions/Chpwd:/usr/share/zsh/functions/Completion:/usr/share/zsh/functions/Completion/AIX:/usr/share/zsh/functions/Completion/BSD:/usr/share/zsh/functions/Completion/Base:/usr/share/zsh/functions/Completion/Cygwin:/usr/share/zsh/functions/Completion/Darwin:/usr/share/zsh/functions/Completion/Debian:/usr/share/zsh/functions/Completion/Linux:/usr/share/zsh/functions/Completion/Mandriva:/usr/share/zsh/functions/Completion/Redhat:/usr/share/zsh/functions/Completion/Solaris:/usr/share/zsh/functions/Completion/Unix:/usr/share/zsh/functions/Completion/X:/usr/share/zsh/functions/Completion/Zsh:/usr/share/zsh/functions/Completion/openSUSE:/usr/share/zsh/functions/Exceptions:/usr/share/zsh/functions/MIME:/usr/share/zsh/functions/Misc:/usr/share/zsh/functions/Newuser:/usr/share/zsh/functions/Prompts:/usr/share/zsh/functions/TCP:/usr/share/zsh/functions/VCS_Info:/usr/share/zsh/functions/VCS_Info/Backends:/usr/share/zsh/functions/Zftp:/usr/share/zsh/functions/Zle:/home/shmuel/bin/funcs
>   GDBSERVER_MIC=/opt/intel/debugger_2016/gdb/targets/mic/bin/gdbserver
>   GDB_CROSS=/opt/intel/debugger_2016/gdb/intel64_mic/bin/gdb-mic
>   HOME=/home/shmuel
>   INFOPATH=/opt/intel/documentation_2016/en/debugger//gdb-ia/info/:/opt/intel/documentation_2016/en/debugger//gdb-mic/info/:/opt/intel/documentation_2016/en/debugger//gdb-igfx/info/
>   INTEL_LICENSE_FILE=/opt/intel/compilers_and_libraries_2016.1.150/linux/licenses:/opt/intel/licenses:/home/shmuel/intel/licenses
>   INTEL_PYTHONHOME=/opt/intel/debugger_2016/python/intel64/
>   I_MPI_ROOT=/opt/intel/compilers_and_libraries_2016.1.150/linux/mpi
>   LANG=en_US.utf8
>   LANGUAGE=en_CA:en
>   LC_ALL=en_CA.UTF-8
>   LC_CTYPE=en_CA.UTF-8
>   LD_LIBRARY_PATH=/opt/intel/compilers_and_libraries_2016.1.150/linux/compiler/lib/intel64:/opt/intel/compilers_and_libraries_2016.1.150/linux/mpi/intel64/lib:/opt/intel/compilers_and_libraries_2016.1.150/linux/mpi/mic/lib:/opt/intel/compilers_and_libraries_2016.1.150/linux/tbb/lib/intel64/gcc4.4:/opt/intel/compilers_and_libraries_2016.1.150/linux/compiler/lib/intel64:/opt/intel/compilers_and_libraries_2016.1.150/linux/mkl/lib/intel64:/opt/intel/debugger_2016/libipt/intel64/lib:/home/shmuel/src/fx/lib/:
>   LESS=-R
>   LIBRARY_PATH=/opt/intel/compilers_and_libraries_2016.1.150/linux/tbb/lib/intel64/gcc4.4:/opt/intel/compilers_and_libraries_2016.1.150/linux/compiler/lib/intel64:/opt/intel/compilers_and_libraries_2016.1.150/linux/mkl/lib/intel64
>   LOGNAME=shmuel
>   LSCOLORS=Gxfxcxdxbxegedabagacad
>   MAIL=/var/mail/shmuel
>   MANPATH=/opt/intel/man/common:/opt/intel/compilers_and_libraries_2016.1.150/linux/mpi/man:/opt/intel/compilers_and_libraries_2016.1.150/linux/man/en_US:/opt/intel/documentation_2016/en/debugger//gdb-ia/man/:/opt/intel/documentation_2016/en/debugger//gdb-mic/man/:/opt/intel/documentation_2016/en/debugger//gdb-igfx/man/::/home/shmuel/src/tup/
>   MIC_LD_LIBRARY_PATH=/opt/intel/compilers_and_libraries_2016.1.150/linux/compiler/lib/mic:/opt/intel/compilers_and_libraries_2016.1.150/linux/mpi/mic/lib:/opt/intel/compilers_and_libraries_2016.1.150/linux/tbb/lib/mic:/opt/intel/compilers_and_libraries_2016.1.150/linux/compiler/lib/mic:/opt/intel/compilers_and_libraries_2016.1.150/linux/mkl/lib/mic
>   MIC_LIBRARY_PATH=/opt/intel/compilers_and_libraries_2016.1.150/linux/compiler/lib/mic:/opt/intel/compilers_and_libraries_2016.1.150/linux/mpi/mic/lib
>   MKLROOT=/opt/intel/compilers_and_libraries_2016.1.150/linux/mkl
>   MPM_LAUNCHER=/opt/intel/debugger_2016/mpm/mic/bin/start_mpm.sh
>   NLSPATH=/opt/intel/compilers_and_libraries_2016.1.150/linux/compiler/lib/intel64/locale/%l_%t/%N:/opt/intel/compilers_and_libraries_2016.1.150/linux/mkl/lib/intel64/locale/%l_%t/%N:/opt/intel/debugger_2016/gdb/intel64_mic/share/locale/%l_%t/%N:/opt/intel/debugger_2016/gdb/intel64/share/locale/%l_%t/%N
>   OLDPWD=/home/shmuel
>   PAGER=less
>   PATH=/opt/intel/compilers_and_libraries_2016.1.150/linux/bin/intel64:/opt/intel/compilers_and_libraries_2016.1.150/linux/mpi/intel64/bin:/opt/intel/debugger_2016/gdb/intel64_mic/bin:/usr/local/texlive/2014/bin/x86_64-linux:/usr/local/bin:/usr/bin:/bin:/usr/games:/usr/local/sbin:/sbin:/usr/local/games:/home/shmuel/src/tup/
>   PWD=/home/shmuel
>   REPORTTIME=2
>   SHELL=/bin/zsh
>   SHLVL=1
>   SLURMD_NODENAME=hpc02
>   SLURM_CHECKPOINT_IMAGE_DIR=/var/lib/slurm-llnl/checkpoint
>   SLURM_CLUSTER_NAME=cluster
>   SLURM_CPUS_ON_NODE=2
>   SLURM_DISTRIBUTION=block
>   SLURM_GTIDS=0,1
>   SLURM_JOBID=2165
>   SLURM_JOB_CPUS_PER_NODE=2
>   SLURM_JOB_ID=2165
>   SLURM_JOB_NAME=1d_stencil_7
>   SLURM_JOB_NODELIST=hpc02
>   SLURM_JOB_NUM_NODES=1
>   SLURM_JOB_PARTITION=debug
>   SLURM_JOB_UID=1000
>   SLURM_JOB_USER=shmuel
>   SLURM_LAUNCH_NODE_IPADDR=192.168.1.125
>   SLURM_LOCALID=1
>   SLURM_NNODES=1
>   SLURM_NODEID=0
>   SLURM_NODELIST=hpc02
>   SLURM_NPROCS=2
>   SLURM_NTASKS=2
>   SLURM_PRIO_PROCESS=0
>   SLURM_PROCID=1
>   SLURM_SRUN_COMM_HOST=192.168.1.125
>   SLURM_SRUN_COMM_PORT=45712
>   SLURM_STEPID=0
>   SLURM_STEP_ID=0
>   SLURM_STEP_LAUNCHER_PORT=45712
>   SLURM_STEP_NODELIST=hpc02
>   SLURM_STEP_NUM_NODES=1
>   SLURM_STEP_NUM_TASKS=2
>   SLURM_STEP_TASKS_PER_NODE=2
>   SLURM_SUBMIT_DIR=/home/shmuel
>   SLURM_SUBMIT_HOST=ssh01.thelevines.ca
>   SLURM_TASKS_PER_NODE=2
>   SLURM_TASK_PID=18855
>   SLURM_TOPOLOGY_ADDR=hpc02
>   SLURM_TOPOLOGY_ADDR_PATTERN=node
>   SRUN_DEBUG=3
>   SSH_CLIENT=193.90.12.86 38280 22
>   SSH_CONNECTION=193.90.12.86 38280 192.168.1.125 22
>   SSH_TTY=/dev/pts/0
>   TERM=xterm
>   USER=shmuel
>   ZSH_TMUX_TERM=screen
>   _=/usr/local/bin/srun
>   _ZSH_TMUX_FIXED_CONFIG=/home/shmuel/.oh-my-zsh/plugins/tmux/tmux.only.conf
> {locality-id}: 1
> {hostname}: [ (tcp:192.168.1.72:7911) ]
> {process-id}: 18855
> {function}: primary_namespace::resolve_free_list
> {file}: /usr/src/hpx/src/runtime/agas/server/primary_namespace_server.cpp
> {line}: 1021
> {os-thread}: 0, worker-thread#0
> {thread-id}: 00000000020813c0
> {thread-description}: <unknown>
> {state}: state_running
> {auxinfo}:
> {config}:
>   HPX_HAVE_NATIVE_TLS=ON
>   HPX_HAVE_STACKTRACES=ON
>   HPX_HAVE_COMPRESSION_BZIP2=OFF
>   HPX_HAVE_COMPRESSION_SNAPPY=OFF
>   HPX_HAVE_COMPRESSION_ZLIB=OFF
>   HPX_HAVE_PARCEL_COALESCING=ON
>   HPX_HAVE_PARCELPORT_TCP=ON
>   HPX_HAVE_PARCELPORT_MPI=OFF
>   HPX_HAVE_PARCELPORT_IPC=OFF
>   HPX_HAVE_PARCELPORT_IBVERBS=OFF
>   HPX_HAVE_VERIFY_LOCKS=OFF
>   HPX_HAVE_HWLOC=ON
>   HPX_HAVE_ITTNOTIFY=OFF
>   HPX_HAVE_RUN_MAIN_EVERYWHERE=OFF
>   HPX_LIMIT=5
>   HPX_PARCEL_MAX_CONNECTIONS=512
>   HPX_PARCEL_MAX_CONNECTIONS_PER_LOCALITY=4
>   HPX_INITIAL_AGAS_LOCAL_CACHE_SIZE=256
>   HPX_AGAS_LOCAL_CACHE_SIZE_PER_THREAD=32
>   HPX_HAVE_MALLOC=tcmalloc
>   HPX_PREFIX (configured)=/usr/local
>   HPX_PREFIX=/usr/local
> {version}: V0.9.11 (AGAS: V3.0), Git: 4c96a9b3b3
> {boost}: V1.60.0
> {build-type}: release
> {date}: Jan  3 2016 23:53:54
> {platform}: linux
> {compiler}: Intel C++ C++0x mode version 1600
> {stdlib}: GNU libstdc++ version 20141220
> {what}: primary_namespace::resolve_free_list, failed to resolve gid, gid({0000000200000001, 0000000000001002}): HPX(internal_server_error)
> 
> {stack-trace}: 2 frames:
> 0x7f09a4670d79  : hpx::termination_handler(int) + 0x159 in /usr/local/lib/libhpx.so.0
> 0x7f09a11e78d0  : ??? + 0x7f09a11e78d0 in /lib/x86_64-linux-gnu/libpthread.so.0
> {what}: Segmentation fault
> {config}:
>   HPX_HAVE_NATIVE_TLS=ON
>   HPX_HAVE_STACKTRACES=ON
>   HPX_HAVE_COMPRESSION_BZIP2=OFF
>   HPX_HAVE_COMPRESSION_SNAPPY=OFF
>   HPX_HAVE_COMPRESSION_ZLIB=OFF
>   HPX_HAVE_PARCEL_COALESCING=ON
>   HPX_HAVE_PARCELPORT_TCP=ON
>   HPX_HAVE_PARCELPORT_MPI=OFF
>   HPX_HAVE_PARCELPORT_IPC=OFF
>   HPX_HAVE_PARCELPORT_IBVERBS=OFF
>   HPX_HAVE_VERIFY_LOCKS=OFF
>   HPX_HAVE_HWLOC=ON
>   HPX_HAVE_ITTNOTIFY=OFF
>   HPX_HAVE_RUN_MAIN_EVERYWHERE=OFF
>   HPX_LIMIT=5
>   HPX_PARCEL_MAX_CONNECTIONS=512
>   HPX_PARCEL_MAX_CONNECTIONS_PER_LOCALITY=4
>   HPX_INITIAL_AGAS_LOCAL_CACHE_SIZE=256
>   HPX_AGAS_LOCAL_CACHE_SIZE_PER_THREAD=32
>   HPX_HAVE_MALLOC=tcmalloc
>   HPX_PREFIX (configured)=/usr/local
>   HPX_PREFIX=/usr/local
> {version}: V0.9.11 (AGAS: V3.0), Git: 4c96a9b3b3
> {boost}: V1.60.0
> {build-type}: release
> {date}: Jan  3 2016 23:53:54
> {platform}: linux
> {compiler}: Intel C++ C++0x mode version 1600
> {stdlib}: GNU libstdc++ version 20141220
> srun: error: hpc02: task 1: Aborted
> 
> 
> _______________________________________________
> hpx-users mailing list
> [email protected]
> https://mail.cct.lsu.edu/mailman/listinfo/hpx-users
> 

