Thank you for your feedback!

Yes, we have already linked the problem to a specific parameter combination of our program. I assume it really is a lifetime issue in this code path: not in the serialization code itself, but rather that the object to be sent goes out of scope before it is serialized (or something like that).

I don't have time to debug it right now, but I will update this thread once we have found the cause.

Thank you!
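To illustrate the kind of lifetime issue suspected above, here is a minimal standalone sketch in plain C++. It does not use HPX and is not taken from the application: `array_view`, `deferred_send`, and `send_safe` are hypothetical names made up for this example. The assumption is that the data is handed to the serializer through a wrapper that only refers to the buffer without owning it, and that the actual serialization happens later than the call that initiates the send.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <memory>
#include <numeric>
#include <vector>

// Hypothetical stand-in for a non-owning array wrapper: it stores only a
// pointer and a length, not the data itself.
struct array_view {
    const double* data;
    std::size_t size;
};

// Hypothetical stand-in for an asynchronous send: the "serialization"
// (here just a sum over the data) runs when the task is invoked later.
using deferred_send = std::function<double()>;

// UNSAFE pattern (shown only as a comment, never compiled or called):
// the view points into `local`, which is destroyed when the function
// returns, i.e. before the deferred serialization runs.
//
// deferred_send send_unsafe() {
//     std::vector<double> local(1024, 1.0);
//     array_view v{local.data(), local.size()};
//     return [v] { return std::accumulate(v.data, v.data + v.size, 0.0); };
// }

// SAFE pattern: the task shares ownership of the buffer, so the data
// stays alive until the deferred serialization has finished.
deferred_send send_safe() {
    auto buffer = std::make_shared<std::vector<double>>(1024, 1.0);
    return [buffer] {
        array_view v{buffer->data(), buffer->size()};
        return std::accumulate(v.data, v.data + v.size, 0.0);
    };
}
```

Calling `send_safe()` and invoking the returned task after the creating scope has ended is well defined, because the lambda keeps the buffer alive. The commented-out variant would read freed memory at that point, which could plausibly show up as an intermittent corruption like the one reported in this thread.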
> On 9 Dec 2016, at 06:16, Thomas Heller <thom.hel...@gmail.com> wrote:
>
>
>
> On 09.12.2016 at 2:36 a.m., "Hartmut Kaiser" <hartmut.kai...@gmail.com> wrote:
> Tim,
>
> > when running my HPX application on our cluster with multiple localities
> > I SOMETIMES get a segmentation fault with error message: "archive data
> > bstream data chunk size mismatch: HPX(serialization_error)".
> >
> > And when I rerun the same configuration, it either works or sometimes
> > segfaults again.
> >
> > Any idea what could cause this or how to debug it?
>
> I have not seen this problem before. Could you provide us with the code for
> your application?
>
> Thomas, is that a known issue with the MPI parcelport?
>
>
> No it's not. I haven't seen this problem in a while. It's triggered in the
> deserialization of the parcel. So it could be a corrupted parcel. Could you
> inspect your serialization code for any lifetime issues? Do you maybe
> serialize a temporary buffer with make_array?
>
>
> Regards Hartmut
> ---------------
> http://boost-spirit.com
> http://stellar.cct.lsu.edu
>
> >
> > Thanks!
> >
> > Tim
> >
> > The full error output follows:
> >
> >
> >
> > {stack-trace}: 4 frames:
> > 0x2b40ce84564c : hpx::detail::backtrace[abi:cxx11](unsigned long) +
> > 0x9c in /home/tbiedert/local/lib/libhpx.so.1
> > 0x2b40ce8918fa : boost::exception_ptr
> > hpx::detail::get_exception<hpx::exception>(hpx::exception const&,
> > std::__cxx11::basic_string<char, std::char_traits<char>,
> > std::allocator<char> > const&, std::__cxx11::basic_string<char,
> > std::char_traits<char>, std::allocator<char> > const&, long,
> > std::__cxx11::basic_string<char, std::char_traits<char>,
> > std::allocator<char> > const&) + 0xaa in
> > /home/tbiedert/local/lib/libhpx.so.1
> > 0x2b40ce891e5e : void
> > hpx::detail::throw_exception<hpx::exception>(hpx::exception const&,
> > std::__cxx11::basic_string<char, std::char_traits<char>,
> > std::allocator<char> > const&, std::__cxx11::basic_string<char,
> > std::char_traits<char>, std::allocator<char> > const&, long) + 0x4e in
> > /home/tbiedert/local/lib/libhpx.so.1
> > 0x2b40ce92049e : hpx::detail::throw_exception(hpx::error,
> > std::__cxx11::basic_string<char, std::char_traits<char>,
> > std::allocator<char> > const&, std::__cxx11::basic_string<char,
> > std::char_traits<char>, std::allocator<char> > const&,
> > std::__cxx11::basic_string<char, std::char_traits<char>,
> > std::allocator<char> > const&, long) + 0x4e in
> > /home/tbiedert/local/lib/libhpx.so.1
> > {env}: 177 entries:
> > BASH_FUNC_module()=() { eval `/usr/bin/modulecmd bash $*`
> > }
> > BINARY_TYPE_HPC=
> > BSUB_BLOCK_EXEC_HOST=
> > CFLAGS=-I/software/binutils/2.27/include -I/software/gcc/6.2.0/include
> > CMAKE_PREFIX_PATH=/home/tbiedert/local
> > CPATH=/home/tbiedert/local/opt/tbb2017-update3/include
> > CPLUS_INCLUDE_PATH=/software/binutils/2.27/include:/software/gcc/6.2.0/inc
> > lude:/home/tbiedert/local/include:
> > CPPFLAGS=-I/software/binutils/2.27/include -
> > I/software/gcc/6.2.0/include
> > CPP_INCLUDE_PATH=/home/tbiedert/local/include:
> > CVS_RSH=ssh
> > C_INCLUDE_PATH=/software/binutils/2.27/include:/software/gcc/6.2.0/include
> > :/home/tbiedert/local/include:
> > G_BROKEN_FILENAMES=1
> > HISTCONTROL=ignoreboth
> > HISTSIZE=500
> > HOME=/home/tbiedert
> > HOSTNAME=node774
> > HOSTTYPE=X86_64
> > ITERM_ORIG_PS1=\[\033[7m\]\u@\h\[\033[m\] [\W]
> > ITERM_PREV_PS1=\[\]\[\033[7m\]\u@\h\[\033[m\] [\W] \[\]
> > JOB_TERMINATE_INTERVAL=300
> > KDEDIRS=/usr
> > KDE_IS_PRELINKED=1
> > LANG=en_US.UTF-8
> > LDFLAGS=-L/software/binutils/2.27/lib -L/software/gcc/6.2.0/lib64
> > -L/software/gcc/6.2.0/lib
> > LD_LIBRARY_PATH=/lsf/9.1/linux2.6-glibc2.3-
> > x86_64/lib:/home/tbiedert/local/opt/tbb2017-
> > update3/build/linux_intel64_gcc_cc6.2.0_libc2.12_kernel2.6.32_release:/sof
> > tware/binutils/2.27/lib:/software/gcc/6.2.0/lib64:/software/gcc/6.2.0/lib:
> > /home/tbiedert/local/lib:/home/tbiedert/local/usr/lib64:/home/tbiedert/loc
> > al/lib64
> > LESSOPEN=||/usr/bin/lesspipe.sh %s
> > LIBRARY_PATH=/home/tbiedert/local/opt/tbb2017-
> > update3/build/linux_intel64_gcc_cc6.2.0_libc2.12_kernel2.6.32_release
> > LOADEDMODULES=gcc/6.2.0:binutils/latest
> > LOGNAME=tbiedert
> > LSB_ACCT_FILE=/tmp/5324709.tmpdir/.1481211361.5324709.acct
> > LSB_AFFINITY_HOSTFILE=/home/tbiedert/.lsbatch/1481211361.5324709.hostAffin
> > ityFile
> > LSB_APPLICATION_NAME=hybrid_mpi_openmp
> > LSB_BATCH_JID=5324709
> > LSB_BIND_CPU_LIST=0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15
> > LSB_CHKFILENAME=/home/tbiedert/.lsbatch/1481211361.5324709
> > LSB_DJOB_HOSTFILE=/home/tbiedert/.lsbatch/1481211361.5324709.hostfile
> > LSB_DJOB_NUMPROC=128
> > LSB_DJOB_RANKFILE=/home/tbiedert/.lsbatch/1481211361.5324709.hostfile
> > LSB_ECHKPNT_RSH_CMD=ssh
> > LSB_EEXEC_REAL_GID=
> > LSB_EEXEC_REAL_UID=
> > LSB_EFFECTIVE_RSRCREQ=select[ ((( (model == XEON_E5_2640v3)) && type
> > == any))] order[-slots:-maxslots] rusage[mem=60000.00] span[ptile=16]
> > same[model] cu[type=switch:maxcus=1:pref=config]
> > affinity[core(1)*1:distribute=pack]
> > LSB_ERRORFILE=5324709.err
> > LSB_EXEC_CLUSTER=Elwetritsch
> > LSB_EXEC_HOSTTYPE=X86_64
> > LSB_EXIT_PRE_ABORT=99
> > LSB_HOSTS=node790 node790 node790 node790 node790 node790 node790
> > node790 node790 node790 node790 node790 node790 node790 node790 node790
> > node792 node792 node792 node792 node792 node792 node792 node792 node792
> > node792 node792 node792 node792 node792 node792 node792 node793 node793
> > node793 node793 node793 node793 node793 node793 node793 node793 node793
> > node793 node793 node793 node793 node793 node795 node795 node795 node795
> > node795 node795 node795 node795 node795 node795 node795 node795 node795
> > node795 node795 node795 node796 node796 node796 node796 node796 node796
> > node796 node796 node796 node796 node796 node796 node796 node796 node796
> > node796 node773 node773 node773 node773 node773 node773 node773 node773
> > node773 node773 node773 node773 node773 node773 node773 node773 node774
> > node774 node774 node774 node774 node774 node774 node774 node774 node774
> > node774 node774 node774 node774 node774 node774 node775 node775 node775
> > node775 node775 node775 node775 node775 node775 node775 node775 node775
> > node775 node775 node775 node775
> > LSB_JOBEXIT_STAT=0
> > LSB_JOBFILENAME=/home/tbiedert/.lsbatch/1481211361.5324709
> > LSB_JOBID=5324709
> > LSB_JOBINDEX=0
> > LSB_JOBNAME=mpirun --map-by ppr:1:node --bind-to none ./hpxvr
> > --hpx:threads 16 --no-output --csv --warmup 3 --benchmark 8 --blockSize
> > 256x256x256 --tileSize 64x34 --preload --distributed --compress
> > /scratch/tbiedert/4096x4096x4096.dummy
> > LSB_JOBRES_CALLBACK=56355@node790
> > LSB_JOBRES_PID=485
> > LSB_JOB_EXECUSER=tbiedert
> > LSB_JOB_STARTER=/lsf/rhrk/bin/job_starter_hybrid_mpi_openmp "%USRCMD"
> > LSB_MAX_NUM_PROCESSORS=128
> > LSB_MCPU_HOSTS=node790 1 node792 1 node793 1 node795 1 node796 1
> > node773 1 node774 1 node775 1
> > LSB_OUTDIR=/home/tbiedert/HPX-VolumeRendering/build
> > LSB_OUTPUTFILE=5324709.out
> > LSB_PROJECT_NAME=default
> > LSB_QUEUE=short
> > LSB_RES_GET_FANOUT_INFO=Y
> > LSB_SUB_HOST=head4
> > LSB_SUB_RES_REQ=select[(model==XEON_E5_2640v3)] rusage[mem=60000]
> > span[ptile=16] cu[maxcus=1:type=switch]
> > LSB_SUB_USER=tbiedert
> > LSB_TRAPSIGS=trap # 15 10 12 2 1
> > LSB_UNIXGROUP_INT=inf
> > LSB_XFER_OP=
> > LSFUSER=tbiedert
> > LSF_BINDIR=/lsf/9.1/linux2.6-glibc2.3-x86_64/bin
> > LSF_CGROUP_TOPDIR_KEY=Elwetritsch
> > LSF_EAUTH_AUX_DATA=/tmp/.auxr9ymHwN
> > LSF_EAUTH_AUX_PASS=yes
> > LSF_EAUTH_CLIENT=user
> > LSF_EAUTH_SERVER=mbatchd@Elwetritsch
> > LSF_ENVDIR=/lsf/conf
> > LSF_FROM_HOST=node790
> > LSF_INVOKE_CMD=bsub
> > LSF_JOB_TIMESTAMP_VALUE=1481212155
> > LSF_LIBDIR=/lsf/9.1/linux2.6-glibc2.3-x86_64/lib
> > LSF_LIM_API_NTRIES=1
> > LSF_LOGDIR=/lsf/log
> > LSF_PJL_TYPE=openmpi
> > LSF_PM_TASKID=6
> > LSF_SERVERDIR=/lsf/9.1/linux2.6-glibc2.3-x86_64/etc
> > LSF_VERSION=30
> > LS_COLORS=rs=0:di=01;34:ln=01;36:mh=00:pi=40;33:so=01;35:do=01;35:bd=40;33
> > ;01:cd=40;33;01:or=40;31;01:mi=01;05;37;41:su=37;41:sg=30;43:ca=30;41:tw=3
> > 0;42:ow=34;42:st=37;44:ex=01;32:*.tar=01;31:*.tgz=01;31:*.arj=01;31:*.taz=
> > 01;31:*.lzh=01;31:*.lzma=01;31:*.tlz=01;31:*.txz=01;31:*.zip=01;31:*.z=01;
> > 31:*.Z=01;31:*.dz=01;31:*.gz=01;31:*.lz=01;31:*.xz=01;31:*.bz2=01;31:*.tbz
> > =01;31:*.tbz2=01;31:*.bz=01;31:*.tz=01;31:*.deb=01;31:*.rpm=01;31:*.jar=01
> > ;31:*.rar=01;31:*.ace=01;31:*.zoo=01;31:*.cpio=01;31:*.7z=01;31:*.rz=01;31
> > :*.jpg=01;35:*.jpeg=01;35:*.gif=01;35:*.bmp=01;35:*.pbm=01;35:*.pgm=01;35:
> > *.ppm=01;35:*.tga=01;35:*.xbm=01;35:*.xpm=01;35:*.tif=01;35:*.tiff=01;35:*
> > .png=01;35:*.svg=01;35:*.svgz=01;35:*.mng=01;35:*.pcx=01;35:*.mov=01;35:*.
> > mpg=01;35:*.mpeg=01;35:*.m2v=01;35:*.mkv=01;35:*.ogm=01;35:*.mp4=01;35:*.m
> > 4v=01;35:*.mp4v=01;35:*.vob=01;35:*.qt=01;35:*.nuv=01;35:*.wmv=01;35:*.asf
> > =01;35:*.rm=01;35:*.rmvb=01;35:*.flc=01;35:*.avi=01;35:*.fli=01;35:*.flv=0
> > 1;35:*.gl=01;35:*.dl=01;35:*.xcf=01;35:*.xwd=01;35:*.yuv=01;35:*.cgm=01;35
> > :*.emf=01;35:*.axv=01;35:*.anx=01;35:*.ogv=01;35:*.ogx=01;35:*.aac=01;36:*
> > .au=01;36:*.flac=01;36:*.mid=01;36:*.midi=01;36:*.mka=01;36:*.mp3=01;36:*.
> > mpc=01;36:*.ogg=01;36:*.ra=01;36:*.wav=01;36:*.axa=01;36:*.oga=01;36:*.spx
> > =01;36:*.xspf=01;36:
> > LS_EXECCWD=/home/tbiedert/HPX-VolumeRendering/build
> > LS_EXEC_T=START
> > LS_JOBPID=79755
> > LS_SUBCWD=/home/tbiedert/HPX-VolumeRendering/build
> > MAIL=/var/spool/mail/tbiedert
> > MANPATH=/software/gcc/6.2.0/share/man:/home/tbiedert/local/share/man:/lsf/
> > 9.1/man:
> > MODULEPATH=/software/modulefiles
> > MODULESHOME=/usr/share/Modules
> > MSM_HOME=/usr/local/MegaRAID Storage Manager
> > MSM_PRODUCT=MSM
> > NXDIR=/usr/NX
> > OMPI_APP_CTX_NUM_PROCS=8
> > OMPI_ARGV=--hpx:threads 16 --no-output --csv --warmup 3 --benchmark 8
> > --blockSize 256x256x256 --tileSize 64x34 --preload --distributed
> > --compress /scratch/tbiedert/4096x4096x4096.dummy
> > OMPI_COMMAND=hpxvr
> > OMPI_COMM_WORLD_LOCAL_RANK=0
> > OMPI_COMM_WORLD_LOCAL_SIZE=1
> > OMPI_COMM_WORLD_NODE_RANK=0
> > OMPI_COMM_WORLD_RANK=6
> > OMPI_COMM_WORLD_SIZE=8
> > OMPI_FILE_LOCATION=/tmp/5324709.tmpdir/openmpi-sessions-
> > tbiedert@node774_0/4164/1/6
> > OMPI_FIRST_RANKS=0
> > OMPI_MCA_db=^pmi
> > OMPI_MCA_ess=env
> > OMPI_MCA_ess_base_jobid=272891905
> > OMPI_MCA_ess_base_vpid=6
> > OMPI_MCA_grpcomm=^pmi
> > OMPI_MCA_hwloc_base_binding_policy=none
> > OMPI_MCA_initial_wdir=/home/tbiedert/HPX-VolumeRendering/build
> > OMPI_MCA_mpi_yield_when_idle=0
> > OMPI_MCA_orte_app_num=0
> > OMPI_MCA_orte_bound_at_launch=1
> > OMPI_MCA_orte_ess_jobid=272891904
> > OMPI_MCA_orte_ess_node_rank=0
> > OMPI_MCA_orte_ess_num_procs=8
> > OMPI_MCA_orte_ess_vpid=1
> > OMPI_MCA_orte_hnp_uri=272891904.0;tcp://10.255.8.90,10.250.8.90:48359
> > OMPI_MCA_orte_local_daemon_uri=272891904.6;tcp://10.255.8.74,10.250.8.74:4
> > 7752
> > OMPI_MCA_orte_num_nodes=8
> > OMPI_MCA_orte_num_restarts=0
> > OMPI_MCA_orte_peer_fini_barrier_id=2
> > OMPI_MCA_orte_peer_init_barrier_id=1
> > OMPI_MCA_orte_peer_modex_id=0
> > OMPI_MCA_orte_precondition_transports=e2dcd4f3b6aa563f-9fb1cf15b9c08abf
> > OMPI_MCA_orte_tmpdir_base=/tmp/5324709.tmpdir
> > OMPI_MCA_pubsub=^pmi
> > OMPI_MCA_rmaps_base_mapping_policy=ppr:1:node
> > OMPI_MCA_shmem_RUNTIME_QUERY_hint=mmap
> > OMPI_NUM_APP_CTX=1
> > OMPI_UNIVERSE_SIZE=128
> > OPAL_OUTPUT_STDERR_FD=18
> > PATH=/lsf/9.1/linux2.6-glibc2.3-
> > x86_64/bin:/software/binutils/2.27/bin:/software/gcc/6.2.0/bin:/home/tbied
> > ert/local/bin:/lsf/rhrk/bin:/cluster/rhrk/bin:/usr/lib64/qt-
> > 3.3/bin:/usr/NX/bin:/lsf/9.1/linux2.6-glibc2.3-
> > x86_64/etc:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/o
> > pt/bin:/home/tbiedert/bin
> > PWD=/home/tbiedert/HPX-VolumeRendering/build
> > QTDIR=/usr/lib64/qt-3.3
> > QTINC=/usr/lib64/qt-3.3/include
> > QTLIB=/usr/lib64/qt-3.3/lib
> > RBH_CFG_DEFAULT=/cluster/robinhood/conf/scratch.conf
> > RHRK_MPI_HYBRID=1
> > RHRK_NOTIFICATION=LOGS
> > RM_CPUTASK10=3
> > RM_CPUTASK11=5
> > RM_CPUTASK12=7
> > RM_CPUTASK13=9
> > RM_CPUTASK14=11
> > RM_CPUTASK15=13
> > RM_CPUTASK16=15
> > RM_CPUTASK1=0
> > RM_CPUTASK2=2
> > RM_CPUTASK3=4
> > RM_CPUTASK4=6
> > RM_CPUTASK5=8
> > RM_CPUTASK6=10
> > RM_CPUTASK7=12
> > RM_CPUTASK8=14
> > RM_CPUTASK9=1
> > SBD_KRB5CCNAME_VAL=
> > SCRATCH=/scratch/tbiedert
> > SHELL=/bin/bash
> > SHLVL=4
> > SSH_ASKPASS=/usr/libexec/openssh/gnome-ssh-askpass
> > SSH_CLIENT=131.246.17.22 35482 22
> > SSH_CONNECTION=131.246.17.22 35482 131.246.113.228 22
> > SSH_TTY=/dev/pts/5
> > TBBROOT=/home/tbiedert/local/opt/tbb2017-update3
> > TMOUT=3600
> > TMPDIR=/tmp/5324709.tmpdir
> > USER=tbiedert
> > _=/home/tbiedert/local/bin/mpirun
> > _LMFILES_=/software/modulefiles/gcc/6.2.0:/software/modulefiles/binutils/l
> > atest
> > __LSF_JOB_TMPDIR__=/tmp/5324709.tmpdir
> > {locality-id}: 6
> > {hostname}: [ (mpi:6) ]
> > {process-id}: 79756
> > {function}: input_container::load_binary_chunk
> > {file}: /tmp/hpx-build/hpx/hpx/runtime/serialization/input_container.hpp
> > {line}: 146
> > {os-thread}: worker-thread#11
> > {thread-description}: <unknown>
> > {state}: state_running
> > {auxinfo}:
> > {config}:
> > HPX_HAVE_NATIVE_TLS=ON
> > HPX_HAVE_STACKTRACES=ON
> > HPX_HAVE_COMPRESSION_BZIP2=OFF
> > HPX_HAVE_COMPRESSION_SNAPPY=OFF
> > HPX_HAVE_COMPRESSION_ZLIB=OFF
> > HPX_HAVE_PARCEL_COALESCING=ON
> > HPX_HAVE_PARCELPORT_TCP=OFF
> > HPX_HAVE_PARCELPORT_MPI=ON (OpenMPI V1.8.3, MPI V3.0)
> > HPX_HAVE_VERIFY_LOCKS=OFF
> > HPX_HAVE_HWLOC=ON
> > HPX_HAVE_ITTNOTIFY=OFF
> > HPX_HAVE_RUN_MAIN_EVERYWHERE=OFF
> > HPX_PARCEL_MAX_CONNECTIONS=512
> > HPX_PARCEL_MAX_CONNECTIONS_PER_LOCALITY=4
> > HPX_AGAS_LOCAL_CACHE_SIZE=4096
> > HPX_HAVE_MALLOC=tcmalloc
> > HPX_PREFIX (configured)=/home/tbiedert/local
> > HPX_PREFIX=/home/tbiedert/local
> > {version}: V1.0.0-trunk (AGAS: V3.0), Git: 9ecdb73e07
> > {boost}: V1.62.0
> > {build-type}: release
> > {date}: Dec 7 2016 20:41:41
> > {platform}: linux
> > {compiler}: GNU C++ version 6.2.0
> > {stdlib}: GNU libstdc++ version 20160822
> > {what}: archive data bstream data chunk size mismatch:
> > HPX(serialization_error)
> >
> >
>
> _______________________________________________
> hpx-users mailing list
> hpx-users@stellar.cct.lsu.edu
> https://mail.cct.lsu.edu/mailman/listinfo/hpx-users