Thank you for your feedback!

Yes, we have already traced the problem to a specific parameter combination
of our program. I assume it really is a lifetime issue in this code path: not
in the serialization code itself, but rather that the object to be sent goes
out of scope before it is serialized (or something like that).
I currently don't have time to debug it, but I will update this thread
once we have found the cause.
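
For illustration only, here is a minimal sketch of the kind of lifetime problem
suspected above. It is plain C++ and deliberately uses neither our application
code nor the HPX API; buffer_view, checksum, send_dangling and send_owned are
made-up names, and std::launch::deferred merely stands in for "the work runs
later than the call that scheduled it".

```cpp
#include <cassert>
#include <cstddef>
#include <future>
#include <memory>
#include <numeric>
#include <vector>

// Hypothetical stand-in for a non-owning array wrapper: it stores only a
// pointer and a length, so it never extends the lifetime of the data.
struct buffer_view {
    const int* data;
    std::size_t size;
};

// Hypothetical stand-in for the serialization step, which runs later than
// the call that scheduled the send.
long checksum(buffer_view v) {
    assert(v.data != nullptr);
    return std::accumulate(v.data, v.data + v.size, 0L);
}

// Hazard (shown for contrast, never call get() on its result): the view
// outlives the local vector it points into. The deferred task only runs when
// get() is called, by which time `tile` is destroyed, so reading through the
// view would be undefined behaviour.
std::future<long> send_dangling() {
    std::vector<int> tile(1024, 1);
    buffer_view v{tile.data(), tile.size()};
    return std::async(std::launch::deferred, checksum, v);
}

// Safer: the deferred task shares ownership of the data, so the buffer stays
// alive until the work has actually run.
std::future<long> send_owned() {
    auto tile = std::make_shared<std::vector<int>>(1024, 1);
    return std::async(std::launch::deferred, [tile] {
        return checksum(buffer_view{tile->data(), tile->size()});
    });
}
```

If the real cause turns out to have this shape, the fix would be along the
lines of send_owned: whatever is handed to the send must own the data, or
otherwise keep it alive, until serialization has finished.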
Thank you! 

> On 9 Dec 2016, at 06:16, Thomas Heller <thom.hel...@gmail.com> wrote:
> 
> 
> 
> On 09.12.2016 at 2:36 a.m., "Hartmut Kaiser" <hartmut.kai...@gmail.com> wrote:
> Tim,
> 
> > when running my HPX application on our cluster with multiple localities
> > I SOMETIMES get a segmentation fault with error message: "archive data
> > bstream data chunk size mismatch: HPX(serialization_error)".
> >
> > And when I rerun the same configuration, it either works or sometimes
> > segfaults again.
> >
> > Any idea what could cause this or how to debug it?
> 
> I have not seen this problem before. Could you provide us with the code for 
> your application?
> 
> Thomas, is that a known issue with the MPI parcelport?
> 
> 
> No, it's not. I haven't seen this problem in a while. It's triggered in the 
> deserialization of the parcel, so it could be a corrupted parcel. Could you 
> inspect your serialization code for any lifetime issues? Do you maybe 
> serialize a temporary buffer with make_array?
> 
> 
> Regards Hartmut
> ---------------
> http://boost-spirit.com
> http://stellar.cct.lsu.edu
> 
> 
> >
> > Thanks!
> >
> > Tim
> >
> > The full error output follows:
> >
> >
> >
> > {stack-trace}: 4 frames:
> > 0x2b40ce84564c  : hpx::detail::backtrace[abi:cxx11](unsigned long) +
> > 0x9c in /home/tbiedert/local/lib/libhpx.so.1
> > 0x2b40ce8918fa  : boost::exception_ptr
> > hpx::detail::get_exception<hpx::exception>(hpx::exception const&,
> > std::__cxx11::basic_string<char, std::char_traits<char>,
> > std::allocator<char> > const&, std::__cxx11::basic_string<char,
> > std::char_traits<char>, std::allocator<char> > const&, long,
> > std::__cxx11::basic_string<char, std::char_traits<char>,
> > std::allocator<char> > const&) + 0xaa in
> > /home/tbiedert/local/lib/libhpx.so.1
> > 0x2b40ce891e5e  : void
> > hpx::detail::throw_exception<hpx::exception>(hpx::exception const&,
> > std::__cxx11::basic_string<char, std::char_traits<char>,
> > std::allocator<char> > const&, std::__cxx11::basic_string<char,
> > std::char_traits<char>, std::allocator<char> > const&, long) + 0x4e in
> > /home/tbiedert/local/lib/libhpx.so.1
> > 0x2b40ce92049e  : hpx::detail::throw_exception(hpx::error,
> > std::__cxx11::basic_string<char, std::char_traits<char>,
> > std::allocator<char> > const&, std::__cxx11::basic_string<char,
> > std::char_traits<char>, std::allocator<char> > const&,
> > std::__cxx11::basic_string<char, std::char_traits<char>,
> > std::allocator<char> > const&, long) + 0x4e in
> > /home/tbiedert/local/lib/libhpx.so.1
> > {env}: 177 entries:
> >    BASH_FUNC_module()=() {  eval `/usr/bin/modulecmd bash $*`
> > }
> >    BINARY_TYPE_HPC=
> >    BSUB_BLOCK_EXEC_HOST=
> >    CFLAGS=-I/software/binutils/2.27/include -I/software/gcc/6.2.0/include
> >    CMAKE_PREFIX_PATH=/home/tbiedert/local
> >    CPATH=/home/tbiedert/local/opt/tbb2017-update3/include
> > CPLUS_INCLUDE_PATH=/software/binutils/2.27/include:/software/gcc/6.2.0/inc
> > lude:/home/tbiedert/local/include:
> >    CPPFLAGS=-I/software/binutils/2.27/include -
> > I/software/gcc/6.2.0/include
> >    CPP_INCLUDE_PATH=/home/tbiedert/local/include:
> >    CVS_RSH=ssh
> > C_INCLUDE_PATH=/software/binutils/2.27/include:/software/gcc/6.2.0/include
> > :/home/tbiedert/local/include:
> >    G_BROKEN_FILENAMES=1
> >    HISTCONTROL=ignoreboth
> >    HISTSIZE=500
> >    HOME=/home/tbiedert
> >    HOSTNAME=node774
> >    HOSTTYPE=X86_64
> >    ITERM_ORIG_PS1=\[\033[7m\]\u@\h\[\033[m\] [\W]
> >    ITERM_PREV_PS1=\[\]\[\033[7m\]\u@\h\[\033[m\] [\W] \[\]
> >    JOB_TERMINATE_INTERVAL=300
> >    KDEDIRS=/usr
> >    KDE_IS_PRELINKED=1
> >    LANG=en_US.UTF-8
> >    LDFLAGS=-L/software/binutils/2.27/lib -L/software/gcc/6.2.0/lib64
> > -L/software/gcc/6.2.0/lib
> > LD_LIBRARY_PATH=/lsf/9.1/linux2.6-glibc2.3-
> > x86_64/lib:/home/tbiedert/local/opt/tbb2017-
> > update3/build/linux_intel64_gcc_cc6.2.0_libc2.12_kernel2.6.32_release:/sof
> > tware/binutils/2.27/lib:/software/gcc/6.2.0/lib64:/software/gcc/6.2.0/lib:
> > /home/tbiedert/local/lib:/home/tbiedert/local/usr/lib64:/home/tbiedert/loc
> > al/lib64
> >    LESSOPEN=||/usr/bin/lesspipe.sh %s
> > LIBRARY_PATH=/home/tbiedert/local/opt/tbb2017-
> > update3/build/linux_intel64_gcc_cc6.2.0_libc2.12_kernel2.6.32_release
> >    LOADEDMODULES=gcc/6.2.0:binutils/latest
> >    LOGNAME=tbiedert
> >    LSB_ACCT_FILE=/tmp/5324709.tmpdir/.1481211361.5324709.acct
> > LSB_AFFINITY_HOSTFILE=/home/tbiedert/.lsbatch/1481211361.5324709.hostAffin
> > ityFile
> >    LSB_APPLICATION_NAME=hybrid_mpi_openmp
> >    LSB_BATCH_JID=5324709
> >    LSB_BIND_CPU_LIST=0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15
> >    LSB_CHKFILENAME=/home/tbiedert/.lsbatch/1481211361.5324709
> > LSB_DJOB_HOSTFILE=/home/tbiedert/.lsbatch/1481211361.5324709.hostfile
> >    LSB_DJOB_NUMPROC=128
> > LSB_DJOB_RANKFILE=/home/tbiedert/.lsbatch/1481211361.5324709.hostfile
> >    LSB_ECHKPNT_RSH_CMD=ssh
> >    LSB_EEXEC_REAL_GID=
> >    LSB_EEXEC_REAL_UID=
> >    LSB_EFFECTIVE_RSRCREQ=select[ ((( (model == XEON_E5_2640v3)) && type
> > == any))] order[-slots:-maxslots] rusage[mem=60000.00] span[ptile=16]
> > same[model] cu[type=switch:maxcus=1:pref=config]
> > affinity[core(1)*1:distribute=pack]
> >    LSB_ERRORFILE=5324709.err
> >    LSB_EXEC_CLUSTER=Elwetritsch
> >    LSB_EXEC_HOSTTYPE=X86_64
> >    LSB_EXIT_PRE_ABORT=99
> >    LSB_HOSTS=node790 node790 node790 node790 node790 node790 node790
> > node790 node790 node790 node790 node790 node790 node790 node790 node790
> > node792 node792 node792 node792 node792 node792 node792 node792 node792
> > node792 node792 node792 node792 node792 node792 node792 node793 node793
> > node793 node793 node793 node793 node793 node793 node793 node793 node793
> > node793 node793 node793 node793 node793 node795 node795 node795 node795
> > node795 node795 node795 node795 node795 node795 node795 node795 node795
> > node795 node795 node795 node796 node796 node796 node796 node796 node796
> > node796 node796 node796 node796 node796 node796 node796 node796 node796
> > node796 node773 node773 node773 node773 node773 node773 node773 node773
> > node773 node773 node773 node773 node773 node773 node773 node773 node774
> > node774 node774 node774 node774 node774 node774 node774 node774 node774
> > node774 node774 node774 node774 node774 node774 node775 node775 node775
> > node775 node775 node775 node775 node775 node775 node775 node775 node775
> > node775 node775 node775 node775
> >    LSB_JOBEXIT_STAT=0
> >    LSB_JOBFILENAME=/home/tbiedert/.lsbatch/1481211361.5324709
> >    LSB_JOBID=5324709
> >    LSB_JOBINDEX=0
> >    LSB_JOBNAME=mpirun --map-by ppr:1:node --bind-to none ./hpxvr
> > --hpx:threads 16 --no-output --csv --warmup 3 --benchmark 8 --blockSize
> > 256x256x256 --tileSize 64x34 --preload --distributed --compress
> > /scratch/tbiedert/4096x4096x4096.dummy
> >    LSB_JOBRES_CALLBACK=56355@node790
> >    LSB_JOBRES_PID=485
> >    LSB_JOB_EXECUSER=tbiedert
> >    LSB_JOB_STARTER=/lsf/rhrk/bin/job_starter_hybrid_mpi_openmp "%USRCMD"
> >    LSB_MAX_NUM_PROCESSORS=128
> >    LSB_MCPU_HOSTS=node790 1 node792 1 node793 1 node795 1 node796 1
> > node773 1 node774 1 node775 1
> >    LSB_OUTDIR=/home/tbiedert/HPX-VolumeRendering/build
> >    LSB_OUTPUTFILE=5324709.out
> >    LSB_PROJECT_NAME=default
> >    LSB_QUEUE=short
> >    LSB_RES_GET_FANOUT_INFO=Y
> >    LSB_SUB_HOST=head4
> >    LSB_SUB_RES_REQ=select[(model==XEON_E5_2640v3)] rusage[mem=60000]
> > span[ptile=16] cu[maxcus=1:type=switch]
> >    LSB_SUB_USER=tbiedert
> >    LSB_TRAPSIGS=trap # 15 10 12 2 1
> >    LSB_UNIXGROUP_INT=inf
> >    LSB_XFER_OP=
> >    LSFUSER=tbiedert
> >    LSF_BINDIR=/lsf/9.1/linux2.6-glibc2.3-x86_64/bin
> >    LSF_CGROUP_TOPDIR_KEY=Elwetritsch
> >    LSF_EAUTH_AUX_DATA=/tmp/.auxr9ymHwN
> >    LSF_EAUTH_AUX_PASS=yes
> >    LSF_EAUTH_CLIENT=user
> >    LSF_EAUTH_SERVER=mbatchd@Elwetritsch
> >    LSF_ENVDIR=/lsf/conf
> >    LSF_FROM_HOST=node790
> >    LSF_INVOKE_CMD=bsub
> >    LSF_JOB_TIMESTAMP_VALUE=1481212155
> >    LSF_LIBDIR=/lsf/9.1/linux2.6-glibc2.3-x86_64/lib
> >    LSF_LIM_API_NTRIES=1
> >    LSF_LOGDIR=/lsf/log
> >    LSF_PJL_TYPE=openmpi
> >    LSF_PM_TASKID=6
> >    LSF_SERVERDIR=/lsf/9.1/linux2.6-glibc2.3-x86_64/etc
> >    LSF_VERSION=30
> > LS_COLORS=rs=0:di=01;34:ln=01;36:mh=00:pi=40;33:so=01;35:do=01;35:bd=40;33
> > ;01:cd=40;33;01:or=40;31;01:mi=01;05;37;41:su=37;41:sg=30;43:ca=30;41:tw=3
> > 0;42:ow=34;42:st=37;44:ex=01;32:*.tar=01;31:*.tgz=01;31:*.arj=01;31:*.taz=
> > 01;31:*.lzh=01;31:*.lzma=01;31:*.tlz=01;31:*.txz=01;31:*.zip=01;31:*.z=01;
> > 31:*.Z=01;31:*.dz=01;31:*.gz=01;31:*.lz=01;31:*.xz=01;31:*.bz2=01;31:*.tbz
> > =01;31:*.tbz2=01;31:*.bz=01;31:*.tz=01;31:*.deb=01;31:*.rpm=01;31:*.jar=01
> > ;31:*.rar=01;31:*.ace=01;31:*.zoo=01;31:*.cpio=01;31:*.7z=01;31:*.rz=01;31
> > :*.jpg=01;35:*.jpeg=01;35:*.gif=01;35:*.bmp=01;35:*.pbm=01;35:*.pgm=01;35:
> > *.ppm=01;35:*.tga=01;35:*.xbm=01;35:*.xpm=01;35:*.tif=01;35:*.tiff=01;35:*
> > .png=01;35:*.svg=01;35:*.svgz=01;35:*.mng=01;35:*.pcx=01;35:*.mov=01;35:*.
> > mpg=01;35:*.mpeg=01;35:*.m2v=01;35:*.mkv=01;35:*.ogm=01;35:*.mp4=01;35:*.m
> > 4v=01;35:*.mp4v=01;35:*.vob=01;35:*.qt=01;35:*.nuv=01;35:*.wmv=01;35:*.asf
> > =01;35:*.rm=01;35:*.rmvb=01;35:*.flc=01;35:*.avi=01;35:*.fli=01;35:*.flv=0
> > 1;35:*.gl=01;35:*.dl=01;35:*.xcf=01;35:*.xwd=01;35:*.yuv=01;35:*.cgm=01;35
> > :*.emf=01;35:*.axv=01;35:*.anx=01;35:*.ogv=01;35:*.ogx=01;35:*.aac=01;36:*
> > .au=01;36:*.flac=01;36:*.mid=01;36:*.midi=01;36:*.mka=01;36:*.mp3=01;36:*.
> > mpc=01;36:*.ogg=01;36:*.ra=01;36:*.wav=01;36:*.axa=01;36:*.oga=01;36:*.spx
> > =01;36:*.xspf=01;36:
> >    LS_EXECCWD=/home/tbiedert/HPX-VolumeRendering/build
> >    LS_EXEC_T=START
> >    LS_JOBPID=79755
> >    LS_SUBCWD=/home/tbiedert/HPX-VolumeRendering/build
> >    MAIL=/var/spool/mail/tbiedert
> > MANPATH=/software/gcc/6.2.0/share/man:/home/tbiedert/local/share/man:/lsf/
> > 9.1/man:
> >    MODULEPATH=/software/modulefiles
> >    MODULESHOME=/usr/share/Modules
> >    MSM_HOME=/usr/local/MegaRAID Storage Manager
> >    MSM_PRODUCT=MSM
> >    NXDIR=/usr/NX
> >    OMPI_APP_CTX_NUM_PROCS=8
> >    OMPI_ARGV=--hpx:threads 16 --no-output --csv --warmup 3 --benchmark 8
> > --blockSize 256x256x256 --tileSize 64x34 --preload --distributed
> > --compress /scratch/tbiedert/4096x4096x4096.dummy
> >    OMPI_COMMAND=hpxvr
> >    OMPI_COMM_WORLD_LOCAL_RANK=0
> >    OMPI_COMM_WORLD_LOCAL_SIZE=1
> >    OMPI_COMM_WORLD_NODE_RANK=0
> >    OMPI_COMM_WORLD_RANK=6
> >    OMPI_COMM_WORLD_SIZE=8
> > OMPI_FILE_LOCATION=/tmp/5324709.tmpdir/openmpi-sessions-
> > tbiedert@node774_0/4164/1/6
> >    OMPI_FIRST_RANKS=0
> >    OMPI_MCA_db=^pmi
> >    OMPI_MCA_ess=env
> >    OMPI_MCA_ess_base_jobid=272891905
> >    OMPI_MCA_ess_base_vpid=6
> >    OMPI_MCA_grpcomm=^pmi
> >    OMPI_MCA_hwloc_base_binding_policy=none
> >    OMPI_MCA_initial_wdir=/home/tbiedert/HPX-VolumeRendering/build
> >    OMPI_MCA_mpi_yield_when_idle=0
> >    OMPI_MCA_orte_app_num=0
> >    OMPI_MCA_orte_bound_at_launch=1
> >    OMPI_MCA_orte_ess_jobid=272891904
> >    OMPI_MCA_orte_ess_node_rank=0
> >    OMPI_MCA_orte_ess_num_procs=8
> >    OMPI_MCA_orte_ess_vpid=1
> > OMPI_MCA_orte_hnp_uri=272891904.0;tcp://10.255.8.90,10.250.8.90:48359
> > OMPI_MCA_orte_local_daemon_uri=272891904.6;tcp://10.255.8.74,10.250.8.74:4
> > 7752
> >    OMPI_MCA_orte_num_nodes=8
> >    OMPI_MCA_orte_num_restarts=0
> >    OMPI_MCA_orte_peer_fini_barrier_id=2
> >    OMPI_MCA_orte_peer_init_barrier_id=1
> >    OMPI_MCA_orte_peer_modex_id=0
> > OMPI_MCA_orte_precondition_transports=e2dcd4f3b6aa563f-9fb1cf15b9c08abf
> >    OMPI_MCA_orte_tmpdir_base=/tmp/5324709.tmpdir
> >    OMPI_MCA_pubsub=^pmi
> >    OMPI_MCA_rmaps_base_mapping_policy=ppr:1:node
> >    OMPI_MCA_shmem_RUNTIME_QUERY_hint=mmap
> >    OMPI_NUM_APP_CTX=1
> >    OMPI_UNIVERSE_SIZE=128
> >    OPAL_OUTPUT_STDERR_FD=18
> > PATH=/lsf/9.1/linux2.6-glibc2.3-
> > x86_64/bin:/software/binutils/2.27/bin:/software/gcc/6.2.0/bin:/home/tbied
> > ert/local/bin:/lsf/rhrk/bin:/cluster/rhrk/bin:/usr/lib64/qt-
> > 3.3/bin:/usr/NX/bin:/lsf/9.1/linux2.6-glibc2.3-
> > x86_64/etc:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/o
> > pt/bin:/home/tbiedert/bin
> >    PWD=/home/tbiedert/HPX-VolumeRendering/build
> >    QTDIR=/usr/lib64/qt-3.3
> >    QTINC=/usr/lib64/qt-3.3/include
> >    QTLIB=/usr/lib64/qt-3.3/lib
> >    RBH_CFG_DEFAULT=/cluster/robinhood/conf/scratch.conf
> >    RHRK_MPI_HYBRID=1
> >    RHRK_NOTIFICATION=LOGS
> >    RM_CPUTASK10=3
> >    RM_CPUTASK11=5
> >    RM_CPUTASK12=7
> >    RM_CPUTASK13=9
> >    RM_CPUTASK14=11
> >    RM_CPUTASK15=13
> >    RM_CPUTASK16=15
> >    RM_CPUTASK1=0
> >    RM_CPUTASK2=2
> >    RM_CPUTASK3=4
> >    RM_CPUTASK4=6
> >    RM_CPUTASK5=8
> >    RM_CPUTASK6=10
> >    RM_CPUTASK7=12
> >    RM_CPUTASK8=14
> >    RM_CPUTASK9=1
> >    SBD_KRB5CCNAME_VAL=
> >    SCRATCH=/scratch/tbiedert
> >    SHELL=/bin/bash
> >    SHLVL=4
> >    SSH_ASKPASS=/usr/libexec/openssh/gnome-ssh-askpass
> >    SSH_CLIENT=131.246.17.22 35482 22
> >    SSH_CONNECTION=131.246.17.22 35482 131.246.113.228 22
> >    SSH_TTY=/dev/pts/5
> >    TBBROOT=/home/tbiedert/local/opt/tbb2017-update3
> >    TMOUT=3600
> >    TMPDIR=/tmp/5324709.tmpdir
> >    USER=tbiedert
> >    _=/home/tbiedert/local/bin/mpirun
> > _LMFILES_=/software/modulefiles/gcc/6.2.0:/software/modulefiles/binutils/l
> > atest
> >    __LSF_JOB_TMPDIR__=/tmp/5324709.tmpdir
> > {locality-id}: 6
> > {hostname}: [ (mpi:6) ]
> > {process-id}: 79756
> > {function}: input_container::load_binary_chunk
> > {file}: /tmp/hpx-build/hpx/hpx/runtime/serialization/input_container.hpp
> > {line}: 146
> > {os-thread}: worker-thread#11
> > {thread-description}: <unknown>
> > {state}: state_running
> > {auxinfo}:
> > {config}:
> >    HPX_HAVE_NATIVE_TLS=ON
> >    HPX_HAVE_STACKTRACES=ON
> >    HPX_HAVE_COMPRESSION_BZIP2=OFF
> >    HPX_HAVE_COMPRESSION_SNAPPY=OFF
> >    HPX_HAVE_COMPRESSION_ZLIB=OFF
> >    HPX_HAVE_PARCEL_COALESCING=ON
> >    HPX_HAVE_PARCELPORT_TCP=OFF
> >    HPX_HAVE_PARCELPORT_MPI=ON (OpenMPI V1.8.3, MPI V3.0)
> >    HPX_HAVE_VERIFY_LOCKS=OFF
> >    HPX_HAVE_HWLOC=ON
> >    HPX_HAVE_ITTNOTIFY=OFF
> >    HPX_HAVE_RUN_MAIN_EVERYWHERE=OFF
> >    HPX_PARCEL_MAX_CONNECTIONS=512
> >    HPX_PARCEL_MAX_CONNECTIONS_PER_LOCALITY=4
> >    HPX_AGAS_LOCAL_CACHE_SIZE=4096
> >    HPX_HAVE_MALLOC=tcmalloc
> >    HPX_PREFIX (configured)=/home/tbiedert/local
> >    HPX_PREFIX=/home/tbiedert/local
> > {version}: V1.0.0-trunk (AGAS: V3.0), Git: 9ecdb73e07
> > {boost}: V1.62.0
> > {build-type}: release
> > {date}: Dec  7 2016 20:41:41
> > {platform}: linux
> > {compiler}: GNU C++ version 6.2.0
> > {stdlib}: GNU libstdc++ version 20160822
> > {what}: archive data bstream data chunk size mismatch:
> > HPX(serialization_error)
> >
> 
> 
> 
> _______________________________________________
> hpx-users mailing list
> hpx-users@stellar.cct.lsu.edu
> https://mail.cct.lsu.edu/mailman/listinfo/hpx-users
