Thanks Hartmut,
I've rebuilt using the master branch and it works significantly better than 
before.  I can now run jobs on multiple nodes and with multiple localities, 
although certain invocations still end in a segmentation fault.

I haven't had much time yet to experiment with different combinations -- 
different numbers of localities, running on certain nodes and not on others.  
My suspicion is that the crash occurs when running on a particular node, 
although I still need to confirm that.

I get two different error messages, although it is not yet clear to me when 
or why each one occurs.  The messages follow:

shmuel@ssh01:~
> srun -n5 -N2 1d_stencil_7
src/tcmalloc.cc:278] Attempt to free invalid pointer 0xfffffffffd658be8
srun: error: hpc02: task 4: Aborted
src/tcmalloc.cc:278] Attempt to free invalid pointer 0xfffffffffeaecc28
src/tcmalloc.cc:278] Attempt to free invalid pointer 0xfffffffffed16c28
src/tcmalloc.cc:278] Attempt to free invalid pointer 0xfffffffffd98ec28
src/tcmalloc.cc:278] Attempt to free invalid pointer 0xfffffffffe6a8c28
srun: error: hpc01: tasks 0-3: Aborted

* * *

shmuel@ssh01:~
> srun -n5 -N2 1d_stencil_7
src/tcmalloc.cc:278] Attempt to free invalid pointer 0xfffffffffe192c28
srun: error: hpc02: task 4: Aborted
{stack-trace}: 13 frames:
0x7f0244b11c19  : hpx::termination_handler(int) + 0x159 in 
/usr/local/lib/libhpx.so.0
0x7f02416828d0  : ??? + 0x7f02416828d0 in /lib/x86_64-linux-gnu/libpthread.so.0
0x7f024578162d  : 
hpx::util::batch_environments::slurm_environment::retrieve_number_of_localities(bool)
 + 0x9fd in /usr/local/lib/libhpx.so.0
0x7f024577f36c  : 
hpx::util::batch_environments::slurm_environment::slurm_environment(std::vector<std::string,
 std::allocator<std::string> >&, bool) + 0x5cc in /usr/local/lib/libhpx.so.0
0x7f02457ea283  : 
hpx::util::batch_environment::batch_environment(std::vector<std::string, 
std::allocator<std::string> >&, hpx::util::runtime_configuration const&, bool, 
bool) + 0xf3 in /usr/local/lib/libhpx.so.0
0x7f0245870486  : ??? + 0x7f0245870486 in /usr/local/lib/libhpx.so.0
0x7f024586ae8d  : ??? + 0x7f024586ae8d in /usr/local/lib/libhpx.so.0
0x7f0244b3c23f  : hpx::detail::run_or_start(hpx::util::function<int 
(boost::program_options::variables_map&), false> const&, 
boost::program_options::options_description const&, int, char**, 
std::vector<std::string, std::allocator<std::string> >&&, 
hpx::util::function<void (), false> const&, hpx::util::function<void (), false> 
const&, hpx::runtime_mode, bool) + 0x25f in /usr/local/lib/libhpx.so.0
0x52d985        : ??? + 0x52d985 in /usr/local/bin/1d_stencil_7
0x417cf8        : ??? + 0x417cf8 in /usr/local/bin/1d_stencil_7
0x7f023f2a4b45  : __libc_start_main + 0xf5 in /lib/x86_64-linux-gnu/libc.so.6
0x417679        : ??? + 0x417679 in /usr/local/bin/1d_stencil_7
{what}: Segmentation fault
{config}:
  HPX_HAVE_NATIVE_TLS=ON
  HPX_HAVE_STACKTRACES=ON
  HPX_HAVE_COMPRESSION_BZIP2=OFF
  HPX_HAVE_COMPRESSION_SNAPPY=ON
  HPX_HAVE_COMPRESSION_ZLIB=OFF
  HPX_HAVE_PARCEL_COALESCING=ON
  HPX_HAVE_PARCELPORT_TCP=ON
  HPX_HAVE_PARCELPORT_MPI=ON (MPICH V3.1.2, MPI V3.0)
  HPX_HAVE_PARCELPORT_IPC=OFF
  HPX_HAVE_PARCELPORT_IBVERBS=OFF
  HPX_HAVE_VERIFY_LOCKS=OFF
  HPX_HAVE_HWLOC=ON
  HPX_HAVE_ITTNOTIFY=OFF
  HPX_HAVE_RUN_MAIN_EVERYWHERE=OFF
  HPX_PARCEL_MAX_CONNECTIONS=512
  HPX_PARCEL_MAX_CONNECTIONS_PER_LOCALITY=4
  HPX_INITIAL_AGAS_LOCAL_CACHE_SIZE=256
  HPX_AGAS_LOCAL_CACHE_SIZE_PER_THREAD=32
  HPX_HAVE_MALLOC=tcmalloc
  HPX_PREFIX (configured)=/usr/local
  HPX_PREFIX=/usr/local
{version}: V0.9.12-trunk (AGAS: V3.0), Git: 1b4fbdd2ef
{boost}: V1.60.0
{build-type}: release
{date}: Jan  9 2016 19:24:29
{platform}: linux
{compiler}: Intel C++ C++0x mode version 1600
{stdlib}: GNU libstdc++ version 20141220
{stack-trace}: 13 frames:
0x7f994560ec19  : hpx::termination_handler(int) + 0x159 in 
/usr/local/lib/libhpx.so.0
0x7f994217f8d0  : ??? + 0x7f994217f8d0 in /lib/x86_64-linux-gnu/libpthread.so.0
0x7f994627e62d  : 
hpx::util::batch_environments::slurm_environment::retrieve_number_of_localities(bool)
 + 0x9fd in /usr/local/lib/libhpx.so.0
0x7f994627c36c  : 
hpx::util::batch_environments::slurm_environment::slurm_environment(std::vector<std::string,
 std::allocator<std::string> >&, bool) + 0x5cc in /usr/local/lib/libhpx.so.0
0x7f99462e7283  : 
hpx::util::batch_environment::batch_environment(std::vector<std::string, 
std::allocator<std::string> >&, hpx::util::runtime_configuration const&, bool, 
bool) + 0xf3 in /usr/local/lib/libhpx.so.0
0x7f994636d486  : ??? + 0x7f994636d486 in /usr/local/lib/libhpx.so.0
0x7f9946367e8d  : ??? + 0x7f9946367e8d in /usr/local/lib/libhpx.so.0
0x7f994563923f  : hpx::detail::run_or_start(hpx::util::function<int 
(boost::program_options::variables_map&), false> const&, 
boost::program_options::options_description const&, int, char**, 
std::vector<std::string, std::allocator<std::string> >&&, 
hpx::util::function<void (), false> const&, hpx::util::function<void (), false> 
const&, hpx::runtime_mode, bool) + 0x25f in /usr/local/lib/libhpx.so.0
0x52d985        : ??? + 0x52d985 in /usr/local/bin/1d_stencil_7
0x417cf8        : ??? + 0x417cf8 in /usr/local/bin/1d_stencil_7
0x7f993fda1b45  : __libc_start_main + 0xf5 in /lib/x86_64-linux-gnu/libc.so.6
0x417679        : ??? + 0x417679 in /usr/local/bin/1d_stencil_7
{what}: Segmentation fault
{config}:
  HPX_HAVE_NATIVE_TLS=ON
  HPX_HAVE_STACKTRACES=ON
  HPX_HAVE_COMPRESSION_BZIP2=OFF
  HPX_HAVE_COMPRESSION_SNAPPY=ON
  HPX_HAVE_COMPRESSION_ZLIB=OFF
  HPX_HAVE_PARCEL_COALESCING=ON
  HPX_HAVE_PARCELPORT_TCP=ON
  HPX_HAVE_PARCELPORT_MPI=ON (MPICH V3.1.2, MPI V3.0)
  HPX_HAVE_PARCELPORT_IPC=OFF
  HPX_HAVE_PARCELPORT_IBVERBS=OFF
  HPX_HAVE_VERIFY_LOCKS=OFF
  HPX_HAVE_HWLOC=ON
  HPX_HAVE_ITTNOTIFY=OFF
  HPX_HAVE_RUN_MAIN_EVERYWHERE=OFF
  HPX_PARCEL_MAX_CONNECTIONS=512
  HPX_PARCEL_MAX_CONNECTIONS_PER_LOCALITY=4
  HPX_INITIAL_AGAS_LOCAL_CACHE_SIZE=256
  HPX_AGAS_LOCAL_CACHE_SIZE_PER_THREAD=32
  HPX_HAVE_MALLOC=tcmalloc
  HPX_PREFIX (configured)=/usr/local
  HPX_PREFIX=/usr/local
{version}: V0.9.12-trunk (AGAS: V3.0), Git: 1b4fbdd2ef
{boost}: V1.60.0
{build-type}: release
{date}: Jan  9 2016 19:24:29
{platform}: linux
{compiler}: Intel C++ C++0x mode version 1600
{stdlib}: GNU libstdc++ version 20141220
{stack-trace}: {stack-trace}: 13 frames:
0x7f6780e35c19  : hpx::termination_handler(int) + 0x159 in 
/usr/local/lib/libhpx.so.0
0x7f677d9a68d0  : ??? + 0x7f677d9a68d0 in /lib/x86_64-linux-gnu/libpthread.so.0
0x7f6781aa562d  : 
hpx::util::batch_environments::slurm_environment::retrieve_number_of_localities(bool)
 + 0x9fd in /usr/local/lib/libhpx.so.0
0x7f6781aa336c  : 
hpx::util::batch_environments::slurm_environment::slurm_environment(std::vector<std::string,
 std::allocator<std::string> >&, bool) + 0x5cc in /usr/local/lib/libhpx.so.0
0x7f6781b0e283  : 
hpx::util::batch_environment::batch_environment(std::vector<std::string, 
std::allocator<std::string> >&, hpx::util::runtime_configuration const&, bool, 
bool) + 0xf3 in /usr/local/lib/libhpx.so.0
0x7f6781b94486  : ??? + 0x7f6781b94486 in /usr/local/lib/libhpx.so.0
0x7f6781b8ee8d  : ??? + 0x7f6781b8ee8d in /usr/local/lib/libhpx.so.0
0x7f6780e6023f  : hpx::detail::run_or_start(hpx::util::function<int 
(boost::program_options::variables_map&), false> const&, 
boost::program_options::options_description const&, int, char**, 
std::vector<std::string, std::allocator<std::string> >&&, 
hpx::util::function<void (), false> const&, hpx::util::function<void (), false> 
const&, hpx::runtime_mode, bool) + 0x25f in /usr/local/lib/libhpx.so.0
0x52d985        : ??? + 0x52d985 in /usr/local/bin/1d_stencil_7
0x417cf8        : ??? + 0x417cf8 in /usr/local/bin/1d_stencil_7
0x7f677b5c8b45  : __libc_start_main + 0xf5 in /lib/x86_64-linux-gnu/libc.so.6
0x417679        : ??? + 0x417679 in /usr/local/bin/1d_stencil_7
{what}: Segmentation fault
{config}:
  HPX_HAVE_NATIVE_TLS=ON
  HPX_HAVE_STACKTRACES=ON
  HPX_HAVE_COMPRESSION_BZIP2=OFF
  HPX_HAVE_COMPRESSION_SNAPPY=ON
  HPX_HAVE_COMPRESSION_ZLIB=OFF
  HPX_HAVE_PARCEL_COALESCING=ON
  HPX_HAVE_PARCELPORT_TCP=ON
  HPX_HAVE_PARCELPORT_MPI=ON (MPICH V3.1.2, MPI V3.0)
  HPX_HAVE_PARCELPORT_IPC=OFF
  HPX_HAVE_PARCELPORT_IBVERBS=OFF
  HPX_HAVE_VERIFY_LOCKS=OFF
  HPX_HAVE_HWLOC=ON
  HPX_HAVE_ITTNOTIFY=OFF
  HPX_HAVE_RUN_MAIN_EVERYWHERE=OFF
  HPX_PARCEL_MAX_CONNECTIONS=512
  HPX_PARCEL_MAX_CONNECTIONS_PER_LOCALITY=4
  HPX_INITIAL_AGAS_LOCAL_CACHE_SIZE=256
  HPX_AGAS_LOCAL_CACHE_SIZE_PER_THREAD=32
  HPX_HAVE_MALLOC=tcmalloc
  HPX_PREFIX (configured)=/usr/local
  HPX_PREFIX=/usr/local
{version}: V0.9.12-trunk (AGAS: V3.0), Git: 1b4fbdd2ef
{boost}: V1.60.0
{build-type}: release
{date}: Jan  9 2016 19:24:29
{platform}: linux
{compiler}: Intel C++ C++0x mode version 1600
{stdlib}: GNU libstdc++ version 20141220
13 frames:
0x7f2322b58c19  : hpx::termination_handler(int) + 0x159 in 
/usr/local/lib/libhpx.so.0
0x7f231f6c98d0  : ??? + 0x7f231f6c98d0 in /lib/x86_64-linux-gnu/libpthread.so.0
0x7f23237c862d  : 
hpx::util::batch_environments::slurm_environment::retrieve_number_of_localities(bool)
 + 0x9fd in /usr/local/lib/libhpx.so.0
0x7f23237c636c  : 
hpx::util::batch_environments::slurm_environment::slurm_environment(std::vector<std::string,
 std::allocator<std::string> >&, bool) + 0x5cc in /usr/local/lib/libhpx.so.0
0x7f2323831283  : 
hpx::util::batch_environment::batch_environment(std::vector<std::string, 
std::allocator<std::string> >&, hpx::util::runtime_configuration const&, bool, 
bool) + 0xf3 in /usr/local/lib/libhpx.so.0
0x7f23238b7486  : ??? + 0x7f23238b7486 in /usr/local/lib/libhpx.so.0
0x7f23238b1e8d  : ??? + 0x7f23238b1e8d in /usr/local/lib/libhpx.so.0
0x7f2322b8323f  : hpx::detail::run_or_start(hpx::util::function<int 
(boost::program_options::variables_map&), false> const&, 
boost::program_options::options_description const&, int, char**, 
std::vector<std::string, std::allocator<std::string> >&&, 
hpx::util::function<void (), false> const&, hpx::util::function<void (), false> 
const&, hpx::runtime_mode, bool) + 0x25f in /usr/local/lib/libhpx.so.0
0x52d985        : ??? + 0x52d985 in /usr/local/bin/1d_stencil_7
0x417cf8        : ??? + 0x417cf8 in /usr/local/bin/1d_stencil_7
0x7f231d2ebb45  : __libc_start_main + 0xf5 in /lib/x86_64-linux-gnu/libc.so.6
0x417679        : ??? + 0x417679 in /usr/local/bin/1d_stencil_7
{what}: Segmentation fault
{config}:
  HPX_HAVE_NATIVE_TLS=ON
  HPX_HAVE_STACKTRACES=ON
  HPX_HAVE_COMPRESSION_BZIP2=OFF
  HPX_HAVE_COMPRESSION_SNAPPY=ON
  HPX_HAVE_COMPRESSION_ZLIB=OFF
  HPX_HAVE_PARCEL_COALESCING=ON
  HPX_HAVE_PARCELPORT_TCP=ON
  HPX_HAVE_PARCELPORT_MPI=ON (MPICH V3.1.2, MPI V3.0)
  HPX_HAVE_PARCELPORT_IPC=OFF
  HPX_HAVE_PARCELPORT_IBVERBS=OFF
  HPX_HAVE_VERIFY_LOCKS=OFF
  HPX_HAVE_HWLOC=ON
  HPX_HAVE_ITTNOTIFY=OFF
  HPX_HAVE_RUN_MAIN_EVERYWHERE=OFF
  HPX_PARCEL_MAX_CONNECTIONS=512
  HPX_PARCEL_MAX_CONNECTIONS_PER_LOCALITY=4
  HPX_INITIAL_AGAS_LOCAL_CACHE_SIZE=256
  HPX_AGAS_LOCAL_CACHE_SIZE_PER_THREAD=32
  HPX_HAVE_MALLOC=tcmalloc
  HPX_PREFIX (configured)=/usr/local
  HPX_PREFIX=/usr/local
{version}: V0.9.12-trunk (AGAS: V3.0), Git: 1b4fbdd2ef
{boost}: V1.60.0
{build-type}: release
{date}: Jan  9 2016 19:24:29
{platform}: linux
{compiler}: Intel C++ C++0x mode version 1600
{stdlib}: GNU libstdc++ version 20141220
srun: error: hpc01: tasks 0-3: Aborted

* * *


Further to that point, could you please help me to understand how to attach and 
use a debugger with the code? 

shmuel@ssh01:~
> srun -n6 1d_stencil_7 --hpx:attach-debugger
PID: 19307 on ssh01.thelevines.ca ready for attaching debugger. Once attached 
set i = 1 and continue
PID: 19305 on ssh01.thelevines.ca ready for attaching debugger. Once attached 
set i = 1 and continue
PID: 19310 on ssh01.thelevines.ca ready for attaching debugger. Once attached 
set i = 1 and continue
PID: 19306 on ssh01.thelevines.ca ready for attaching debugger. Once attached 
set i = 1 and continue
PID: 19309 on ssh01.thelevines.ca ready for attaching debugger. Once attached 
set i = 1 and continue
PID: 19308 on ssh01.thelevines.ca ready for attaching debugger. Once attached 
set i = 1 and continue

I can attach gdb to any of the above processes, but there is no variable 'i' 
and I can't figure out how to get the code to continue.  Sorry if this is a 
stupid/obvious question, but I'm a little stuck.
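
For context, my guess is that the hook behind --hpx:attach-debugger is a 
spin loop on a local variable, roughly like the sketch below.  This is only an 
assumption on my part and not the actual HPX source; the names wait_for_flag 
and attach_debugger_sketch are made up for illustration, and the polling limit 
is there only so the helper can be exercised without a debugger.

```cpp
#include <chrono>
#include <iostream>
#include <thread>
#include <unistd.h>   // getpid (POSIX)

// Hypothetical helper (not HPX code): poll `flag` until something outside the
// normal control flow, e.g. a debugger, sets it to a non-zero value.
// A negative max_polls means "wait forever".
bool wait_for_flag(volatile int& flag, int max_polls,
    std::chrono::milliseconds interval = std::chrono::milliseconds(1000))
{
    int polls = 0;
    while (flag == 0) {
        if (max_polls >= 0 && polls++ >= max_polls)
            return false;               // gave up, flag still zero
        std::this_thread::sleep_for(interval);
    }
    return true;                        // flag was changed
}

// Hypothetical stand-in for what the runtime might do at startup.
void attach_debugger_sketch()
{
    // volatile so the compiler re-reads the value on every iteration
    volatile int i = 0;
    std::cerr << "PID: " << getpid()
              << " ready for attaching debugger. Once attached set i = 1"
              << " and continue" << std::endl;
    wait_for_flag(i, -1);               // blocks until a debugger sets i
}
```

If that guess is right, 'i' would be a local of the frame that printed the 
message, so it would only be visible in gdb after selecting that frame: 
something like "up" (or "frame N") until "info locals" shows i, then 
"set var i = 1" followed by "continue".  In a release build the variable 
may also have been optimized away.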

Thanks again for all your assistance,
Michael


_______________________________________________
hpx-users mailing list
[email protected]
https://mail.cct.lsu.edu/mailman/listinfo/hpx-users