On 02/12/13 07:32 PM, Jeff Tan wrote:
> Hi all,
>
> We're seeing this seg fault occur in Vertex.cpp, it seems, using Ray 2.3.0 on
> x86 launched via Slurm:
Hi,

I checked the stack you provided, and the fault occurs during the loading of some checkpoints (RAY_MPI_TAG_START_SEEDING).

m_readsStartingHere is a linked list, and a value of 0 means that it's empty, so the 0x0 that gdb prints for it is a valid state on its own.

You can build Ray with ASSERT=y to help in debugging.

Aside from that, I don't see the issue with the available information.

>
> [jtan@barcoo-m barcoo]$ grep 27960 slurm-349755.out
> Rank 233: Rank= 233 Size= 512 ProcessIdentifier= 27960
> [barcoo050:27960] *** Process received signal ***
> [barcoo050:27960] Signal: Segmentation fault (11)
> [barcoo050:27960] Signal code: Address not mapped (1)
> [barcoo050:27960] Failing at address: 0x18
> [barcoo050:27960] [ 0] /lib64/libpthread.so.0(+0xf500) [0x2b076d782500]
> [barcoo050:27960] [ 1]
> /usr/local/Ray/2.3.0-gcc/Ray(_ZN6Vertex7addReadEP4KmerP14ReadAnnotation+0xe)
> [0x5d18ee]
> [barcoo050:27960] [ 2]
> /usr/local/Ray/2.3.0-gcc/Ray(_ZN33Adapter_RAY_MPI_TAG_START_SEEDING4callEP7Message+0x481)
> [0x4cdf11]
> [barcoo050:27960] [ 3]
> /usr/local/Ray/2.3.0-gcc/Ray(_ZN18MessageTagExecutor11callHandlerEiP7Message+0x22)
> [0x62cca2]
> [barcoo050:27960] [ 4]
> /usr/local/Ray/2.3.0-gcc/Ray(_ZN11ComputeCore15runWithProfilerEv+0x1105)
> [0x60fb75]
> [barcoo050:27960] [ 5]
> /usr/local/Ray/2.3.0-gcc/Ray(_ZN11ComputeCore3runEv+0x28e) [0x60d33e]
> [barcoo050:27960] [ 6]
> /usr/local/Ray/2.3.0-gcc/Ray(_ZN7Machine5startEv+0x2024) [0x475544]
> [barcoo050:27960] [ 7] /usr/local/Ray/2.3.0-gcc/Ray(_ZN7Machine3runEv+0x6)
> [0x473516]
> [barcoo050:27960] [ 8] /usr/local/Ray/2.3.0-gcc/Ray(main+0x2d7) [0x470c47]
> [barcoo050:27960] [ 9] /lib64/libc.so.6(__libc_start_main+0xfd)
> [0x2b076d9aecdd]
> [barcoo050:27960] [10] /usr/local/Ray/2.3.0-gcc/Ray() [0x4708a9]
>
> [jtan@barcoo-m barcoo]$ gdb -d ~/src/Ray-2.3.0 -c core.27960 `which Ray`
> GNU gdb (GDB) Red Hat Enterprise Linux (7.2-60.el6_4.1)
> Copyright (C) 2010 Free Software Foundation, Inc.
> License GPLv3+: GNU GPL version 3 or later
> <http://gnu.org/licenses/gpl.html>
> This is free software: you are free to change and redistribute it.
> There is NO WARRANTY, to the extent permitted by law. Type "show copying"
> and "show warranty" for details.
> This GDB was configured as "x86_64-redhat-linux-gnu".
> For bug reporting instructions, please see:
> <http://www.gnu.org/software/gdb/bugs/>...
> Reading symbols from /usr/local/Ray/2.3.0-gcc/Ray...done.
> ....
> Core was generated by `/usr/local/Ray/2.3.0-gcc/Ray BarcooRay31.conf'.
> Program terminated with signal 11, Segmentation fault.
> #0 0x00000000005d18ee in Vertex::addRead (this=0x2b079cfae040,
> vertex=0x7fffadade1b0, e=0x2b079cfae040)
> at code/VerticesExtractor/Vertex.cpp:176
> 176 e->setNext(m_readsStartingHere);
> ...
>
> with the source code in ~/src/Ray-2.3.0.
>
> and I find:
>
> (gdb) where full
> #0 0x00000000005d18ee in Vertex::addRead (this=0x2b079cfae040,
> vertex=0x7fffadade1b0, e=0x2b079cfae040)
> at code/VerticesExtractor/Vertex.cpp:176
> ...
> (gdb) print m_readsStartingHere
> $1 = (ReadAnnotation *) 0x0
>
> With the user's permission, I have attached the configuration file, but not
> the 800MB core dump. :-)
>
> Does anyone have any experience with this sort of problem? Maybe suggestions
> on how to debug this further?
>
> Regards
>
> Jeff Tan
> High Performance Computing Specialist
> IBM Research Collaboratory for Life Sciences, Melbourne
>
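To illustrate the remark that m_readsStartingHere is a linked list whose value 0 means "empty", here is a minimal C++ sketch of head insertion into such a list. `Annotation` and `Node` are hypothetical stand-ins written for this example; they are not Ray's actual ReadAnnotation and Vertex classes, and the assertion is only an assumption about the general kind of check an assertion-enabled build might perform.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a read annotation; only the list link is modeled.
class Annotation {
    Annotation* m_next = nullptr;
public:
    void setNext(Annotation* next) { m_next = next; }
    Annotation* getNext() const { return m_next; }
};

// Hypothetical stand-in for a vertex; a null head means no reads start here.
class Node {
    Annotation* m_head = nullptr;
public:
    // Push e onto the front of the list.
    void addRead(Annotation* e) {
        assert(e != nullptr);  // illustrative check on the element being inserted
        e->setNext(m_head);    // valid when m_head is null: e becomes the only element
        m_head = e;
    }

    // Walk the list and count its elements.
    std::size_t count() const {
        std::size_t n = 0;
        for (const Annotation* p = m_head; p != nullptr; p = p->getNext()) {
            ++n;
        }
        return n;
    }

    Annotation* head() const { return m_head; }
};
```

In this sketch, inserting into an empty list passes a null head to `setNext`, which is harmless. It says nothing about where the fault in the trace above actually comes from.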
_______________________________________________
Denovoassembler-users mailing list
Denovoassembler-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/denovoassembler-users