Thanks for that! We'll give the extra debugging a try and see how it
goes.

Regards
Jeff

On Thu, 2014-01-09 at 15:57 -0500, Sébastien Boisvert wrote:
> On 02/12/13 07:32 PM, Jeff Tan wrote:
> > Hi all,
> >
> > We're seeing this seg fault occur in Vertex.cpp, it seems, using Ray 2.3.0 
> > on x86 launched via Slurm:
> 
> Hi,
> 
> I checked the stack you provided and the fault occurs during the loading of
> some checkpoints (RAY_MPI_TAG_START_SEEDING).
> 
> m_readsStartingHere is a linked list and a value of 0 means that it's empty.
> 
> You can build Ray with ASSERT=y to help in debugging. Aside from that, I don't
> see the issue with the available information.
> 
> 
> >
> > [jtan@barcoo-m barcoo]$ grep 27960 slurm-349755.out
> > Rank 233: Rank= 233 Size= 512 ProcessIdentifier= 27960
> > [barcoo050:27960] *** Process received signal ***
> > [barcoo050:27960] Signal: Segmentation fault (11)
> > [barcoo050:27960] Signal code: Address not mapped (1)
> > [barcoo050:27960] Failing at address: 0x18
> > [barcoo050:27960] [ 0] /lib64/libpthread.so.0(+0xf500) [0x2b076d782500]
> > [barcoo050:27960] [ 1] 
> > /usr/local/Ray/2.3.0-gcc/Ray(_ZN6Vertex7addReadEP4KmerP14ReadAnnotation+0xe)
> >  [0x5d18ee]
> > [barcoo050:27960] [ 2] 
> > /usr/local/Ray/2.3.0-gcc/Ray(_ZN33Adapter_RAY_MPI_TAG_START_SEEDING4callEP7Message+0x481)
> >  [0x4cdf11]
> > [barcoo050:27960] [ 3] 
> > /usr/local/Ray/2.3.0-gcc/Ray(_ZN18MessageTagExecutor11callHandlerEiP7Message+0x22)
> >  [0x62cca2]
> > [barcoo050:27960] [ 4] 
> > /usr/local/Ray/2.3.0-gcc/Ray(_ZN11ComputeCore15runWithProfilerEv+0x1105) 
> > [0x60fb75]
> > [barcoo050:27960] [ 5] 
> > /usr/local/Ray/2.3.0-gcc/Ray(_ZN11ComputeCore3runEv+0x28e) [0x60d33e]
> > [barcoo050:27960] [ 6] 
> > /usr/local/Ray/2.3.0-gcc/Ray(_ZN7Machine5startEv+0x2024) [0x475544]
> > [barcoo050:27960] [ 7] /usr/local/Ray/2.3.0-gcc/Ray(_ZN7Machine3runEv+0x6) 
> > [0x473516]
> > [barcoo050:27960] [ 8] /usr/local/Ray/2.3.0-gcc/Ray(main+0x2d7) [0x470c47]
> > [barcoo050:27960] [ 9] /lib64/libc.so.6(__libc_start_main+0xfd) 
> > [0x2b076d9aecdd]
> > [barcoo050:27960] [10] /usr/local/Ray/2.3.0-gcc/Ray() [0x4708a9]
> >
> > [jtan@barcoo-m barcoo]$ gdb -d ~/src/Ray-2.3.0 -c core.27960 `which Ray`
> > GNU gdb (GDB) Red Hat Enterprise Linux (7.2-60.el6_4.1)
> > Copyright (C) 2010 Free Software Foundation, Inc.
> > License GPLv3+: GNU GPL version 3 or later
> > <http://gnu.org/licenses/gpl.html>
> > This is free software: you are free to change and redistribute it.
> > There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
> > and "show warranty" for details.
> > This GDB was configured as "x86_64-redhat-linux-gnu".
> > For bug reporting instructions, please see:
> > <http://www.gnu.org/software/gdb/bugs/>...
> > Reading symbols from /usr/local/Ray/2.3.0-gcc/Ray...done.
> > ....
> > Core was generated by `/usr/local/Ray/2.3.0-gcc/Ray BarcooRay31.conf'.
> > Program terminated with signal 11, Segmentation fault.
> > #0  0x00000000005d18ee in Vertex::addRead (this=0x2b079cfae040,
> > vertex=0x7fffadade1b0, e=0x2b079cfae040)
> >      at code/VerticesExtractor/Vertex.cpp:176
> > 176             e->setNext(m_readsStartingHere);
> > ...
> >
> > with the source code in ~/src/Ray-2.3.0.
> >
> > and I find:
> >
> > (gdb) where full
> > #0  0x00000000005d18ee in Vertex::addRead (this=0x2b079cfae040,
> > vertex=0x7fffadade1b0, e=0x2b079cfae040)
> >      at code/VerticesExtractor/Vertex.cpp:176
> > ...
> > (gdb) print m_readsStartingHere
> > $1 = (ReadAnnotation *) 0x0
> >
> >
> >
> > With the user's permission, I have attached the configuration file, but not 
> > the 800MB core dump. :-)
> >
> > Does anyone have any experience with this sort of problem? Maybe 
> > suggestions on how to debug this further?
> >
> > Regards
> >
> > Jeff Tan
> > High Performance Computing Specialist
> > IBM Research Collaboratory for Life Sciences, Melbourne
> >
> >
> 
> 
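
For anyone reading this thread later, here is a rough sketch of the point
made in the quoted reply: a null list head only means the list is empty, so
prepending to it is safe. The types and names below (Annotation,
VertexSketch, countReads) are hypothetical stand-ins and not Ray's actual
classes. The assert shows the kind of check an assert-enabled build might
surface; whether Ray's ASSERT=y build checks this particular pointer is an
assumption. A fault at a small address such as 0x18 would be consistent with
a member access through a null object pointer, but the trace alone does not
say which pointer that was.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a list node; not Ray's actual ReadAnnotation.
struct Annotation {
    int readIndex;
    Annotation* next;
    void setNext(Annotation* n) { next = n; }
};

// Hypothetical stand-in for a vertex that owns the head of the list.
struct VertexSketch {
    Annotation* readsStartingHere = nullptr;  // nullptr means the list is empty

    // Prepend e to the list. A null head is fine; a null e is not.
    void addRead(Annotation* e) {
        assert(e != nullptr);           // fails loudly instead of faulting later
        e->setNext(readsStartingHere);  // safe even when the list is empty
        readsStartingHere = e;
    }

    // Walk the list and count its nodes.
    std::size_t countReads() const {
        std::size_t n = 0;
        for (const Annotation* p = readsStartingHere; p != nullptr; p = p->next) {
            ++n;
        }
        return n;
    }
};
```

With a guard like this, a bad pointer is reported at the call that introduced
it rather than as a segmentation fault inside setNext.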

-- 
Jeff Tan
High Performance Computing Specialist
IBM Research Collaboratory for Life Sciences, Melbourne
Phone:  +61 3 9035 4392


_______________________________________________
Denovoassembler-users mailing list
Denovoassembler-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/denovoassembler-users
