Good Afternoon Tony:

Your 20 GiB database is inefficient and too large to function. Break up your database into about 10 pieces of 2 GiB each, and run your queries against each of those 10 pieces.
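One way to do the split is sketched below in Python. This is only an illustration, not a UCSC tool: the function name `split_fasta`, the `pieceNNN.fa` file names and the default 2 GiB limit are assumptions made for the example. It also assumes the database is a single plain FASTA file, and it only starts a new piece at a record boundary, so a piece can run past the limit when a record is large.

```python
import os


def split_fasta(path, out_dir, max_bytes=2 * 1024**3):
    """Split a FASTA file into pieces of roughly max_bytes each.

    Records are never cut in half, so a piece can exceed max_bytes
    by up to the size of its last record.
    Returns the list of piece file names (hypothetical naming scheme).
    """
    os.makedirs(out_dir, exist_ok=True)
    pieces = []
    out = None
    written = 0
    with open(path, "rb") as src:
        for line in src:
            starts_record = line.startswith(b">")
            # Open a new piece for the first line, or at a record
            # boundary once the current piece has reached the limit.
            if out is None or (starts_record and written >= max_bytes):
                if out is not None:
                    out.close()
                name = os.path.join(out_dir, "piece%03d.fa" % len(pieces))
                pieces.append(name)
                out = open(name, "wb")
                written = 0
            out.write(line)
            written += len(line)
    if out is not None:
        out.close()
    return pieces
```

Each piece can then be used as the database argument in the same kind of blat command line shown in the quoted sessions below (database, query, output), with one output file per piece. With a plain tabular output such as -out=blast8, the per-piece results should be straightforward to concatenate afterwards.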
--Hiram

On 7/14/12 3:54 PM, Tony Travis wrote:
> Hi, Galt and John.
>
> I've just read this thread from a year ago on the Genome list, when
> searching for messages about BLAT segfaults: I'm having similar problems
> to John, using a 20GiB BLAT DB and I get a segfault at the same place as
> John, on different hardware and using different data:
>
> >> genoFind.c:1359
> >> 1359 slAddHead(pb, hit);
>
> What is worrying, is that the "blat" process terminated normally the
> third time I ran it under "gdb". I've run memory diagnostics on the
> server (2156GiB RAM) and no errors were reported. Has this problem been
> resolved since the message was posted a year ago?
>
> Bye,
>
> Tony.
>
>> Hi, John!
>>
>> I don't know of any size-limit such as you mention,
>> but our databases are usually around 3 to 5GB and not 36GB.
>>
>> We will pass along your message to Jim Kent, the author of BLAT,
>> and probably be getting back to you off-list.
>>
>> -Galt
>>
>> On 06/29/11 08:22, John Legato wrote:
>>>
>>> We're experiencing a segmentation fault running blat on an x86_64 box
>>> with 128GB of RAM. We are running against NCBI's nt database (in FASTA)
>>> form. We are using the following query sequences for testing (testst.fa
>>> in the example below):
>>>
>>>
>>> >Test1
>>> ATCTCTACATCCGCCCACTCCCAAATCCGTTTTGTGCAACCAACCTCTATT
>>> >Test2
>>> CCCCCACAGCAGCAGGAATAATCAAGGGGATGACAGGAAGAGNNNNNNNNN
>>> >Test3
>>> AAGTAACCTAGACCTTAAAATTGTACATAGCCTCTCCGAGGANNNNNNNNN
>>> >Test4
>>> TTCAAACTTAAGGAATGTAGTGTTGCGATGGGTACTCAACTGATCCCANTT
>>> >Test5
>>> AGATGTGGTTCCACCCATAACTCAAGGGCAGATAGGAAACACCNNNNNNNN
>>> >Test6
>>> AGGCAACCCCCGGCAGGATCATTCCAGGCACCGTGGGTTTCANNNNNNNNN
>>> >Test7
>>> TCTTAGTGTTGAGTCAGACGCAAAGTTGAGACAGGGGAAAAGGCNNNNNNN
>>> >Test8
>>> CTTCTACATGTTGGCTGCCAGTTAAACCAGCACCATTTGTTGCAAATGCTA
>>> >Test9
>>> CCTCACTAACACAAATGTTGGAGGAAGTCTTGGGAGGCATCCTATTGATAC
>>> >Test10
>>> TTTGTGTTCTGGGGCAGCTGGCTTTAGAAAGAGAACTCCAGGTCAANNNNG
>>>
>>>
>>> We've recompiled blat 34 from source with -g, gdb reports the following
>>> when we do a back trace:
>>>
>>>
>>> (gdb) set args nt.fa testst.fa testout1.psl -out=blast
>>> (gdb) r
>>> Starting program: /v/server1a/jlegato/bin/x86_64/blat nt.fa testst.fa
>>> testout1.psl -out=blast
>>> Loaded 36318681436 letters in 14096376 sequences
>>>
>>> Program received signal SIGSEGV, Segmentation fault.
>>> gfFindClumpsWithQmask (gf=0x82f028d50, seq=<value optimized out>,
>>> qMaskBits=<value optimized out>, qMaskOffset=<value optimized out>,
>>> lm=<value optimized out>, retHitCount=<value optimized out>) at
>>> genoFind.c:1359
>>> 1359 slAddHead(pb, hit);
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> *** My debugging session shows a segfault at the same point as John's:
>
>> atravis@bifx-cli:~/work/BLAT$ gdb /homes/atravis/bin/blat
>> GNU gdb (Ubuntu/Linaro 7.4-2012.04-0ubuntu2) 7.4-2012.04
>> Copyright (C) 2012 Free Software Foundation, Inc.
>> License GPLv3+: GNU GPL version 3 or later<http://gnu.org/licenses/gpl.html>
>> This is free software: you are free to change and redistribute it.
>> There is NO WARRANTY, to the extent permitted by law. Type "show copying"
>> and "show warranty" for details.
>> This GDB was configured as "x86_64-linux-gnu".
>> For bug reporting instructions, please see:
>> <http://bugs.launchpad.net/gdb-linaro/>...
>> Reading symbols from /homes/atravis/bin/blat...done.
>> (gdb) run -out=blast8 /data1/human/GRCh37/GRCh37.fof
>> NNNTCTCTAGC_FIBfl_comp.fasta NNNTCTCTAGC_FIBfl_comp.blat
>> Starting program: /homes/atravis/bin/blat -out=blast8
>> /data1/human/GRCh37/GRCh37.fof NNNTCTCTAGC_FIBfl_comp.fasta
>> NNNTCTCTAGC_FIBfl_comp.blat
>> Loaded 21439866084 letters in 223 sequences
>>
>> Program received signal SIGSEGV, Segmentation fault.
>> 0x000000000040a1de in clumpHits (gf=0xa45dd0, hitList=0x3ad1c88, minMatch=2)
>> at genoFind.c:1359
>> 1359 slAddHead(pb, hit);
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>
>> (gdb) where
>> #0 0x000000000040a1de in clumpHits (gf=0xa45dd0, hitList=0x3ad1c88,
>> minMatch=2) at genoFind.c:1359
>> #1 0x000000000040b365 in gfFindClumpsWithQmask (gf=0xa45dd0,
>> seq=0x7fffffffddb0, qMaskBits=0x0, qMaskOffset=0, lm=0x4667a0,
>> retHitCount=0x7fffffffde20) at genoFind.c:1866
>> #2 0x00000000004107ed in gfLongDnaInMem (query=0x7fffffffdf10, gf=0xa45dd0,
>> isRc=0, minScore=30, qMaskBits=0x0, out=0xa45d70,
>> fastMap=0, band=0) at gfBlatLib.c:1530
>> #3 0x00000000004028ba in searchOneStrand (seq=0x7fffffffdf10, gf=0xa45dd0,
>> psl=0x466510, isRc=0, maskHash=0x0, qMaskBits=0x0)
>> at blat.c:200
>> #4 0x0000000000402a19 in searchOne (seq=0x7fffffffdf10, gf=0xa45dd0,
>> f=0x466510, isProt=0, maskHash=0x0, qMaskBits=0x0)
>> at blat.c:241
>> #5 0x0000000000402d04 in searchOneMaskTrim (seq=0x461880, isProt=0,
>> gf=0xa45dd0, outFile=0x466510, maskHash=0x0,
>> retTotalSize=0x7fffffffdfa0, retCount=0x7fffffffdfd8) at blat.c:310
>> #6 0x0000000000402ffe in searchOneIndex (fileCount=1, files=0x466750,
>> gf=0xa45dd0,
>> outName=0x7fffffffe4bc "NNNTCTCTAGC_FIBfl_comp.blat", isProt=0,
>> maskHash=0x0, outFile=0x466510, showStatus=1) at blat.c:380
>> #7 0x0000000000403a98 in blat (dbFile=0x7fffffffe480
>> "/data1/human/GRCh37/GRCh37.fof",
>> queryFile=0x7fffffffe49f "NNNTCTCTAGC_FIBfl_comp.fasta",
>> outName=0x7fffffffe4bc "NNNTCTCTAGC_FIBfl_comp.blat") at blat.c:606
>> #8 0x00000000004041c4 in main (argc=4, argv=0x7fffffffe1b8) at blat.c:783
>> (gdb) print pb
>> $1 = (struct gfHit **) 0x3b553f0
>> (gdb) print hit
>> $2 = (struct gfHit *) 0x3a71a80
>> (gdb) run -out=blast8 /data1/human/GRCh37/GRCh37.fof
>> NNNTCTCTAGC_FIBfl_comp.fasta NNNTCTCTAGC_FIBfl_comp.blat
>> The program being debugged has been started already.
>> Start it from the beginning? (y or n) y
>>
>> Starting program: /homes/atravis/bin/blat -out=blast8
>> /data1/human/GRCh37/GRCh37.fof NNNTCTCTAGC_FIBfl_comp.fasta
>> NNNTCTCTAGC_FIBfl_comp.blat
>> Loaded 21439866084 letters in 223 sequences
>>
>> Program received signal SIGSEGV, Segmentation fault.
>> 0x000000000040a1de in clumpHits (gf=0xa45dd0, hitList=0x3ad1c88, minMatch=2)
>> at genoFind.c:1359
>> 1359 slAddHead(pb, hit);
>> (gdb) q
>> A debugging session is active.
> _______________________________________________
> Genome maillist - [email protected]
> https://lists.soe.ucsc.edu/mailman/listinfo/genome
