Good Afternoon Tony:

A 20 GiB database is too large for blat to handle reliably, and it
is inefficient to search. Split the database into about 10 pieces of
roughly 2 GiB each and run your queries against each piece in turn.
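
One possible way to do the split is sketched below, assuming the
database is a single FASTA file (for a .fof list of files, grouping the
file names into smaller lists would serve the same purpose). The
function name split_fasta, the output naming scheme, and the use of a
letter count as a stand-in for file size are illustrative assumptions,
not part of blat.

```python
def split_fasta(in_path, out_prefix, max_letters=2 * 1024**3):
    """Split a FASTA file into pieces named out_prefix.000.fa, .001.fa, ...

    A new piece is started at the first record boundary after the current
    piece has reached max_letters, so no record is cut in half and a piece
    can overshoot the target by at most one record.
    Returns the list of piece paths in the order they were written.
    """
    paths = []
    out = None
    letters = 0  # sequence letters written to the current piece
    with open(in_path) as src:
        for line in src:
            is_header = line.startswith(">")
            if is_header and (out is None or letters >= max_letters):
                if out is not None:
                    out.close()
                path = f"{out_prefix}.{len(paths):03d}.fa"
                paths.append(path)
                out = open(path, "w")
                letters = 0
            if out is None:
                continue  # ignore anything before the first header
            out.write(line)
            if not is_header:
                letters += len(line.strip())
    if out is not None:
        out.close()
    return paths
```

Each piece can then be searched with the same argument order used in
the sessions quoted below, for example
"blat piece.000.fa testst.fa out.000.psl", and the per-piece outputs
combined afterwards.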

--Hiram

On 7/14/12 3:54 PM, Tony Travis wrote:
> Hi, Galt and John.
>
> I've just read this thread from a year ago on the Genome list, when
> searching for messages about BLAT segfaults: I'm having similar problems
> to John, using a 20GiB BLAT DB and I get a segfault at the same place as
> John, on different hardware and using different data:
>
>     >>  genoFind.c:1359
>     >>  1359        slAddHead(pb, hit);
>
> What is worrying is that the "blat" process terminated normally the
> third time I ran it under "gdb". I've run memory diagnostics on the
> server (2156GiB RAM) and no errors were reported. Has this problem been
> resolved since the message was posted a year ago?
>
> Bye,
>
>     Tony.
>
>> Hi, John!
>>
>> I don't know of any size-limit such as you mention,
>> but our databases are usually around 3 to 5GB and not 36GB.
>>
>> We will pass along your message to Jim Kent, the author of BLAT,
>> and probably be getting back to you off-list.
>>
>> -Galt
>>
>> On 06/29/11 08:22, John Legato wrote:
>>>
>>> We're experiencing a segmentation fault running blat on an x86_64 box
>>> with 128GB of RAM. We are running against NCBI's nt database (in FASTA
>>> form). We are using the following query sequences for testing (testst.fa
>>> in the example below):
>>>
>>>
>>>   >Test1
>>> ATCTCTACATCCGCCCACTCCCAAATCCGTTTTGTGCAACCAACCTCTATT
>>>   >Test2
>>> CCCCCACAGCAGCAGGAATAATCAAGGGGATGACAGGAAGAGNNNNNNNNN
>>>   >Test3
>>> AAGTAACCTAGACCTTAAAATTGTACATAGCCTCTCCGAGGANNNNNNNNN
>>>   >Test4
>>> TTCAAACTTAAGGAATGTAGTGTTGCGATGGGTACTCAACTGATCCCANTT
>>>   >Test5
>>> AGATGTGGTTCCACCCATAACTCAAGGGCAGATAGGAAACACCNNNNNNNN
>>>   >Test6
>>> AGGCAACCCCCGGCAGGATCATTCCAGGCACCGTGGGTTTCANNNNNNNNN
>>>   >Test7
>>> TCTTAGTGTTGAGTCAGACGCAAAGTTGAGACAGGGGAAAAGGCNNNNNNN
>>>   >Test8
>>> CTTCTACATGTTGGCTGCCAGTTAAACCAGCACCATTTGTTGCAAATGCTA
>>>   >Test9
>>> CCTCACTAACACAAATGTTGGAGGAAGTCTTGGGAGGCATCCTATTGATAC
>>>   >Test10
>>> TTTGTGTTCTGGGGCAGCTGGCTTTAGAAAGAGAACTCCAGGTCAANNNNG
>>>
>>>
>>> We've recompiled blat 34 from source with -g, gdb reports the following
>>> when we do a back trace:
>>>
>>>
>>> (gdb) set args  nt.fa testst.fa testout1.psl -out=blast
>>> (gdb) r
>>> Starting program: /v/server1a/jlegato/bin/x86_64/blat nt.fa testst.fa
>>> testout1.psl -out=blast
>>> Loaded 36318681436 letters in 14096376 sequences
>>>
>>> Program received signal SIGSEGV, Segmentation fault.
>>> gfFindClumpsWithQmask (gf=0x82f028d50, seq=<value optimized out>,
>>> qMaskBits=<value optimized out>, qMaskOffset=<value optimized out>,
>>> lm=<value optimized out>, retHitCount=<value optimized out>) at
>>> genoFind.c:1359
>>> 1359        slAddHead(pb, hit);
>      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> *** My debugging session shows a segfault at the same point as John's:
>
>> atravis@bifx-cli:~/work/BLAT$ gdb /homes/atravis/bin/blat
>> GNU gdb (Ubuntu/Linaro 7.4-2012.04-0ubuntu2) 7.4-2012.04
>> Copyright (C) 2012 Free Software Foundation, Inc.
>> License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
>> This is free software: you are free to change and redistribute it.
>> There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
>> and "show warranty" for details.
>> This GDB was configured as "x86_64-linux-gnu".
>> For bug reporting instructions, please see:
>> <http://bugs.launchpad.net/gdb-linaro/>...
>> Reading symbols from /homes/atravis/bin/blat...done.
>> (gdb) run -out=blast8 /data1/human/GRCh37/GRCh37.fof 
>> NNNTCTCTAGC_FIBfl_comp.fasta NNNTCTCTAGC_FIBfl_comp.blat
>> Starting program: /homes/atravis/bin/blat -out=blast8 
>> /data1/human/GRCh37/GRCh37.fof NNNTCTCTAGC_FIBfl_comp.fasta 
>> NNNTCTCTAGC_FIBfl_comp.blat
>> Loaded 21439866084 letters in 223 sequences
>>
>> Program received signal SIGSEGV, Segmentation fault.
>> 0x000000000040a1de in clumpHits (gf=0xa45dd0, hitList=0x3ad1c88, minMatch=2) 
>> at genoFind.c:1359
>> 1359     slAddHead(pb, hit);
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>
>> (gdb) where
>> #0  0x000000000040a1de in clumpHits (gf=0xa45dd0, hitList=0x3ad1c88, 
>> minMatch=2) at genoFind.c:1359
>> #1  0x000000000040b365 in gfFindClumpsWithQmask (gf=0xa45dd0, 
>> seq=0x7fffffffddb0, qMaskBits=0x0, qMaskOffset=0, lm=0x4667a0,
>>      retHitCount=0x7fffffffde20) at genoFind.c:1866
>> #2  0x00000000004107ed in gfLongDnaInMem (query=0x7fffffffdf10, gf=0xa45dd0, 
>> isRc=0, minScore=30, qMaskBits=0x0, out=0xa45d70,
>>      fastMap=0, band=0) at gfBlatLib.c:1530
>> #3  0x00000000004028ba in searchOneStrand (seq=0x7fffffffdf10, gf=0xa45dd0, 
>> psl=0x466510, isRc=0, maskHash=0x0, qMaskBits=0x0)
>>      at blat.c:200
>> #4  0x0000000000402a19 in searchOne (seq=0x7fffffffdf10, gf=0xa45dd0, 
>> f=0x466510, isProt=0, maskHash=0x0, qMaskBits=0x0)
>>      at blat.c:241
>> #5  0x0000000000402d04 in searchOneMaskTrim (seq=0x461880, isProt=0, 
>> gf=0xa45dd0, outFile=0x466510, maskHash=0x0,
>>      retTotalSize=0x7fffffffdfa0, retCount=0x7fffffffdfd8) at blat.c:310
>> #6  0x0000000000402ffe in searchOneIndex (fileCount=1, files=0x466750, 
>> gf=0xa45dd0,
>>      outName=0x7fffffffe4bc "NNNTCTCTAGC_FIBfl_comp.blat", isProt=0, 
>> maskHash=0x0, outFile=0x466510, showStatus=1) at blat.c:380
>> #7  0x0000000000403a98 in blat (dbFile=0x7fffffffe480 
>> "/data1/human/GRCh37/GRCh37.fof",
>>      queryFile=0x7fffffffe49f "NNNTCTCTAGC_FIBfl_comp.fasta", 
>> outName=0x7fffffffe4bc "NNNTCTCTAGC_FIBfl_comp.blat") at blat.c:606
>> #8  0x00000000004041c4 in main (argc=4, argv=0x7fffffffe1b8) at blat.c:783
>> (gdb) print pb
>> $1 = (struct gfHit **) 0x3b553f0
>> (gdb) print hit
>> $2 = (struct gfHit *) 0x3a71a80
>> (gdb) run -out=blast8 /data1/human/GRCh37/GRCh37.fof 
>> NNNTCTCTAGC_FIBfl_comp.fasta NNNTCTCTAGC_FIBfl_comp.blat
>> The program being debugged has been started already.
>> Start it from the beginning? (y or n) y
>>
>> Starting program: /homes/atravis/bin/blat -out=blast8 
>> /data1/human/GRCh37/GRCh37.fof NNNTCTCTAGC_FIBfl_comp.fasta 
>> NNNTCTCTAGC_FIBfl_comp.blat
>> Loaded 21439866084 letters in 223 sequences
>>
>> Program received signal SIGSEGV, Segmentation fault.
>> 0x000000000040a1de in clumpHits (gf=0xa45dd0, hitList=0x3ad1c88, minMatch=2) 
>> at genoFind.c:1359
>> 1359     slAddHead(pb, hit);
>> (gdb) q
>> A debugging session is active.
> _______________________________________________
> Genome maillist  -  [email protected]
> https://lists.soe.ucsc.edu/mailman/listinfo/genome