Hi, John!

I don't know of any size limit like the one you mention,
but our databases are usually around 3 to 5GB, not 36GB.
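
As a rough sanity check on the size question, the short Python sketch
below compares the letter count from the "Loaded ... letters" line in the
quoted gdb session against the ranges of 32-bit integers. Whether blat 34
actually stores database positions in fields of that width is an
assumption on my part, not something confirmed here; the helper names
(fits, pieces_needed) are made up for illustration.

```python
# Letter count reported by blat in the quoted gdb session
# ("Loaded 36318681436 letters in 14096376 sequences").
letters_loaded = 36_318_681_436

# Ranges of 32-bit integers. ASSUMPTION: that blat 34 keeps database
# offsets in fields of this width is not confirmed in this thread.
INT32_MAX = 2**31 - 1
UINT32_MAX = 2**32 - 1


def fits(value: int, limit: int) -> bool:
    """Return True if value does not exceed limit."""
    return value <= limit


def pieces_needed(total: int, limit: int) -> int:
    """Smallest number of pieces of at most `limit` letters (ceiling division)."""
    return -(-total // limit)


if __name__ == "__main__":
    print(fits(letters_loaded, INT32_MAX))            # False
    print(fits(letters_loaded, UINT32_MAX))           # False
    print(pieces_needed(letters_loaded, UINT32_MAX))  # 9
```

If a 32-bit limit does apply, the count would not fit in either a signed
or an unsigned field, and nt.fa would need to be split into at least 9
pieces to stay under it. That would be consistent with smaller databases
working, but it is not a diagnosis.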

We will pass your message along to Jim Kent, the author of BLAT,
and will probably get back to you off-list.

-Galt

On 06/29/11 08:22, John Legato wrote:
> 
> We're experiencing a segmentation fault running blat on an x86_64 box 
> with 128GB of RAM. We are running against NCBI's nt database (in FASTA 
> form). We are using the following query sequences for testing (testst.fa 
> in the example below):
> 
> 
>  >Test1
> ATCTCTACATCCGCCCACTCCCAAATCCGTTTTGTGCAACCAACCTCTATT
>  >Test2
> CCCCCACAGCAGCAGGAATAATCAAGGGGATGACAGGAAGAGNNNNNNNNN
>  >Test3
> AAGTAACCTAGACCTTAAAATTGTACATAGCCTCTCCGAGGANNNNNNNNN
>  >Test4
> TTCAAACTTAAGGAATGTAGTGTTGCGATGGGTACTCAACTGATCCCANTT
>  >Test5
> AGATGTGGTTCCACCCATAACTCAAGGGCAGATAGGAAACACCNNNNNNNN
>  >Test6
> AGGCAACCCCCGGCAGGATCATTCCAGGCACCGTGGGTTTCANNNNNNNNN
>  >Test7
> TCTTAGTGTTGAGTCAGACGCAAAGTTGAGACAGGGGAAAAGGCNNNNNNN
>  >Test8
> CTTCTACATGTTGGCTGCCAGTTAAACCAGCACCATTTGTTGCAAATGCTA
>  >Test9
> CCTCACTAACACAAATGTTGGAGGAAGTCTTGGGAGGCATCCTATTGATAC
>  >Test10
> TTTGTGTTCTGGGGCAGCTGGCTTTAGAAAGAGAACTCCAGGTCAANNNNG
> 
> 
> We've recompiled blat 34 from source with -g. gdb reports the following 
> when we do a backtrace:
> 
> 
> (gdb) set args  nt.fa testst.fa testout1.psl -out=blast
> (gdb) r
> Starting program: /v/server1a/jlegato/bin/x86_64/blat nt.fa testst.fa 
> testout1.psl -out=blast
> Loaded 36318681436 letters in 14096376 sequences
> 
> Program received signal SIGSEGV, Segmentation fault.
> gfFindClumpsWithQmask (gf=0x82f028d50, seq=<value optimized out>, 
> qMaskBits=<value optimized out>, qMaskOffset=<value optimized out>, 
> lm=<value optimized out>, retHitCount=<value optimized out>) at 
> genoFind.c:1359
> 1359        slAddHead(pb, hit);
> (gdb) bt
> #0  gfFindClumpsWithQmask (gf=0x82f028d50, seq=<value optimized out>, 
> qMaskBits=<value optimized out>, qMaskOffset=<value optimized out>, 
> lm=<value optimized out>, retHitCount=<value optimized out>) at 
> genoFind.c:1359
> #1  0x000000000040acc5 in gfLongDnaInMem (query=0x7fff20f542c0, 
> gf=0x82f028d50, isRc=0, minScore=30, qMaskBits=0x0, out=0xb30810, 
> fastMap=0, band=0) at gfBlatLib.c:1530
> #2  0x000000000040329e in searchOneStrand (seq=0x7fff20f542c0, 
> gf=0x82f028d50, psl=<value optimized out>, isRc=0, maskHash=<value 
> optimized out>, qMaskBits=0x0) at blat.c:200
> #3  0x000000000040332c in searchOne (seq=0x7fff20f542c0, gf=0x82f028d50, 
> f=0xb30510, isProt=0, maskHash=0x0, qMaskBits=0x0) at blat.c:241
> #4  0x000000000040343f in searchOneMaskTrim (seq=0x6488c0, isProt=0, 
> gf=0x82f028d50, outFile=0xb30510, maskHash=0x0, 
> retTotalSize=0x7fff20f54358, retCount=0x7fff20f54364) at blat.c:310
> #5  0x00000000004036a6 in searchOneIndex (fileCount=1, files=0xb30790, 
> gf=0x82f028d50, outName=<value optimized out>, isProt=0, maskHash=0x0, 
> outFile=0xb30510, showStatus=1) at blat.c:380
> #6  0x00000000004039e9 in blat (dbFile=<value optimized out>, 
> queryFile=<value optimized out>, outName=0x7fff20f547cd "testout1.psl") 
> at blat.c:606
> #7  0x0000000000404049 in main (argc=4, argv=0x7fff20f54528) at blat.c:783
> 
> We've also tried the psl output format, with similar results.
> 
> Does this suggest an error in the output functions? We're also wondering 
> if the size of nt.fa (36GB) is just too large for blat. Any other ideas 
> on what might be causing the segfault? We've had success with smaller 
> databases.
> 
> Thanks
> 
> John
> 
> 
> _______________________________________________
> Genome maillist  -  [email protected]
> https://lists.soe.ucsc.edu/mailman/listinfo/genome