Jim recently told another user with the same problem to split their huge database up, as Hiram advised above. This is due to blat using 32-bit pointers. If you use 64-bit pointers you can access more RAM, but since your pointers then require twice the storage, it is a waste unless you have a machine with a huge amount of RAM.
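As a quick sanity check of the concept, the pointer width of a build can be read off as the size of a C void*; a minimal Python illustration (nothing blat-specific, purely illustrative):

import struct
import sys

# "P" is the struct format code for a C void*; its size in bytes tells
# you whether this build uses 32-bit or 64-bit pointers.
bits = struct.calcsize("P") * 8
print(f"{bits}-bit pointers, so roughly 2**{bits} addressable bytes")
print(f"sys.maxsize = {sys.maxsize}")

On a 32-bit build this reports a 4 GiB address space, which is consistent with a 20 GiB database being too large to address in one process.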
In any case, another benefit of splitting the database up on a large machine or cluster is that you can run multiple instances of blat in parallel, one process for each piece if you want. Putting the results back together just amounts to cat-ing all the result files together. For psl output filtering, there are even tools to help, such as pslReps and pslCDnaFilter (a rough sketch of scripting such a pipeline follows the quoted message below).

-Galt

On Sat, Jul 14, 2012 at 4:18 PM, Tony Travis <[email protected]> wrote:
> On 15/07/12 00:08, Hiram Clawson wrote:
> > Good Afternoon Tony:
> >
> > Your 20 GiB database is inefficient and too large to function.
> > Break up your database into about 10 pieces, 2 GiB each, and run
> > your queries against those 10 pieces.
>
> Hi, Hiram.
>
> BLAT works very well on my 20 GiB DB - when it doesn't segfault!
>
> I'm loading it from a file of files, in 2bit format, and it is
> *stunningly* faster than BLAST of the same DB. I created one 2bit file
> from each record in a FASTA format file we downloaded from Ensembl:
>
> Homo_sapiens.GRCh37.67.dna.toplevel.fa
>
> What advice would you give me to BLAT the human genome efficiently?
>
> Thanks,
>
> Tony.

_______________________________________________
Genome maillist - [email protected]
https://lists.soe.ucsc.edu/mailman/listinfo/genome
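A minimal sketch of the parallel-blat pipeline Galt describes, in Python (hypothetical, not a supported tool: it assumes the blat binary is on PATH, that the database has already been split into pieces named db_piece_*.2bit, and that the query lives in query.fa):

#!/usr/bin/env python3
import glob
import subprocess
from concurrent.futures import ProcessPoolExecutor

QUERY = "query.fa"                             # hypothetical query file
PIECES = sorted(glob.glob("db_piece_*.2bit"))  # pre-split database pieces

def run_blat(piece):
    out = piece.replace(".2bit", ".psl")
    # -noHead suppresses the psl header so the outputs concatenate cleanly
    subprocess.run(["blat", piece, QUERY, out, "-noHead"], check=True)
    return out

if __name__ == "__main__":
    # one blat process per piece, as many at a time as there are cores
    with ProcessPoolExecutor() as pool:
        results = list(pool.map(run_blat, PIECES))

    # putting the results back together is just cat-ing the files
    with open("all_results.psl", "w") as merged:
        for path in results:
            with open(path) as part:
                merged.write(part.read())

pslReps or pslCDnaFilter can then be run on all_results.psl to keep only the best alignments.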
