Jim recently told another user with the same problem to split their huge database up, as Hiram advised above. This is due to blat using 32-bit pointers. If you use 64-bit pointers you can access more RAM, but since your pointers then require twice the storage, it is a waste unless you have a machine with a huge amount of RAM.
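As a quick sanity check of the concept, the pointer width of a build can be read off as the size of a C void*; a minimal Python illustration (nothing blat-specific, purely illustrative):

import struct
import sys

# "P" is the struct format code for a C void*; its size in bytes tells
# you whether this build uses 32-bit or 64-bit pointers.
bits = struct.calcsize("P") * 8
print(f"{bits}-bit pointers, so roughly 2**{bits} addressable bytes")
print(f"sys.maxsize = {sys.maxsize}")

On a 32-bit build this reports a 4 GiB address space, which is consistent with a 20 GiB database being too large to address in one process.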
In any case, another benefit of splitting the database up on a large machine or cluster is that you can run multiple instances of blat in parallel, one process for each piece if you want. Putting the results back together just amounts to cat-ing all the result files together. For psl output filtering, there are even tools to help, such as pslReps and pslCDnaFilter (a rough sketch of scripting such a pipeline follows the quoted message below).

-Galt

On Sat, Jul 14, 2012 at 4:18 PM, Tony Travis <[email protected]> wrote:
> On 15/07/12 00:08, Hiram Clawson wrote:
> > Good Afternoon Tony:
> >
> > Your 20 GiB database is inefficient and too large to function.
> > Break up your database into about 10 pieces, 2 GiB each, and run
> > your queries against those 10 pieces.
>
> Hi, Hiram.
>
> BLAT works very well on my 20 GiB DB - when it doesn't segfault!
>
> I'm loading it from a file of files, in 2bit format, and it is
> *stunningly* faster than BLAST of the same DB. I created one 2bit file
> from each record in a FASTA format file we downloaded from Ensembl:
>
> Homo_sapiens.GRCh37.67.dna.toplevel.fa
>
> What advice would you give me to BLAT the human genome efficiently?
>
> Thanks,
>
> Tony.

_______________________________________________
Genome maillist - [email protected]
https://lists.soe.ucsc.edu/mailman/listinfo/genome
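A minimal sketch of the parallel-blat pipeline Galt describes, in Python (hypothetical, not a supported tool: it assumes the blat binary is on PATH, that the database has already been split into pieces named db_piece_*.2bit, and that the query lives in query.fa):

#!/usr/bin/env python3
import glob
import subprocess
from concurrent.futures import ProcessPoolExecutor

QUERY = "query.fa"                             # hypothetical query file
PIECES = sorted(glob.glob("db_piece_*.2bit"))  # pre-split database pieces

def run_blat(piece):
    out = piece.replace(".2bit", ".psl")
    # -noHead suppresses the psl header so the outputs concatenate cleanly
    subprocess.run(["blat", piece, QUERY, out, "-noHead"], check=True)
    return out

if __name__ == "__main__":
    # one blat process per piece, as many at a time as there are cores
    with ProcessPoolExecutor() as pool:
        results = list(pool.map(run_blat, PIECES))

    # putting the results back together is just cat-ing the files
    with open("all_results.psl", "w") as merged:
        for path in results:
            with open(path) as part:
                merged.write(part.read())

pslReps or pslCDnaFilter can then be run on all_results.psl to keep only the best alignments.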
