Good Afternoon Tony:

We know why blat will crash on this sequence:
   Homo_sapiens.GRCh37.67.dna.toplevel.fa.gz

The program internally uses 32-bit integers, which cannot
count past 4 GiB.  This fasta file has 20 GiB of sequence, so
it will not function as a single .2bit file.  I'm not sure
you would want it to either.  Each of the haplotypes in this
file reproduces the entire chromosome that the haplotype is
contained within.  There are eight complete copies of chr1, five
complete copies of chr2, etc.  It has two different copies of chrY,
one completely empty of sequence.
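
As a rough illustration of the size check described above, here is a
minimal Python sketch.  It is not part of blat or the kent source tree;
the function names are hypothetical, and it simply assumes the limit is
2^32 bases in total, per the 4 GiB figure above.  It tallies the bases
in each fasta record so you can see whether the total exceeds what a
32-bit counter can hold, and which records are empty.

```python
import gzip
import io

# Assumption: the limit is what an unsigned 32-bit integer can count (4 GiB).
UINT32_LIMIT = 2**32


def open_fasta(path):
    """Open a plain or gzip-compressed fasta file as text."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "rt")


def fasta_base_counts(handle):
    """Return {record_name: base_count} for a fasta text stream."""
    counts = {}
    name = None
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first word of the header line as the record name.
            parts = line[1:].split()
            name = parts[0] if parts else ""
            counts.setdefault(name, 0)
        elif name is not None:
            counts[name] += len(line)
    return counts


def exceeds_32bit(counts, limit=UINT32_LIMIT):
    """True if the total base count cannot be held in a 32-bit counter."""
    return sum(counts.values()) >= limit


def empty_records(counts):
    """Names of records that contain no sequence at all."""
    return [name for name, n in counts.items() if n == 0]
```

Calling fasta_base_counts(open_fasta(path)) on the toplevel file and
passing the result to exceeds_32bit() and empty_records() would show
whether the file needs to be split or trimmed before building a .2bit
from it.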

--Hiram

Tony Travis wrote:
> If I've got time, I'll do some more debugging - It might be a memory 
> leak that only shows up when indexing >4GiB DB files. As I said here, 
> the same query files run fine against the hg19.2bit DB I downloaded. 
> However, using this DB "blat" only uses about 3.9GiB RAM.
_______________________________________________
Genome maillist  -  [email protected]
https://lists.soe.ucsc.edu/mailman/listinfo/genome
