Good afternoon Tony,

We know why blat will crash on this sequence: Homo_sapiens.GRCh37.67.dna.toplevel.fa.gz
The internals of the program have 32-bit integers that cannot count past 4 GiB. This FASTA file has 20 GiB of sequence, so it will not function as a single .2bit file. I'm not sure you would want it to, either. Each of the haplotypes in this file reproduces the entire chromosome it belongs to: there are eight complete copies of chr1, five complete copies of chr2, etc. It also has two different copies of chrY, one of which is completely empty of sequence.

--Hiram

Tony Travis wrote:
> If I've got time, I'll do some more debugging - It might be a memory
> leak that only shows up when indexing >4GiB DB files. As I said here,
> the same query files run fine against the hg19.2bit DB I downloaded.
> However, using this DB "blat" only uses about 3.9GiB RAM.

_______________________________________________
Genome maillist  -  [email protected]
https://lists.soe.ucsc.edu/mailman/listinfo/genome
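To check in advance whether a FASTA file runs into the 4 GiB limit described above, you can total its sequence length and compare it with the largest value a 32-bit unsigned integer can hold (2^32 - 1). The following is a minimal sketch under that assumption only. It counts bases from the FASTA text, and it does not reproduce how blat or the .2bit format store data internally. The function names are hypothetical, chosen for illustration.

```python
import gzip
import io

# Largest value a 32-bit unsigned integer can hold: 2**32 - 1 (just under 4 GiB).
UINT32_MAX = 2**32 - 1


def open_fasta(path):
    """Open a plain or gzip-compressed FASTA file as text (hypothetical helper)."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path)


def total_sequence_length(handle):
    """Sum the lengths of all sequence lines in a FASTA stream, skipping '>' headers."""
    total = 0
    for line in handle:
        if line.startswith(">"):
            continue
        total += len(line.strip())
    return total


def exceeds_uint32(n):
    """True if a count of n does not fit in a 32-bit unsigned integer."""
    return n > UINT32_MAX


def wrapped_uint32(n):
    """Value a 32-bit unsigned counter would actually hold after counting to n."""
    return n % (UINT32_MAX + 1)


if __name__ == "__main__":
    # Tiny in-memory example; for a real file use open_fasta("...fa.gz") instead.
    demo = io.StringIO(">seq1\nACGT\nAC\n>seq2\nGG\n")
    n = total_sequence_length(demo)
    print(n, exceeds_uint32(n))
```

For a file with 20 GiB of sequence, `exceeds_uint32` returns True, and `wrapped_uint32` shows that a 32-bit counter silently wraps around instead of reporting the real size.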
