On 16/07/12 20:27, Galt Barber wrote:
> Jim recently told another user with the same problem to split your huge
> database up as Hiram advised above. This is due to using 32bit pointers.
> If you use 64-bit pointers, you can access more ram but since your
> pointers now require twice the storage, it is a waste unless you have a
> machine with huge ram.

Hi, Galt. We are using a server with 256GiB RAM, running 64-bit Ubuntu 12.04 LTS.

> In any case another benefit of splitting the database up
> on a large machine or cluster is that you can run multiple instances of blat
> in parallel, one process for each piece if you want.

That's good advice and I've been thinking about doing it. What put me off a bit is that we have huge query files from NGS sequencing: I'm working on a pipeline that my colleagues wrote based on parsing BLAST output, and I want to use BLAT instead, with the BLAST -m 8 TAB output. The query files are de-duped Illumina FASTQ files converted to FASTA format. We are looking for chimeras in deep sequencing data, currently by BLAST.

I was concerned about the overhead of each concurrent BLAT process reading the full query files while it searches its own part of the database. I've not tried that yet, and it might not be the problem I anticipate, but our server is I/O-bound, so it's something I will have to take into account.

> Putting the results back together just amounts to cat-ing all the
> results files together.

OK, had worked that one out :-)

> For psl output filtering, there are even tools to help such as pslReps
> and pslCDnaFilter.

Thanks all for your advice,
Tony.

_______________________________________________
Genome maillist - [email protected]
https://lists.soe.ucsc.edu/mailman/listinfo/genome
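
For reference, the split-and-concatenate approach discussed in this thread could be sketched roughly as below. This is only an illustration, not anything taken from the BLAT distribution: the function names `split_fasta` and `concatenate`, the `piece_NNN.fa` file naming, and the round-robin distribution of records are all assumptions made for the example. It also assumes the per-piece result files are headerless tab-separated output, so that plain concatenation is safe.

```python
from pathlib import Path


def split_fasta(fasta_path, n_pieces, out_dir):
    """Distribute FASTA records round-robin across n_pieces files.

    Hypothetical helper: the piece_NNN.fa naming is an assumption.
    Returns the list of piece paths in order.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = [out_dir / f"piece_{i:03d}.fa" for i in range(n_pieces)]
    handles = [p.open("w") for p in paths]
    try:
        current = -1  # no record seen yet
        with open(fasta_path) as src:
            for line in src:
                if line.startswith(">"):
                    # A header line starts a new record: move to the next piece.
                    current = (current + 1) % n_pieces
                if current >= 0:
                    # Lines before the first header are skipped.
                    handles[current].write(line)
    finally:
        for handle in handles:
            handle.close()
    return paths


def concatenate(result_paths, merged_path):
    """Append each result file to merged_path, like `cat` on the pieces.

    Assumes the result files have no header lines.
    """
    with open(merged_path, "w") as merged:
        for path in result_paths:
            with open(path) as part:
                for line in part:
                    merged.write(line)
```

Each piece would then be searched by its own blat process, and the per-piece outputs passed to `concatenate`. Round-robin assignment balances the number of records per piece, not their total length, so pieces may differ in size if the database sequences vary a lot in length.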
