Dear Tony,

I just wish to mention here that we have effectively mapped millions of
paired-end Illumina sequences onto fungal-sized genomes by splitting the
query into files of 1000-5000 sequences, on a 32-bit desktop running
Ubuntu 10.04. We simply use the Unix command: split -l 2 -a 5 -d
fasta_file x/suffix  to generate the files in directory 'x'. We have been
using this approach to visually assess assembly quality through
integration with GBrowse.
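
A note on the chunk size: split -l counts lines, not sequences, so -l 2
puts one two-line record in each file, and N sequences per file would
need -l 2N. That only works when every record occupies exactly two
lines. For FASTA files with wrapped sequence lines, something like the
following rough Python sketch could be used instead. It is only an
illustration: the function name split_fasta, the "chunk_" prefix and the
".fa" extension are made up here and are not part of our pipeline.

```python
import os


def split_fasta(fasta_path, out_dir, records_per_file=1000, prefix="chunk_"):
    """Split a FASTA file into numbered files of at most records_per_file records.

    Hypothetical helper: records are cut only at '>' header lines, so
    sequences wrapped over several lines stay intact.
    Returns the list of chunk paths in the order they were written.
    """
    os.makedirs(out_dir, exist_ok=True)
    paths = []
    out = None
    count = 0  # number of records written to the current chunk
    with open(fasta_path) as handle:
        for line in handle:
            if line.startswith(">"):
                # Open a new chunk at a record boundary once the current one is full.
                if out is None or count == records_per_file:
                    if out is not None:
                        out.close()
                    path = os.path.join(out_dir, f"{prefix}{len(paths):05d}.fa")
                    out = open(path, "w")
                    paths.append(path)
                    count = 0
                count += 1
            # Lines before the first header (if any) are skipped.
            if out is not None:
                out.write(line)
    if out is not None:
        out.close()
    return paths
```

Calling split_fasta("fasta_file", "x", records_per_file=5000) would write
x/chunk_00000.fa, x/chunk_00001.fa and so on, and each chunk can then be
given to its own mapping run.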

HTH,

Mbandi S.K
----------------------------------------------------
PhD Student
Universiteit van Wes-Kaapland
Private Bag x17
Bellville, 7535
South Africa.
Phone: 27 (0)21 9592364 (work)
Fax: 27 (0)21 9592512   (work)
"Be keen to serendipity", Mbandi S.K 2007

> On 16/07/12 20:27, Galt Barber wrote:
>> Jim recently told another user with the same problem to split the huge
>> database up as Hiram advised above.  This is due to using 32-bit
>> pointers. If you use 64-bit pointers, you can access more RAM, but since
>> your pointers now require twice the storage, it is a waste unless you
>> have a machine with huge RAM.
>
> Hi, Galt.
>
> We are using a server with 256GiB RAM, running 64-bit Ubuntu 12.04 LTS.
>
>> In any case another benefit of splitting the database up on a large
>> machine or cluster is that you can run multiple instances of blat in
>> parallel, one process for each piece if you want.
>
> That's good advice and I've been thinking about doing it. What put me
> off a bit is that we have huge query files from NGS sequencing: I'm
> working on a pipeline that my colleagues wrote based on parsing BLAST
> output - I want to use BLAT instead, with the BLAST -m 8 TAB output.
>
> The query files are de-duplicated Illumina FASTQ files converted to FASTA
> format. We are looking for chimeras in deep sequencing data, currently
> with BLAST.
>
> I was concerned about the overhead of every concurrent BLAT process
> reading the query files while it searches its part of the database.
> I've not tried that yet, and it might not be the problem I anticipate,
> but our server is I/O-bound, so it's something I will have to take into
> account.
>
>> Putting the results back together just amounts to cat-ing all the
>> results files together.
>
> OK, had worked that one out :-)
>
>> For psl output filtering, there are even tools to help such as pslReps
>> and pslCDnaFilter.
>
> Thanks all for your advice,
>
>    Tony.
> _______________________________________________
> Genome maillist  -  [email protected]
> https://lists.soe.ucsc.edu/mailman/listinfo/genome
>



