On Mon, 27 Nov 2006, michael watson (IAH-C) wrote:

> Hi
>
> I want to translate very large (eukaryotic chromosomes!) DNA sequences
> in all 6 frames. Transeq takes about a day per large chromosome,
> running on a linux machine with 3Gb of RAM.
Well, you might try my fasttrans program, although it may not do exactly what you want. If the input sequence is bigger than 100kb it automatically fragments the input into 101kb chunks with a 1kb overlap. You could easily modify the code to make the chunk size so large that the whole chromosome would be read in one piece.

I just tested it on Human chromosome 10 and it took 29 seconds on an Opteron system to do all 6 frames with the command:

% time gunzip -c Homo_sapiens.NCBI36.41.dna.chromosome.10.fa.gz | fasttrans 123456 > foo.out

The source is at:

ftp://saf.bio.caltech.edu/pub/software/molbio/fasttrans.c

As for fixing the original problem, without looking at the code I'm going to hazard a wild guess. The program may be allocating smallish chunks for a buffer and then searching from the front of the buffer for the new end each time a chunk is added. This bug is never obvious when only a few chunks are added, but the run time goes up as the square of the length when a great many chunks must be added. So when presented with an input 100 times bigger than the typical test cases, the run takes 10000 times longer, which sounds more or less like what you're seeing.

Regards,

David Mathog
[EMAIL PROTECTED]
Manager, Sequence Analysis Facility, Biology Division, Caltech
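
To illustrate the guess above about quadratic buffer growth, here is a minimal C sketch. It is not taken from Transeq, EMBOSS, or fasttrans; the GrowBuf type and every function name in it are hypothetical, made up for this example. It appends fixed-size chunks to a growable buffer in two ways, once by rescanning from the front with strlen() to find the end, and once by remembering the current length, and it counts how many bytes get walked while looking for the end.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define CHUNK 16  /* bytes added per append; an arbitrary illustrative size */

/* Hypothetical growable text buffer, for illustration only. */
typedef struct {
    char  *data;     /* NUL-terminated contents, or NULL when empty */
    size_t len;      /* bytes in use, not counting the NUL */
    size_t cap;      /* bytes allocated */
    size_t scanned;  /* bytes walked while searching for the end */
} GrowBuf;

static void gb_init(GrowBuf *b)
{
    b->data = NULL;
    b->len = b->cap = b->scanned = 0;
}

static void gb_free(GrowBuf *b)
{
    free(b->data);
    gb_init(b);
}

/* Make room for `need` bytes plus the NUL, doubling the capacity. */
static int gb_reserve(GrowBuf *b, size_t need)
{
    if (need + 1 <= b->cap) return 0;
    size_t cap = b->cap ? b->cap : 64;
    while (cap < need + 1) cap *= 2;
    char *p = realloc(b->data, cap);
    if (p == NULL) return -1;
    if (b->cap == 0) p[0] = '\0';  /* fresh buffer starts as an empty string */
    b->data = p;
    b->cap = cap;
    return 0;
}

/* Slow way: find the end by scanning from the front on every append. */
static int gb_append_rescan(GrowBuf *b, const char *chunk, size_t n)
{
    size_t end = b->data ? strlen(b->data) : 0;  /* O(current length) */
    b->scanned += end;
    if (gb_reserve(b, end + n) != 0) return -1;
    memcpy(b->data + end, chunk, n);
    b->data[end + n] = '\0';
    b->len = end + n;
    return 0;
}

/* Fast way: the end is already known from the stored length. */
static int gb_append_tracked(GrowBuf *b, const char *chunk, size_t n)
{
    if (gb_reserve(b, b->len + n) != 0) return -1;
    memcpy(b->data + b->len, chunk, n);
    b->len += n;
    b->data[b->len] = '\0';
    return 0;
}

/* Append `nchunks` chunks of CHUNK bytes and return the number of bytes
 * scanned while locating the end, or (size_t)-1 if allocation failed. */
size_t scan_cost(size_t nchunks, int rescan)
{
    char chunk[CHUNK];
    GrowBuf b;
    size_t i, cost;

    assert(nchunks > 0);
    memset(chunk, 'A', sizeof chunk);
    gb_init(&b);
    for (i = 0; i < nchunks; i++) {
        int rc = rescan ? gb_append_rescan(&b, chunk, CHUNK)
                        : gb_append_tracked(&b, chunk, CHUNK);
        if (rc != 0) {
            gb_free(&b);
            return (size_t)-1;
        }
    }
    cost = b.scanned;
    gb_free(&b);
    return cost;
}
```

With the rescanning append, n chunks cost CHUNK * n * (n - 1) / 2 scanned bytes, so scan_cost(100, 1) is 79200 and scan_cost(1000, 1) is 7992000: ten times the input, roughly a hundred times the work. With the tracked append, scan_cost(1000, 0) is 0. Whether Transeq actually behaves this way would need to be checked against its source.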
