Excellent! I set the MAXSEQIN parameter to 200,000,000 and it ran in 18 seconds....

-----Original Message-----
From: David Mathog [mailto:[EMAIL PROTECTED]
Sent: 27 November 2006 16:16
To: michael watson (IAH-C)
Cc: [email protected]
Subject: Re: [EMBOSS] Transeq and very large sequences

On Mon, 27 Nov 2006, michael watson (IAH-C) wrote:

> Hi
>
> I want to translate very large (eukaryotic chromosomes!) DNA sequences
> in all 6 frames. Transeq takes about a day per large chromosome,
> running on a linux machine with 3Gb of RAM.

Well, you might try my fasttrans program. It may not do exactly what
you want though. If the input sequence is bigger than 100kb it
automatically fragments the input into 101kb chunks with a 1kb overlap.
You could easily modify the code to make that chunk size so large that
the whole chromosome would be read. I just tested it on Human
chromosome 10 and it took 29 seconds on an Opteron system to do all 6
frames with the command:

  % time gunzip -c Homo_sapiens.NCBI36.41.dna.chromosome.10.fa.gz | fasttrans 123456 > foo.out

ftp://saf.bio.caltech.edu/pub/software/molbio/fasttrans.c

As for fixing the original problem, without looking at the code I'm
going to hazard a wild guess. The program may be allocating smallish
chunks for a buffer and then searching from the front of the buffer for
the new end each time. This bug is never obvious when only a few chunks
are added, but the time goes up as the square of the length when a very
large number of chunks must be added. So when presented with an input
100 times bigger than the typical test cases, the run time is 10000
times longer, which sounds more or less like what you're seeing.

Regards,

David Mathog
[EMAIL PROTECTED]
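
For anyone curious about the kind of bug guessed at above, here is a minimal C sketch of the two append patterns. It is not taken from the EMBOSS or fasttrans sources; the names seqbuf, append_rescan and append_tracked are made up purely for illustration, and the chunks are assumed to contain no NUL bytes. The first function rescans from the front of the buffer on every append, so total time grows as the square of the final length. The second remembers where the end is and doubles the allocation when it fills, so total time stays roughly linear.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable buffer, for illustration only. */
typedef struct {
    char  *data;  /* NUL-terminated contents, or NULL when empty */
    size_t len;   /* bytes used, excluding the terminating NUL */
    size_t cap;   /* bytes allocated */
} seqbuf;

/* Slow pattern: find the end by scanning from the front each time and
   grow by just one small chunk. Each call costs time proportional to
   the current length, so n appends cost on the order of n^2. */
int append_rescan(seqbuf *b, const char *chunk, size_t n)
{
    size_t end = b->data ? strlen(b->data) : 0;  /* walks the whole buffer */
    char *p = realloc(b->data, end + n + 1);
    if (p == NULL)
        return -1;
    memcpy(p + end, chunk, n);
    p[end + n] = '\0';
    b->data = p;
    b->len = end + n;
    b->cap = end + n + 1;
    return 0;
}

/* Fast pattern: keep track of the end and double the capacity when
   full, so the cost of each append is amortised over many calls. */
int append_tracked(seqbuf *b, const char *chunk, size_t n)
{
    if (b->len + n + 1 > b->cap) {
        size_t cap = b->cap ? b->cap : 64;
        while (cap < b->len + n + 1)
            cap *= 2;
        char *p = realloc(b->data, cap);
        if (p == NULL)
            return -1;
        b->data = p;
        b->cap = cap;
    }
    memcpy(b->data + b->len, chunk, n);
    b->len += n;
    b->data[b->len] = '\0';
    return 0;
}
```

Both functions expect a seqbuf that starts out zeroed (seqbuf b = {0};) and produce the same contents. Only the way the end of the buffer is located and the way the allocation grows differ, which is why the slow version looks fine on small test inputs.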
