On 18/03/10 09:11, michael watson (IAH-C) wrote:
> Hi
>
> I'm using EMBOSS 6.1.0 on a fairly small Linux VM which has about 3Gb of RAM.
>
> I find it strange that extractseq reports a memory problem:

Some further investigation suggests several improvements for the next release:

The input was being buffered with the entire input buffer (2000 bytes) saved per line, which is why it used so much memory. This can be reduced to a more reasonable figure (and we can save space in some other string copies).

When processing FASTA format (and various others), once the '>' line has been found the read cannot fail: it will read everything up to the next '>' or continue to the end of the file. This means we can turn off buffering of FASTA input (and other formats) once there are no remaining format tests that can fail.

Both changes will have a similar effect to specifying the format on the command line for large input files. Specifying the format should work in any release.

Hope that helps,

Peter
_______________________________________________
EMBOSS mailing list
[email protected]
http://lists.open-bio.org/mailman/listinfo/emboss
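
To illustrate the second change, here is a minimal sketch in Python. It is not the EMBOSS C code, and the names `RewindableReader` and `read_fasta` are hypothetical. The assumption is that lines are saved only so the reader can rewind and try another format, so saving can stop as soon as the '>' line has been accepted.

```python
import io


class RewindableReader:
    """Reads lines, keeping copies only while a format test might still fail."""

    def __init__(self, stream):
        self.stream = stream
        self.saved = []        # lines kept in case we need to rewind
        self.replay = []       # lines queued to be read again after a rewind
        self.buffering = True

    def readline(self):
        line = self.replay.pop(0) if self.replay else self.stream.readline()
        if line and self.buffering:
            self.saved.append(line)
        return line

    def rewind(self):
        """Go back to the start so that another format can be tried."""
        self.replay = self.saved + self.replay
        self.saved = []

    def commit(self):
        """The format is certain: drop the saved lines and stop buffering."""
        self.saved = []
        self.buffering = False


def read_fasta(reader):
    """Return (header, sequence) for the first record, or None if not FASTA."""
    header = reader.readline()
    if not header.startswith(">"):
        reader.rewind()        # leave the input intact for the next format test
        return None
    reader.commit()            # past the '>' line, nothing can fail
    parts = []
    while True:
        line = reader.readline()
        # Stop at the next '>' or at end of file. This sketch reads only the
        # first record and does not push the next header line back.
        if not line or line.startswith(">"):
            break
        parts.append(line.strip())
    return header[1:].strip(), "".join(parts)


reader = RewindableReader(io.StringIO(">seq1\nACGT\nTTGA\n>seq2\nCC\n"))
print(read_fasta(reader))      # ('seq1', 'ACGTTTGA')
print(len(reader.saved))       # 0: nothing is held once the format is known
```

With a large FASTA file, only the header line is ever held in `saved`. If the first line does not start with '>', `rewind()` makes the same lines available again for the next format test.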
