I've been fixing a severe performance problem with my hacked Artemis, which delegates its sequence functions to BioJava SymbolLists. It was taking much longer to scroll, update, etc. than vanilla Artemis, and the slowdown grew roughly linearly with sequence length and feature count. The end result is that a 5 Mb sequence with 10,000 features grinds to a halt on a Compaq ES40.

The cause turned out to be subStr() in AbstractSymbolList, which makes 4 method calls for each base in the subsequence (symbolAt(), length(), getToken(), append()) when creating a readable (i.e. string) representation of the sequence. Is this something worth looking at in the BioJava core? For now I'm caching the whole stringified sequence elsewhere to get round this.

The reason for all the substringing is that Artemis avoids Java graphics rounding errors at high sequence/pixel coordinates by checking which residues and features are visible and then drawing only those in the viewable area, all from a zero origin using integer coordinates. So the main genome sequence gets substringed, and so do all the visible features.

Caching the whole sequence as chars, on top of the overhead of an object for each residue, seems pretty inefficient. Or is this a case of:

Patient: "Doctor, it hurts when I do this."
Doctor: "Well, don't do that."

;)

Keith

-- 
-= Keith James - [EMAIL PROTECTED] - http://www.sanger.ac.uk/Users/kdj =-
The Sanger Centre, Wellcome Trust Genome Campus, Hinxton, Cambs CB10 1SA

_______________________________________________
Biojava-l mailing list - [EMAIL PROTECTED]
http://biojava.org/mailman/listinfo/biojava-l
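A minimal sketch of the per-base cost pattern described in the post, next to the "stringify once" workaround. `Residue`, `ResidueList` and `CachedTokens` are hypothetical stand-ins, not the BioJava `Symbol`/`SymbolList` API; the per-base calls are modeled on the description of `subStr()` above rather than copied from AbstractSymbolList.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical residue object (one per base); NOT the BioJava Symbol class. */
final class Residue {
    private final char token;
    Residue(char token) { this.token = token; }
    char getToken() { return token; }
}

/** Hypothetical one-object-per-residue sequence using 1-based coordinates. */
final class ResidueList {
    private final List<Residue> residues = new ArrayList<>();

    ResidueList(String seq) {
        for (char c : seq.toCharArray()) {
            residues.add(new Residue(c));
        }
    }

    int length() { return residues.size(); }

    Residue symbolAt(int index) {
        // Bounds check calls length() on every access.
        if (index < 1 || index > length()) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        return residues.get(index - 1);
    }

    /** Naive pattern: symbolAt() (+ length()), getToken() and append() for every base. */
    String subStrPerSymbol(int start, int end) {
        StringBuilder sb = new StringBuilder(Math.max(0, end - start + 1));
        for (int i = start; i <= end; i++) {
            sb.append(symbolAt(i).getToken());
        }
        return sb.toString();
    }
}

/** Workaround sketch: tokenize the whole sequence once, then copy slices of a char[]. */
final class CachedTokens {
    private final char[] tokens;

    CachedTokens(ResidueList list) {
        tokens = new char[list.length()];
        for (int i = 1; i <= list.length(); i++) {
            tokens[i - 1] = list.symbolAt(i).getToken();
        }
    }

    /** 1-based, inclusive coordinates, matching ResidueList. */
    String subStr(int start, int end) {
        if (start < 1 || end > tokens.length || start > end + 1) {
            throw new IndexOutOfBoundsException(start + ".." + end);
        }
        // One bulk copy instead of several method calls per base.
        return new String(tokens, start - 1, end - start + 1);
    }
}
```

For example, with `new ResidueList("ACGTACGTTT")`, both `subStrPerSymbol(3, 6)` and the cached `subStr(3, 6)` give `"GTAC"`. The trade-off is exactly the one raised in the post: the cache costs an extra char (2 bytes in Java) per residue on top of the residue objects, in exchange for turning each substring into a single array copy.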

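A rough sketch of the visibility check described in the post: work out which bases fall in the view, clip a feature to that range, and convert it to integer pixel coordinates measured from the view's left edge (a zero origin), so large genome positions never reach the graphics layer. `ViewWindow` and its parameters are hypothetical and not Artemis code; the sketch assumes the zoomed-in case of at least one pixel per base.

```java
/** Hypothetical view helper: visible base range and view-local integer pixels. */
final class ViewWindow {
    final int firstBase;      // 1-based base at the left edge of the view
    final int lastBase;       // 1-based last visible base (inclusive), clipped to the sequence
    final int pixelsPerBase;  // assumption: zoomed in, so >= 1 pixel per base

    ViewWindow(int firstBase, int viewWidthPixels, int pixelsPerBase, int sequenceLength) {
        if (firstBase < 1 || viewWidthPixels < 1 || pixelsPerBase < 1) {
            throw new IllegalArgumentException("invalid view parameters");
        }
        this.firstBase = firstBase;
        this.pixelsPerBase = pixelsPerBase;
        // Round up so a partially visible base at the right edge is still drawn.
        int basesInView = (viewWidthPixels + pixelsPerBase - 1) / pixelsPerBase;
        this.lastBase = Math.min(sequenceLength, firstBase + basesInView - 1);
    }

    /** True if the inclusive range [start, end] overlaps the visible bases. */
    boolean isVisible(int start, int end) {
        return end >= firstBase && start <= lastBase;
    }

    /** Returns {x, width} relative to the view's left edge, or null if not visible. */
    int[] toViewPixels(int start, int end) {
        if (!isVisible(start, end)) {
            return null;
        }
        int clippedStart = Math.max(start, firstBase);
        int clippedEnd = Math.min(end, lastBase);
        // Offsets are small (bounded by the view width), so no large coordinates are drawn.
        int x = (clippedStart - firstBase) * pixelsPerBase;
        int width = (clippedEnd - clippedStart + 1) * pixelsPerBase;
        return new int[] {x, width};
    }
}
```

With a 5,000,000-base sequence, a 1000-pixel view at 10 pixels per base starting at base 4,999,901, a feature spanning 4,999,950..5,000,100 is clipped to the last 51 bases and drawn at x = 490 with width 510, while a feature at 1..100 is skipped entirely. Only the bases in the clipped range then need to be substringed for display.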