I'll need to go back and revisit that after my deadline at the end of this week. Initially, I was creating the Sequence objects on the fly at alignment time, but it would be more efficient to create them once and store them in the gene object itself. I was also passing an InputStreamReader for the substitution matrix each time (reading the matrix from my jar); storing the matrix as a string would be a better option, especially since I'm threading and there are so many alignments.
Chris

On Tue, Oct 26, 2010 at 3:23 PM, Andreas Prlic <[email protected]> wrote:
>
> ok, how do you create the biojava3 Sequence objects? just trying to
> find out where the bottlenecks are, so we can fix them...
>
> A
>
> On Tue, Oct 26, 2010 at 12:20 PM, Chris Friedline <[email protected]> wrote:
> > Hi,
> > The io should be the same, since I've used the same set of genes for testing
> > both. So, it's either the alignment calculation or the new biojava design
> > contributing to the slowness.
> > Chris
> >
> > On Tue, Oct 26, 2010 at 2:42 PM, Andreas Prlic <[email protected]> wrote:
> >>
> >> Hi Chris,
> >>
> >> about your comment that the biojava3-alignment is slower than the 1.7
> >> one: Do you have any data if this is coming from the io or is the
> >> actual alignment calculation slower?
> >>
> >> Andreas
> >>
> >> On Sun, Oct 24, 2010 at 7:57 AM, Chris Friedline <[email protected]>
> >> wrote:
> >> > Hello,
> >> >
> >> > I am getting a weird problem with protein alignment using
> >> > NeedlemanWunsch in 1.7.1, in that the alignment does not span the
> >> > entire length of the proteins. I've verified that this should not
> >> > happen with needle (from EMBOSS), neobio, BioJava3, and NW on NCBI.
> >> > I'm reluctant to switch to BioJava3 at this time, since performance is
> >> > about 2-3x slower than 1.7.1 for the alignments, and I'm doing about
> >> > 350,000 of them.
> >> >
> >> > An example of this alignment error, is shown here:
> >> > http://pastebin.com/mdX516R6
> >> >
> >> > Notice that the alignment stops 1 amino acid short of the end in both
> >> > cases. The parameters for the alignment are: BLOSUM50, gapOpen=10,
> >> > gapExtend=2.
> >> >
> >> > Thanks,
> >> > Chris
> >> >
> >> > --
> >> > PhD Candidate, Integrative Life Sciences
> >> > Virginia Commonwealth University
> >> > Richmond, VA
> >> > _______________________________________________
> >> > Biojava-l mailing list - [email protected]
> >> > http://lists.open-bio.org/mailman/listinfo/biojava-l
> >>
> >> --
> >> -----------------------------------------------------------------------
> >> Dr. Andreas Prlic
> >> Senior Scientist, RCSB PDB Protein Data Bank
> >> University of California, San Diego
> >> (+1) 858.246.0526
> >> -----------------------------------------------------------------------
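The caching idea from the top of this message (read the substitution matrix once and keep it as a string, instead of opening a new InputStreamReader for every alignment) could be sketched roughly as below. This is only an illustration under stated assumptions: `MatrixTextCache`, `MatrixCacheDemo`, and the placeholder matrix text are hypothetical names and data, not part of BioJava, and the step that parses the cached text into a matrix object is left out because it depends on the BioJava version in use.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

/** Hypothetical helper (not a BioJava class): reads each matrix source once and caches its text. */
final class MatrixTextCache {
    private final ConcurrentMap<String, String> cache = new ConcurrentHashMap<>();

    /**
     * Returns the cached text for the given name, reading from the supplied
     * source only the first time the name is requested. Safe to call from many threads.
     */
    String get(String name, Supplier<Reader> source) {
        return cache.computeIfAbsent(name, key -> readAll(source.get()));
    }

    // Drains the reader into a String and closes it.
    private static String readAll(Reader reader) {
        try (Reader r = reader) {
            StringBuilder sb = new StringBuilder();
            char[] buf = new char[4096];
            int n;
            while ((n = r.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
            return sb.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}

class MatrixCacheDemo {
    public static void main(String[] args) {
        MatrixTextCache cache = new MatrixTextCache();
        AtomicInteger reads = new AtomicInteger();

        // Stand-in for opening the matrix resource from the jar; the text is a placeholder.
        Supplier<Reader> source = () -> {
            reads.incrementAndGet();
            return new StringReader("placeholder matrix text\n");
        };

        // Many alignments can ask for the matrix; the source is opened only once.
        for (int i = 0; i < 1000; i++) {
            cache.get("BLOSUM50", source);
        }
        System.out.println("source opened " + reads.get() + " time(s)");
    }
}
```

In real use the supplier would wrap something like `getResourceAsStream` on the jar resource. The same pattern applies to the Sequence objects: build them once when the gene object is created and reuse them for every alignment that gene takes part in.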
