Hi Gustavo,

In principle I agree with everything; see details below:

> I think my question wasn't very clear. My intention in this project is
> to follow the approach (with the three steps) outlined on the project's
> page, using the classical progressive alignment heuristic: build the
> distance matrix, build the guide tree, and use this tree to
> progressively align more sequences together.

yes

> What I propose for the third step is a first implementation using the
> (simpler) dynamic programming described in the first CLUSTAL paper
> (I think it's from 1988) and incrementally improving the algorithm to
> get closer to the one described in the CLUSTALW paper (from 1994). Is
> this more or less what you had in mind?

yes, sounds good.

> About parallel strategies, I think a relatively easy way we could use
> them is in the distance matrix construction: we could have several
> threads calculating the pairwise alignments for different pairs of
> sequences in the set.

Correct. A first implementation would probably target a single
multi-CPU machine. More advanced implementations could provide support
for e.g. Map/Reduce, JPPF, or something like that...

> Now, the alignment quality measure is a tougher issue. The CLUSTALW
> paper doesn't give any way to measure the quality of the result; they
> consider a good alignment to be one that is hard to improve by eye (but
> they claim that for sufficiently similar sequences, with no pair less
> than 35% identical, the results are good). Can I do the same as in the
> CLUSTALW paper and leave the quality measure to the user? How concerned
> should I be with that in this project?

Getting an overall core algorithm that works should be the priority.
The benchmarking part is not mandatory, but something to keep in
mind... I have plenty of material for that, once we get to that
stage...

> I will try to send a proposal draft to this mailing list by tomorrow
> to get some feedback from you.

Excellent, looking forward to it.

Andreas

-- 
-----------------------------------------------------------------------
Dr. Andreas Prlic
Senior Scientist, RCSB PDB Protein Data Bank
University of California, San Diego
(+1) 858.246.0526
-----------------------------------------------------------------------
_______________________________________________
Biojava-l mailing list - [email protected]
http://lists.open-bio.org/mailman/listinfo/biojava-l
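
For reference, the single-machine, multi-CPU idea for the distance matrix step discussed above could be sketched in Java roughly as follows. This is only an illustrative sketch under stated assumptions: the class ParallelDistanceMatrix and its methods are hypothetical names, not BioJava API, and distance() is a crude mismatch fraction standing in for a real pairwise alignment score.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch: none of these names come from BioJava.
final class ParallelDistanceMatrix {

    // ASSUMPTION: placeholder for a real pairwise alignment score.
    // Returns the fraction of differing positions over the longer length;
    // any length difference is counted as mismatches.
    static double distance(String a, String b) {
        int longer = Math.max(a.length(), b.length());
        if (longer == 0) {
            return 0.0;
        }
        int shorter = Math.min(a.length(), b.length());
        int mismatches = longer - shorter;
        for (int i = 0; i < shorter; i++) {
            if (a.charAt(i) != b.charAt(i)) {
                mismatches++;
            }
        }
        return (double) mismatches / longer;
    }

    // Fills a symmetric n x n matrix, one task per unordered pair (i, j).
    static double[][] compute(List<String> seqs, int threads)
            throws InterruptedException, ExecutionException {
        int n = seqs.size();
        double[][] matrix = new double[n][n];
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<?>> pending = new ArrayList<>();
            for (int i = 0; i < n; i++) {
                for (int j = i + 1; j < n; j++) {
                    final int row = i;
                    final int col = j;
                    pending.add(pool.submit(() -> {
                        double d = distance(seqs.get(row), seqs.get(col));
                        matrix[row][col] = d; // each task owns these two cells
                        matrix[col][row] = d;
                    }));
                }
            }
            // Wait for every pair; get() also rethrows any task failure.
            for (Future<?> f : pending) {
                f.get();
            }
        } finally {
            pool.shutdown();
        }
        return matrix;
    }

    public static void main(String[] args) throws Exception {
        List<String> seqs = Arrays.asList("ACGT", "ACGA", "TCGA");
        int threads = Runtime.getRuntime().availableProcessors();
        for (double[] row : compute(seqs, threads)) {
            System.out.println(Arrays.toString(row));
        }
    }
}
```

Each task writes only the two cells for its own pair, so no locking is needed, and waiting on every Future before returning makes the results visible to the caller. A real implementation would replace distance() with the pairwise alignment step.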
