Hi Oliver,

Here are just a couple of keywords for you to look into; not sure whether you have looked at any of these already:
- distance between sequences
- probabilistic similarity measures
- multidimensional scaling

Hope that makes some sense...

Andreas

On Mon, May 7, 2012 at 3:21 AM, Oliver Schmitt <[email protected]> wrote:
> Hi,
>
> I'm looking for general advice on comparing sequences (S). I do not
> necessarily mean DNA sequences, but rather sequences such as "Region A is
> connected with Region B" (A->B for short), together with a distance or
> similarity measure that allows similar sequences or paths to be identified.
> The regions are alphanumerically coded, e.g. "Bed nucleus of the stria
> terminalis anterior division".
> Given are 10^2 to 10^7 different paths; sought are all their mutual
> similarities (e.g., a similarity matrix) and a multivariate classification
> such as a dendrogram based on a meaningful cluster analysis.
>
> Example
> Given:
> S1: A->B->C->G
> S2: A->B->F->G
> S3: A->C->B->G
> S4: A->B->D->G
>
> Sought:
> Similarity matrix
>
>      S1  S2  S3  S4
> S1   ?   ?   ?   ?
> S2   ?   ?   ?   ?
> S3   ?   ?   ?   ?
> S4   ?   ?   ?   ?
>
> Then I would like to generate a dendrogram based on the similarity measure:
>
> S1--
>     |--
> S2--   |
>        |----
> S3--   |
>     |--   |
> S4--
>
>
> Thanks a lot for any advice.
>
> Regards,
> Oliver
>
> _______________________________________________
> Biojava-l mailing list - [email protected]
> http://lists.open-bio.org/mailman/listinfo/biojava-l
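
To make the "distance between sequences" keyword concrete, here is a minimal sketch in plain Python. It is only one possible choice, not a recommendation from the thread: it assumes that an edit (Levenshtein) distance over region labels is a reasonable measure, normalises it by path length, and builds the merge order of a dendrogram with naive average-linkage clustering. The names `edit_distance`, `similarity` and `average_linkage` are made up for this illustration.

```python
from itertools import combinations


def edit_distance(a, b):
    """Levenshtein distance between two sequences of region labels."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,           # delete x
                            curr[j - 1] + 1,       # insert y
                            prev[j - 1] + cost))   # substitute x by y
        prev = curr
    return prev[-1]


def similarity(a, b):
    """Similarity in [0, 1]; 1.0 means identical paths (assumed normalisation)."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - edit_distance(a, b) / longest


def average_linkage(names, dist):
    """Naive agglomerative clustering; returns the merges in order."""
    def avg(c1, c2):
        return sum(dist[p, q] for p in c1 for q in c2) / (len(c1) * len(c2))

    clusters = [(n,) for n in names]
    merges = []
    while len(clusters) > 1:
        # Pick the pair of clusters with the smallest average distance.
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: avg(clusters[ij[0]], clusters[ij[1]]))
        merges.append((clusters[i], clusters[j], avg(clusters[i], clusters[j])))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return merges


# The four example paths from the original question.
paths = {
    "S1": "A->B->C->G".split("->"),
    "S2": "A->B->F->G".split("->"),
    "S3": "A->C->B->G".split("->"),
    "S4": "A->B->D->G".split("->"),
}
names = sorted(paths)

# Full similarity matrix and the matching distance matrix (1 - similarity).
sim = {(p, q): similarity(paths[p], paths[q]) for p in names for q in names}
dist = {pq: 1.0 - s for pq, s in sim.items()}

for p in names:
    print(p, " ".join(f"{sim[p, q]:.2f}" for q in names))

merges = average_linkage(names, dist)
for left, right, height in merges:
    print(left, "+", right, "at distance", height)
```

With this particular measure, S1, S2 and S4 differ from each other by a single substitution and end up in one cluster, while S3 joins last. Note that a full pairwise matrix grows quadratically, so it is only workable at the lower end of the 10^2 to 10^7 range; for the larger sets some form of sampling or pre-grouping would be needed.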
