Hi, I'm looking for general advice on comparing sequences (S). I don't necessarily mean DNA sequences, but sequences such as "Region A is connected with Region B" (A->B for short), together with a distance or similarity measure that makes it possible to identify similar sequences or paths. The regions are alphanumerically coded, e.g. "Bed nucleus of the stria terminalis anterior division". Given are 10^2 to 10^7 different paths; what I am looking for are all their mutual similarities (e.g., a similarity matrix) and a multivariate classification, such as a dendrogram, based on a meaningful cluster analysis.
Example
Given:
S1: A->B->C->G
S2: A->B->F->G
S3: A->C->B->G
S4: A->B->D->G
Wanted:
Similarity matrix
     S1  S2  S3  S4
S1   ?   ?   ?   ?
S2   ?   ?   ?   ?
S3   ?   ?   ?   ?
S4   ?   ?   ?   ?
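
To make the question concrete, here is a minimal sketch of one way the "?" entries could be filled in. It assumes (my assumption, not a settled choice) that each path is treated as a list of region labels and that similarity is 1 minus the Levenshtein edit distance divided by the longer path length. The helper names edit_distance and similarity are hypothetical, and any other sequence distance could be swapped in.

```python
def edit_distance(a, b):
    """Levenshtein distance between two lists of region labels (unit costs)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # delete x from a
                            curr[j - 1] + 1,      # insert y into a
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def similarity(a, b):
    """Assumed measure: 1 - distance / length of the longer path."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - edit_distance(a, b) / longest


# The four example paths, split into region labels.
paths = {
    "S1": "A->B->C->G".split("->"),
    "S2": "A->B->F->G".split("->"),
    "S3": "A->C->B->G".split("->"),
    "S4": "A->B->D->G".split("->"),
}

names = list(paths)
matrix = {(p, q): similarity(paths[p], paths[q]) for p in names for q in names}

print("    " + "    ".join(names))
for p in names:
    print(p, "  ".join(f"{matrix[p, q]:.2f}" for q in names))
```

With this measure S1, S2 and S4 are mutually 0.75 similar, and S3 is 0.50 similar to each of the others. A full matrix is only practical near the lower end of the 10^2 to 10^7 range, since the number of pairs grows quadratically.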
Then I would like to generate a dendrogram based on the similarity measure:
S1--
    |--
S2--    |
        |----
S3--    |
    |-- |
S4--
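
As a rough illustration of the clustering step, the sketch below runs a plain average-linkage agglomerative clustering on a small distance matrix and records the merge order, which is the information a dendrogram draws. The matrix holds hand-counted token edit distances between the four example paths, assuming unit-cost insertions, deletions and substitutions. The function name average_linkage is hypothetical, and ties are simply broken by whichever pair is met first.

```python
def average_linkage(labels, dist):
    """Agglomerative clustering; returns a list of (left, right, height) merges."""
    clusters = {label: [i] for i, label in enumerate(labels)}  # name -> member indices
    merges = []
    while len(clusters) > 1:
        names = list(clusters)
        best = None
        for i in range(len(names)):
            for j in range(i + 1, len(names)):
                a, b = names[i], names[j]
                # Average distance over all cross-cluster member pairs.
                total = sum(dist[x][y] for x in clusters[a] for y in clusters[b])
                d = total / (len(clusters[a]) * len(clusters[b]))
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        clusters[f"({a},{b})"] = clusters.pop(a) + clusters.pop(b)
        merges.append((a, b, d))
    return merges


labels = ["S1", "S2", "S3", "S4"]
# Assumed token edit distances between the example paths (counted by hand).
dist = [
    [0, 1, 2, 1],
    [1, 0, 2, 1],
    [2, 2, 0, 2],
    [1, 1, 2, 0],
]

merges = average_linkage(labels, dist)
for left, right, height in merges:
    print(f"{left} + {right} at height {height:.2f}")
```

Under this assumed distance, S1, S2 and S4 merge at height 1 and S3 joins last at height 2. A different measure would give a different tree. The naive loop rescans all pairs on every merge, so it is only meant for small examples like this one.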
Thanks a lot for any advice.
Regards,
Oliver
_______________________________________________
Biojava-l mailing list - [email protected]
http://lists.open-bio.org/mailman/listinfo/biojava-l
