[Biojava-l] Similarity measures for generalized sequences

Oliver Schmitt Mon, 07 May 2012 09:06:21 -0700

Hi,

I'm looking for general advice on comparing sequences (S). I don't
necessarily mean DNA sequences, but rather sequences such as
"Region A is connected with Region B" (A->B for short), together with
a distance or similarity measure that makes it possible to identify
similar sequences or paths. The regions have alphanumeric names like
"Bed nucleus of the stria terminalis anterior division".
Given are 10^2 to 10^7 different paths. I am looking for all their
mutual similarities (e.g., a similarity matrix) and a multivariate
classification, such as a dendrogram based on a meaningful cluster
analysis.


Example
Given:
S1: A->B->C->G
S2: A->B->F->G
S3: A->C->B->G
S4: A->B->D->G

Searched:
Similarity matrix

     S1  S2  S3  S4
S1  ?    ?    ?    ?
S2  ?    ?    ?    ?
S3  ?    ?    ?    ?
S4  ?    ?    ?    ?
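
One possible way to fill in such a matrix, offered only as a sketch: treat
each region name as a single symbol and use a token-level edit
(Levenshtein) distance, normalized by the longer path length. The choice
of measure, the unit edit costs, and the helper names parse_path,
edit_distance and similarity are assumptions made for illustration, not
something fixed by the question.

```python
from typing import List, Sequence


def parse_path(path: str) -> List[str]:
    """Split 'A->B->C' into region tokens; region names may contain spaces."""
    return [region.strip() for region in path.split("->")]


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Levenshtein distance in which each region name counts as one symbol."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # delete x
                            curr[j - 1] + 1,      # insert y
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def similarity(a: Sequence[str], b: Sequence[str]) -> float:
    """Assumed normalization: 1 - distance / length of the longer path."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(a, b) / longest


paths = {
    "S1": "A->B->C->G",
    "S2": "A->B->F->G",
    "S3": "A->C->B->G",
    "S4": "A->B->D->G",
}
tokens = {name: parse_path(p) for name, p in paths.items()}
names = list(tokens)
matrix = [[similarity(tokens[r], tokens[c]) for c in names] for r in names]

for name, row in zip(names, matrix):
    print(name, " ".join(f"{value:.2f}" for value in row))
```

Computing all pairs is quadratic in the number of paths, so a full matrix
is only practical toward the lower end of the 10^2 to 10^7 range.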

Then I would like to generate a dendrogram based on the similarity measure:

S1--+
    |--+
S2--+  |
       |----
S3--+  |
    |--+
S4--+
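
A tree like this could be derived from a distance matrix by agglomerative
clustering. The following is a minimal sketch of naive average-linkage
clustering in plain Python. The linkage rule, the function name
average_linkage, and the input distances (token-level edit distances for
the four example paths, worked out by hand with unit costs) are
assumptions made for illustration.

```python
from itertools import combinations
from typing import List, Sequence, Tuple

Merge = Tuple[List[str], List[str], float]


def average_linkage(labels: Sequence[str],
                    dist: Sequence[Sequence[float]]) -> List[Merge]:
    """Naive agglomerative clustering with average linkage.

    Returns the merges in order as (members_a, members_b, height).
    """
    clusters = [(i,) for i in range(len(labels))]
    merges: List[Merge] = []
    while len(clusters) > 1:
        best = None
        for a, b in combinations(clusters, 2):
            # Average pairwise distance between the two clusters.
            d = sum(dist[i][j] for i in a for j in b) / (len(a) * len(b))
            if best is None or d < best[0]:
                best = (d, a, b)
        d, a, b = best
        merges.append(([labels[i] for i in a], [labels[i] for i in b], d))
        # Replace the two merged clusters with their union.
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
    return merges


labels = ["S1", "S2", "S3", "S4"]
# Assumed input: token-level edit distances between the example paths.
dist = [
    [0, 1, 2, 1],
    [1, 0, 2, 1],
    [2, 2, 0, 2],
    [1, 1, 2, 0],
]
merges = average_linkage(labels, dist)
for left, right, height in merges:
    print(left, "+", right, "at height", height)
```

Under these assumed distances S3 joins last, because swapping B and C
costs two edits, so the resulting tree need not match the sketch above. A
different measure or linkage rule can give a different grouping. For
plotting, SciPy's scipy.cluster.hierarchy module (linkage, dendrogram) is
one option.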


Thanks a lot for any advice.

Regards,
Oliver


_______________________________________________
Biojava-l mailing list  -  [email protected]
http://lists.open-bio.org/mailman/listinfo/biojava-l
