Hi,

in the original BLOSUM62 score matrix the identities of the amino acids are weighted differently (I suppose this comes from their different distributions in nature). In the problem you described, you want to scale all identities to 1, and you will lose this information, right?

What are the priorities in this transformation? Do you want to obtain metric properties? Or do you want to preserve the different distributions of the amino acids?

Can you give us a few words about why you want to transform this matrix and how you want to use it?

Uta Bohnebeck

***************************************************************
Uta Bohnebeck              Tel: +49-421-218-7838/ -7090
Universität Bremen         Fax: +49-421-218-7196
TZI, IS / AG-KI            [EMAIL PROTECTED]
Universitätsallee 21-23    Postfach 330 440
D-28334 Bremen
---------------------------------------------------------------
http://www.informatik.uni-bremen.de/~bohnebec/home.html
---------------------------------------------------------------

-----Original Message-----
From: Classification, clustering, and phylogeny estimation [mailto:[EMAIL PROTECTED]] On Behalf Of William Shannon
Sent: Wednesday, 2 May 2001 19:40
To: [EMAIL PROTECTED]
Subject: More protein stuff

As a follow-up to my previous email, the first 5 rows and columns of the score matrix are:

> blosum62[1:5,1:5]
   A  R  N  D  C
A  4 -1 -2 -2  0
R -1  5  0 -2 -3
N -2  0  6  1 -3
D -2 -2  1  6 -3
C  0 -3 -3 -3  9

Comparing sequences ARN to ADC gives the similarity score

(s_AA = 4) + (s_RD = -2) + (s_NC = -3) = -1

and ARN to itself

(s_AA = 4) + (s_RR = 5) + (s_NN = 6) = 15

and ADC to itself

(s_AA = 4) + (s_DD = 6) + (s_CC = 9) = 19

so the similarity matrix is

15 -1
-1 19

--
William D. Shannon, Ph.D.
Assistant Professor of Biostatistics in Medicine
Division of General Medical Sciences and Biostatistics
Washington University School of Medicine
Campus Box 8005, 660 S. Euclid
St. Louis, MO 63110
Phone: 314-454-8356
Fax: 314-454-5113
e-mail: [EMAIL PROTECTED]
web page: http://ilya.wustl.edu/~shannon
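
For anyone who wants to check the arithmetic in the quoted message, here is a minimal Python sketch (the thread itself shows R console output; Python is used only for illustration). It sums position-wise BLOSUM62 scores for equal-length sequences, using just the 5 x 5 block quoted above. The names `BLOSUM62_SUBSET`, `score`, `similarity_matrix` and `scale_identities_to_one` are hypothetical. The scaling s_ij / sqrt(s_ii * s_jj) is only one possible way to bring the identities to 1. It is an assumption, not necessarily the transformation the original poster has in mind.

```python
from math import sqrt

# First 5 rows/columns of BLOSUM62, copied from the quoted message.
RESIDUES = "ARNDC"
ROWS = [
    [4, -1, -2, -2, 0],
    [-1, 5, 0, -2, -3],
    [-2, 0, 6, 1, -3],
    [-2, -2, 1, 6, -3],
    [0, -3, -3, -3, 9],
]
BLOSUM62_SUBSET = {
    (a, b): ROWS[i][j]
    for i, a in enumerate(RESIDUES)
    for j, b in enumerate(RESIDUES)
}


def score(seq1, seq2):
    """Sum the position-wise substitution scores of two equal-length sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must have the same length (no gaps handled)")
    return sum(BLOSUM62_SUBSET[(a, b)] for a, b in zip(seq1, seq2))


def similarity_matrix(seqs):
    """Pairwise similarity scores for a list of sequences."""
    return [[score(a, b) for b in seqs] for a in seqs]


def scale_identities_to_one(sim):
    """Assumed scaling: s_ij / sqrt(s_ii * s_jj), so every diagonal entry is 1.

    Requires positive self-scores, which holds for the example below.
    """
    n = len(sim)
    return [
        [sim[i][j] / sqrt(sim[i][i] * sim[j][j]) for j in range(n)]
        for i in range(n)
    ]


sim = similarity_matrix(["ARN", "ADC"])
print(sim)     # [[15, -1], [-1, 19]], as in the quoted message

scaled = scale_identities_to_one(sim)
print(scaled)  # diagonal entries are 1.0; off-diagonal is -1 / sqrt(15 * 19)
```

After this scaling the self-scores 15 and 19 both become 1, so the sketch also illustrates the point raised in the reply: the different weights of the identities are no longer visible in the scaled matrix.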
