Dear UCSC, I've looked through the archives so I think my question hasn't yet been answered.
I'm looking at microsatellites in the snp130.txt file and am trying to make sense of the coordinates. In many cases the coordinates of a microsatellite refer to a single base (chromEnd = chromStart + 1), as in cases A and B below. But where is the microsatellite? According to the alignments (reached by clicking on the rs... name), in case A the indicated microsatellite (the black bar in the browser with snp130 set to "full") is at the *end* of the CA repeat (the actual microsatellite). In case B, the indicated microsatellite is at the *beginning* of the CA repeat. Both of these are top-strand SNPs.

Case A:

627 chr1 5576651 5576652 rs3223599 0 + C C (CA)19/20/21/22/23/24 genomic microsatellite by-frequency 0.752086 0.089764 unknown exact 1

The genome browser shows the entire microsatellite repeat (all 24 copies of CA, so 48 bases) as the reference sequence. Position 5576652 marks the *end* of the CA repeat. The browser just shows the microsatellite as a single base.

Case B:

658 chr1 9585594 9585595 rs3220726 0 + C C lengthTooLong genomic microsatellite by-frequency 0.8126 0.129764 unknown exact 1

In this case the genome browser shows that the base at 1-based position 9585595 is at the *left* (beginning) of the CA repeat. This repeat is not particularly long: 58 bases. I don't see any way to get this information from the line above.

Question 1) How would anyone know, by looking in snp130.txt, where the actual microsatellite is? Is there some other table that I could download that would give this information?

In case C, the coordinates given are the actual coordinates of the microsatellite.

Case C:

753 chr1 22129926 22129973 rs3222966 0 + CACACACACACACACACACACACACACACACACACACACACACACAC CACACACACACACACACACACACACACACACACACACACACACACAC (CA)17/18/19/20/21/22/23/24 genomic microsatellite by-frequency 0.7524 0.158867 unknown range 1

In this case, the record gives the full coordinates of the 47-base microsatellite, which includes all (but 1/2) of the 24-copy CA repeat.

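To make the question concrete, here is a minimal Python sketch of how I am reading these rows. It is only an illustration under stated assumptions: the column names in FIELDS are my reading of the 18 fields in the rows quoted above, the span is taken as chromEnd - chromStart (0-based start, 1-based end), and the helper names parse_row, span_length and parse_observed are hypothetical, not part of any UCSC tool.

```python
import re

# Column order assumed from the 18 fields in the rows quoted above.
FIELDS = [
    "bin", "chrom", "chromStart", "chromEnd", "name", "score", "strand",
    "refNCBI", "refUCSC", "observed", "molType", "class", "valid",
    "avHet", "avHetSE", "func", "locType", "weight",
]

# Matches observed strings of the form "(UNIT)n1/n2/..." such as "(CA)19/20".
OBSERVED_RE = re.compile(r"^\(([A-Za-z]+)\)(.+)$")


def parse_row(line):
    """Split one row into a dict keyed by the assumed column names.

    The file itself is tab-separated; splitting on any whitespace also
    handles rows pasted into an email, as long as no field is empty.
    """
    values = line.split()
    if len(values) != len(FIELDS):
        raise ValueError("expected %d fields, got %d" % (len(FIELDS), len(values)))
    row = dict(zip(FIELDS, values))
    row["chromStart"] = int(row["chromStart"])
    row["chromEnd"] = int(row["chromEnd"])
    return row


def span_length(row):
    """Number of reference bases covered by the record's coordinates."""
    return row["chromEnd"] - row["chromStart"]


def parse_observed(observed):
    """Return (unit, copy_counts, other_alleles), or None if the string
    is not of the "(UNIT)counts" form (for example "lengthTooLong")."""
    match = OBSERVED_RE.match(observed)
    if match is None:
        return None
    unit, rest = match.groups()
    tokens = rest.split("/")
    counts = [int(t) for t in tokens if t.isdigit()]
    others = [t for t in tokens if not t.isdigit()]
    return unit, counts, others


CASE_A = ("627 chr1 5576651 5576652 rs3223599 0 + C C (CA)19/20/21/22/23/24 "
          "genomic microsatellite by-frequency 0.752086 0.089764 unknown exact 1")
CASE_B = ("658 chr1 9585594 9585595 rs3220726 0 + C C lengthTooLong "
          "genomic microsatellite by-frequency 0.8126 0.129764 unknown exact 1")
CASE_C = ("753 chr1 22129926 22129973 rs3222966 0 + "
          "CACACACACACACACACACACACACACACACACACACACACACACAC "
          "CACACACACACACACACACACACACACACACACACACACACACACAC "
          "(CA)17/18/19/20/21/22/23/24 "
          "genomic microsatellite by-frequency 0.7524 0.158867 unknown range 1")

for label, line in (("A", CASE_A), ("B", CASE_B), ("C", CASE_C)):
    row = parse_row(line)
    print(label, row["name"], "span:", span_length(row),
          "observed:", parse_observed(row["observed"]))
```

Run on the three rows, this gives a span of 1 for cases A and B and 47 for case C. For case A the longest listed allele is 24 copies of CA (48 bases), yet nothing in the row says where those bases lie relative to the single annotated position, which is exactly what Question 1 is asking.
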
Question 2) If the observed field is listed as lengthTooLong, is there any way to determine what the bases of the microsatellite are? (Without that, they aren't much use.)

Case D:

852 chr1 35119589 35119590 rs3219614 0 + T T (CA)20/21/22/23/A/T genomic microsatellite by-frequency 0.284918 0.283047 unknown exact 1

Question 3) In case D, what does the /A/T mean at the end of (CA)20/21/22/23/A/T?

Question 4) In case D, the CA repeat starts at position 35119591 (chr1) and ends at 35119632, giving 42 bases or 21 copies of the repeat. So why does the allele indicate that there are 23 copies?

Thank you very much!

David Gordon
