Hi Matthew,

We suggest that you report the issue to the Genome Reference Consortium, which is the organization that provided the assembly data. Here is the link:
http://www.ncbi.nlm.nih.gov/projects/genome/assembly/grc/ReportAnIssue.shtml

If you have further questions, please email the mailing list: [email protected].

Vanessa Kirkup Swing
UCSC Genome Bioinformatics Group

---------- Forwarded message ----------
From: Parks, Matthew <[email protected]>
Date: Tue, Dec 6, 2011 at 2:37 PM
Subject: [Genome] Error in the Human Genome: an unidentified base?
To: [email protected]

Hello,

In my studies, I came across the following strange error in the UCSC Genome Browser:

Use the UCSC Genome Browser and its "get DNA" function to examine the following short stretch of sequence:

chr10:37412173-37412176

The sequence reads "ANCC". Notice that the second nucleotide is marked as "N". Why is this? This part of the chromosome is sufficiently far from both the centromere and the telomere, so presumably it has been sequenced fairly well. Even if there is uncertainty about the true value of this nucleotide, isn't some sort of consensus used?

Also, this seems to be the only "N" nucleotide in the area. From my analysis, there are no other "N" nucleotides for at least 10,000 nucleotides before and after the position in question.

Note that this stretch of sequence is part of a repeat (that's how I came across it in the first place), and I understand that there is ambiguity in repeat regions. But then why should only one nucleotide be unidentified ("N")? Wouldn't there be more ambiguous nucleotides?

Thank you

--
Matthew Parks
PhD candidate, Division of Applied Mathematics
Brown University

_______________________________________________
Genome maillist - [email protected]
https://lists.soe.ucsc.edu/mailman/listinfo/genome
