Hi Gareth,

Please see this answer to a similar question:
https://lists.soe.ucsc.edu/pipermail/genome/2010-October/023997.html

If you turn on the RepeatMasker track, you will be able to see that there is a low complexity repeat that is responsible for masking out 7 of the 29 CpGs.

--
Brooke Rhead
UCSC Genome Bioinformatics Group

Gareth Wilson wrote on 11/1/10 6:30 AM:
> Hello,
>
> I recently downloaded the cpgIslandExt table for Human genome GRCh37. Whilst
> using this file in an analysis, I stumbled upon a problem, the root of which
> seemed to come back to the cpgIsland file. It seems that, for some islands,
> the metadata is incorrect. For example:
>
> http://genome.ucsc.edu/cgi-bin/hgc?hgsid=171991065&o=33697914&t=33698193&g=cpgIslandExt&i=CpG%3A+22
>
> This cpg island has a CpG count of 22. However on inspection of the
> sequence, the actual count can be seen to be 29. Hence the percentage CpG
> and ratio of observed/expected will also be incorrect.
>
> Am I missing something obvious? And if not, have you any idea as to how many
> islands have similar errors?
>
> Hope you can help!
>
> Thanks
>
> Gareth Wilson
>
> _______________________________________________
> Genome maillist - [email protected]
> https://lists.soe.ucsc.edu/mailman/listinfo/genome
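
To illustrate how masking can lower a CpG count (29 in the raw sequence versus 22 in the table), here is a minimal Python sketch. It assumes a soft-masked sequence in which repeat-masked bases are lowercase; that convention, the function name `count_cpg`, and the toy sequence are assumptions for illustration only. This is not the program used to build cpgIslandExt, and the sequence is not the island from this thread.

```python
def count_cpg(seq: str, skip_masked: bool = False) -> int:
    """Count CG dinucleotides in seq.

    If skip_masked is True, a CG is counted only when both bases are
    uppercase, i.e. neither base lies in a soft-masked (lowercase) region.
    """
    count = 0
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair.upper() != "CG":
            continue  # not a CpG dinucleotide
        if skip_masked and not pair.isupper():
            continue  # at least one base is masked
        count += 1
    return count


# Made-up toy sequence: the lowercase run stands in for a masked repeat.
toy = "ACGTTCGAcgcgcgttCGA"

total = count_cpg(toy)                        # every CG, masked or not
unmasked = count_cpg(toy, skip_masked=True)   # only CGs outside masked bases
print(total, unmasked, total - unmasked)
```

Running the same comparison on the island's own sequence, with the RepeatMasker-masked bases in lowercase, should show whether the difference between the two counts matches the 7 masked CpGs mentioned above.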
