Hello, I have a simple question about the number of CG sites in one CpG island: chrX:38071367-38071954.
The UCSC Genome Browser reports 58 CG sites for this island, but when I count them myself I get 63. I get 63 both from my R code and by manual counting (i.e., searching for "CG" in a text file on Linux and highlighting the matches). In fact, I get 63 from both the DNA sequence I downloaded from the UCSC Genome Browser and the hg18 sequence I got from the R Bioconductor package. See below for details.

Do you have any idea why the results are inconsistent? Is it because 5 of the CG sites are located in the repeat region (lowercase in the UCSC sequence below) and are therefore not included? If so, why are these 5 CG sites handled this way?

############ UCSC hg18 version DNA sequences ############################
>hg18_cpgIslandExt_CpG: 58 range=chrX:38071367-38071954 5'pad=0 3'pad=0 strand=+ repeatMasking=lower
CGTCCGGTCCTCTGCCCTCAGTCATTCGCGGGAGCGCAACCAGCGATCCC # 7 (including the one split across lines: C at the end of this line, G at the start of the next)
GCCCCAGTCCGGCTGCCAAGCCTGGGGCCTGTCCCCCTACAGGGCCGATC # 2
CGGAggcggggcccggccgcccgcggACCCTCCCTCCCGGCCTTCCGCCA # 8
CCGGCGCGGGCGCAACTCACCGGGCATCAGCTCTTCCGGCTCCCTCATGC # 6
CACGGGCAGTACGGGCAGCCTGCGCCGGGGCCAGGAGGCTGTAGAGGACG # 5
GTTTGGTCGGGGCTAAAGCAGCTACTCCGCACCGACGCGGGCCGCGAAAG # 7
CCCCCAAGTTCCGCATGGCGAAACTCCGGAGATCAACTACAACCGCGCTC # 5
CCGGAAGTCAACAAACAGCCGCTACGGGCAACGGGGGCGGAGCTTGGGAA # 5
TGCAAGGCGGGACAGGCGCCGTTGGGGAGGGGAACGGAGGCCGGGTGGCT # 5
GGTAAGGGGCAGGCTCAGGCACAGCGGAGGGGCAGTAGAGACCACGCGCC # 3
CTCTGGCGGCCTGGAGCAGAGAGGCGGCCACGCCGCGCAGTGATGCTGTG # 5
GAGTCCGCGCCCTTGTGCCGTTGGAGGTCCAGGCGCCG # 5

###################### From R Bioconductor genome sequence
CGTCCGGTCCTCTGCCCTCAGTCATTCGCGGGAGCGCAACCAGCGATCCC # 7
GCCCCAGTCCGGCTGCCAAGCCTGGGGCCTGTCCCCCTACAGGGCCGATC # 2
CGGAGGCGGGGCCCGGCCGCCCGCGGACCCTCCCTCCCGGCCTTCCGCCA # 8
CCGGCGCGGGCGCAACTCACCGGGCATCAGCTCTTCCGGCTCCCTCATGC # 6
CACGGGCAGTACGGGCAGCCTGCGCCGGGGCCAGGAGGCTGTAGAGGACG # 5
GTTTGGTCGGGGCTAAAGCAGCTACTCCGCACCGACGCGGGCCGCGAAAG # 7
CCCCCAAGTTCCGCATGGCGAAACTCCGGAGATCAACTACAACCGCGCTC # 5
CCGGAAGTCAACAAACAGCCGCTACGGGCAACGGGGGCGGAGCTTGGGAA # 5
TGCAAGGCGGGACAGGCGCCGTTGGGGAGGGGAACGGAGGCCGGGTGGCT # 5
GGTAAGGGGCAGGCTCAGGCACAGCGGAGGGGCAGTAGAGACCACGCGCC # 3
CTCTGGCGGCCTGGAGCAGAGAGGCGGCCACGCCGCGCAGTGATGCTGTG # 5
GAGTCCGCGCCCTTGTGCCGTTGGAGGTCCAGGCGCCG # 5

Shuying
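One way to check the repeat-region hypothesis is to count CG dinucleotides in the soft-masked UCSC sequence twice: once ignoring case, and once counting only sites where both bases are uppercase. The following is a minimal Python sketch, not the method UCSC uses; the names `LINES`, `SEQ`, and `count_cg` are made up for illustration, and treating a site as "masked" when either base is lowercase is an assumption.

```python
# Sequence exactly as posted (hg18 chrX:38071367-38071954, repeatMasking=lower).
LINES = [
    "CGTCCGGTCCTCTGCCCTCAGTCATTCGCGGGAGCGCAACCAGCGATCCC",
    "GCCCCAGTCCGGCTGCCAAGCCTGGGGCCTGTCCCCCTACAGGGCCGATC",
    "CGGAggcggggcccggccgcccgcggACCCTCCCTCCCGGCCTTCCGCCA",
    "CCGGCGCGGGCGCAACTCACCGGGCATCAGCTCTTCCGGCTCCCTCATGC",
    "CACGGGCAGTACGGGCAGCCTGCGCCGGGGCCAGGAGGCTGTAGAGGACG",
    "GTTTGGTCGGGGCTAAAGCAGCTACTCCGCACCGACGCGGGCCGCGAAAG",
    "CCCCCAAGTTCCGCATGGCGAAACTCCGGAGATCAACTACAACCGCGCTC",
    "CCGGAAGTCAACAAACAGCCGCTACGGGCAACGGGGGCGGAGCTTGGGAA",
    "TGCAAGGCGGGACAGGCGCCGTTGGGGAGGGGAACGGAGGCCGGGTGGCT",
    "GGTAAGGGGCAGGCTCAGGCACAGCGGAGGGGCAGTAGAGACCACGCGCC",
    "CTCTGGCGGCCTGGAGCAGAGAGGCGGCCACGCCGCGCAGTGATGCTGTG",
    "GAGTCCGCGCCCTTGTGCCGTTGGAGGTCCAGGCGCCG",
]

# Join the lines first so a CG split across a line break is still counted.
SEQ = "".join(LINES)


def count_cg(seq):
    """Return (total, unmasked, masked) CG dinucleotide counts.

    total    : CG sites ignoring case
    unmasked : CG sites where both bases are uppercase
    masked   : CG sites where at least one base is lowercase (assumption)
    """
    total = 0
    unmasked = 0
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair.upper() == "CG":
            total += 1
            if pair == "CG":
                unmasked += 1
    return total, unmasked, total - unmasked


if __name__ == "__main__":
    total, unmasked, masked = count_cg(SEQ)
    print("length:", len(SEQ))
    print("CG sites, ignoring case:", total)
    print("CG sites, uppercase only:", unmasked)
    print("CG sites touching lowercase bases:", masked)
```

On the sequence as posted, this gives 63 sites ignoring case, 58 in uppercase only, and 5 in the lowercase stretch. That matches the arithmetic of the hypothesis (63 - 5 = 58), but it does not by itself confirm how the cpgIslandExt track computes its count.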
