Hello,

I have a simple question about the number of CG sites in one CpG
island: chrX:38071367-38071954.

The UCSC Genome Browser reports 58 CG sites for this island, but when
I count them myself I get 63. I get 63 both from my R code and from
manual counting (i.e., searching for "CG" in a text file on Linux and
highlighting the matches). I also get 63 from both the DNA sequence I
downloaded from the UCSC Genome Browser and the hg18 sequence from the
R Bioconductor package. See below for details; the number after each
line is the CG count for that line. Do you have any idea what causes
this inconsistency? Is it because 5 of the CG sites are located in the
repeat region (the lowercase bases), so they are not included? If so,
why are these 5 CG sites handled this way?

############ UCSC hg18 version DNA sequences ############################

>hg18_cpgIslandExt_CpG: 58 range=chrX:38071367-38071954 5'pad=0 3'pad=0 strand=+ repeatMasking=lower

CGTCCGGTCCTCTGCCCTCAGTCATTCGCGGGAGCGCAACCAGCGATCCC  # 7  (including the CG formed by the C at the end of this line and the G at the start of the next)
GCCCCAGTCCGGCTGCCAAGCCTGGGGCCTGTCCCCCTACAGGGCCGATC  # 2
CGGAggcggggcccggccgcccgcggACCCTCCCTCCCGGCCTTCCGCCA  # 8
CCGGCGCGGGCGCAACTCACCGGGCATCAGCTCTTCCGGCTCCCTCATGC  # 6
CACGGGCAGTACGGGCAGCCTGCGCCGGGGCCAGGAGGCTGTAGAGGACG  # 5
GTTTGGTCGGGGCTAAAGCAGCTACTCCGCACCGACGCGGGCCGCGAAAG  # 7
CCCCCAAGTTCCGCATGGCGAAACTCCGGAGATCAACTACAACCGCGCTC  # 5
CCGGAAGTCAACAAACAGCCGCTACGGGCAACGGGGGCGGAGCTTGGGAA  # 5
TGCAAGGCGGGACAGGCGCCGTTGGGGAGGGGAACGGAGGCCGGGTGGCT  # 5
GGTAAGGGGCAGGCTCAGGCACAGCGGAGGGGCAGTAGAGACCACGCGCC  # 3
CTCTGGCGGCCTGGAGCAGAGAGGCGGCCACGCCGCGCAGTGATGCTGTG  # 5
GAGTCCGCGCCCTTGTGCCGTTGGAGGTCCAGGCGCCG              # 5
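
For anyone who wants to reproduce the count, here is a minimal Python sketch (not the R code mentioned above). It joins the FASTA lines first, so a CG that spans a line break is counted, and it reports separately the CG sites that touch lowercase (repeat-masked) bases. The names count_cg and count_masked_cg are made up for this example, and treating a site as "masked" when either base is lowercase is an assumption.

```python
# Sequence lines copied from the UCSC hg18 FASTA above
# (chrX:38071367-38071954); lowercase = repeat-masked bases.
FASTA_LINES = [
    "CGTCCGGTCCTCTGCCCTCAGTCATTCGCGGGAGCGCAACCAGCGATCCC",
    "GCCCCAGTCCGGCTGCCAAGCCTGGGGCCTGTCCCCCTACAGGGCCGATC",
    "CGGAggcggggcccggccgcccgcggACCCTCCCTCCCGGCCTTCCGCCA",
    "CCGGCGCGGGCGCAACTCACCGGGCATCAGCTCTTCCGGCTCCCTCATGC",
    "CACGGGCAGTACGGGCAGCCTGCGCCGGGGCCAGGAGGCTGTAGAGGACG",
    "GTTTGGTCGGGGCTAAAGCAGCTACTCCGCACCGACGCGGGCCGCGAAAG",
    "CCCCCAAGTTCCGCATGGCGAAACTCCGGAGATCAACTACAACCGCGCTC",
    "CCGGAAGTCAACAAACAGCCGCTACGGGCAACGGGGGCGGAGCTTGGGAA",
    "TGCAAGGCGGGACAGGCGCCGTTGGGGAGGGGAACGGAGGCCGGGTGGCT",
    "GGTAAGGGGCAGGCTCAGGCACAGCGGAGGGGCAGTAGAGACCACGCGCC",
    "CTCTGGCGGCCTGGAGCAGAGAGGCGGCCACGCCGCGCAGTGATGCTGTG",
    "GAGTCCGCGCCCTTGTGCCGTTGGAGGTCCAGGCGCCG",
]


def count_cg(seq):
    """Count CG dinucleotides, ignoring case."""
    return seq.upper().count("CG")


def count_masked_cg(seq):
    """Count CG dinucleotides in which at least one base is lowercase.

    Assumption: a site counts as repeat-masked if either base is
    soft-masked (lowercase).
    """
    upper = seq.upper()
    return sum(
        1
        for i in range(len(seq) - 1)
        if upper[i:i + 2] == "CG"
        and (seq[i].islower() or seq[i + 1].islower())
    )


# Join the lines before counting so that a C at the end of one line
# followed by a G at the start of the next is seen as one CG site.
sequence = "".join(FASTA_LINES)
total = count_cg(sequence)
masked = count_masked_cg(sequence)

print("length:              ", len(sequence))
print("CG sites (all):      ", total)
print("CG sites (lowercase):", masked)
print("CG sites (uppercase):", total - masked)
```

On the sequence above this gives 63 CG sites in total, 5 of which fall in the lowercase stretch, leaving 58 in unmasked sequence. Whether the browser's figure of 58 is actually computed this way is an assumption that the script cannot confirm.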

###################### From R Bioconductor genome sequence
CGTCCGGTCCTCTGCCCTCAGTCATTCGCGGGAGCGCAACCAGCGATCCC # 7
GCCCCAGTCCGGCTGCCAAGCCTGGGGCCTGTCCCCCTACAGGGCCGATC # 2
CGGAGGCGGGGCCCGGCCGCCCGCGGACCCTCCCTCCCGGCCTTCCGCCA # 8
CCGGCGCGGGCGCAACTCACCGGGCATCAGCTCTTCCGGCTCCCTCATGC # 6
CACGGGCAGTACGGGCAGCCTGCGCCGGGGCCAGGAGGCTGTAGAGGACG # 5
GTTTGGTCGGGGCTAAAGCAGCTACTCCGCACCGACGCGGGCCGCGAAAG # 7
CCCCCAAGTTCCGCATGGCGAAACTCCGGAGATCAACTACAACCGCGCTC # 5
CCGGAAGTCAACAAACAGCCGCTACGGGCAACGGGGGCGGAGCTTGGGAA # 5
TGCAAGGCGGGACAGGCGCCGTTGGGGAGGGGAACGGAGGCCGGGTGGCT # 5
GGTAAGGGGCAGGCTCAGGCACAGCGGAGGGGCAGTAGAGACCACGCGCC # 3
CTCTGGCGGCCTGGAGCAGAGAGGCGGCCACGCCGCGCAGTGATGCTGTG # 5
GAGTCCGCGCCCTTGTGCCGTTGGAGGTCCAGGCGCCG # 5

Shuying
_______________________________________________
Genome maillist  -  [email protected]
https://lists.soe.ucsc.edu/mailman/listinfo/genome