Hi,
The UCSC Genome Browser CpG islands track has 28,226 islands. I've read the
description of the table but would still like to ask:
Using the Gardiner-Garden criteria (%GC > 50, CpG O/E > 0.6, length > 200 bp),
one would expect to find 307,193 islands in the human genome. Most of these
are probably false positives.
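
To make the comparison concrete, this is a minimal Python sketch of how I
understand the classic criteria when applied to one candidate sequence. The
function names (cpg_stats, passes_gardiner_garden) are my own, and I am assuming
the usual O/E formula, (CpG count x length) / (C count x G count). A real scan
would slide a window along the genome, which is not shown here.

```python
def cpg_stats(seq):
    """Return (%GC, CpG observed/expected ratio) for one sequence."""
    seq = seq.upper()
    n = len(seq)
    c = seq.count("C")
    g = seq.count("G")
    cpg = seq.count("CG")  # "CG" cannot overlap itself, so count() is exact
    gc_pct = 100.0 * (c + g) / n if n else 0.0
    # Expected CpG count if C and G were placed independently (assumed formula).
    expected = c * g / n if n else 0.0
    obs_exp = cpg / expected if expected else 0.0
    return gc_pct, obs_exp


def passes_gardiner_garden(seq, min_len=200, min_gc=50.0, min_oe=0.6):
    """Apply the three thresholds exactly as quoted above (all strict)."""
    gc_pct, obs_exp = cpg_stats(seq)
    return len(seq) > min_len and gc_pct > min_gc and obs_exp > min_oe
```

With thresholds this loose, any GC-rich stretch slightly longer than 200 bp
passes, which is presumably why the count is so high.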
What your algorithm does differently is the step where "CpG
islands were predicted by searching the sequence one base at a time, scoring
each dinucleotide (+17 for CG and -1 for others) and identifying maximally
scoring segments."
Can you explain the logic behind this criterion and how exactly it differs
from the classic criteria above?
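
Here is how I currently read the quoted step, as a rough Python sketch. It is
only my interpretation, not your actual program: the function names are
hypothetical, and it finds a single best-scoring segment with a running sum,
whereas your code presumably reports every maximal segment and then filters
them.

```python
def dinucleotide_scores(seq, cg_score=17, other_score=-1):
    """Score each overlapping dinucleotide: +17 for CG, -1 for anything else."""
    seq = seq.upper()
    return [cg_score if seq[i:i + 2] == "CG" else other_score
            for i in range(len(seq) - 1)]


def best_segment(seq):
    """Return (score, start, end) of the highest-scoring segment.

    start and end are 0-based base coordinates, end exclusive.
    Returns (0, 0, 0) when the sequence contains no CG at all.
    """
    best_score, best_start, best_end = 0, 0, 0
    run_score, run_start = 0, 0
    for i, s in enumerate(dinucleotide_scores(seq)):
        if run_score <= 0:
            # A non-positive running sum cannot help; restart here.
            run_score, run_start = s, i
        else:
            run_score += s
        if run_score > best_score:
            # Dinucleotide i covers bases i and i + 1.
            best_score, best_start, best_end = run_score, run_start, i + 2
    return best_score, best_start, best_end
```

If this reading is right, a segment survives only while CGs occur at least
about once every 17 bases on average, which looks like a density rule rather
than a %GC or O/E rule.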
Many studies are now using the "NCBI-strict" definition of CpG islands where
the minimum length is set to 500 bp in order to filter out the false-positive
results. Do you have any idea how these two alternatives compare?
Is there a reference that explains the algorithm you used more thoroughly?
Thanks a lot,
Ravid.