Hi,
 
The UCSC Genome Browser CpG Islands track has 28,226 islands. I've read the 
description of the table but would still like to ask a few questions:
 
Using the Gardiner-Garden criteria (GC content > 50%, CpG O/E > 0.6, length > 
200 bp), one would expect to find 307,193 islands in the human genome. Most of 
these are probably false positives.
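
To be explicit about what I mean by the classic criteria, here is a minimal 
sketch of the check as I understand it. The function names are my own and are 
not taken from any published tool; the O/E formula is the usual 
(#CpG * N) / (#C * #G).

```python
def cpg_stats(seq):
    """Return (length, GC fraction, CpG observed/expected) for a DNA string."""
    seq = seq.upper()
    n = len(seq)
    c = seq.count("C")
    g = seq.count("G")
    cpg = seq.count("CG")  # "CG" cannot overlap itself, so count() is exact
    gc_frac = (c + g) / n if n else 0.0
    # Observed/expected CpG ratio: (#CpG * N) / (#C * #G)
    obs_exp = (cpg * n) / (c * g) if c and g else 0.0
    return n, gc_frac, obs_exp


def passes_classic_criteria(seq, min_len=200, min_gc=0.5, min_oe=0.6):
    """Apply the three thresholds quoted above to a single candidate sequence."""
    n, gc_frac, obs_exp = cpg_stats(seq)
    return n > min_len and gc_frac > min_gc and obs_exp > min_oe
```

Raising min_len to 500 gives the stricter length cutoff without changing 
anything else in the check.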
 
What your algorithm does differently is the step where: "CpG 
islands were predicted by searching the sequence one base at a time, scoring 
each dinucleotide (+17 for CG and -1 for others) and identifying maximally 
scoring segments." 
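
If I read that sentence correctly, the scan works roughly like the sketch 
below. This is only my guess at the procedure, not your actual code: the 
function name score_segments is mine, and I assume a simple running score that 
starts at a CG and closes the segment once the score falls back to zero, which 
may differ from how your program picks the maximal segments.

```python
def score_segments(seq, cg_score=17, other_score=-1):
    """Return (start, end, score) tuples for high-scoring segments.

    Coordinates are 0-based with an exclusive end. Each segment is trimmed
    to the position where its running score peaked.
    """
    seq = seq.upper()
    segments = []
    running = 0
    best = 0
    start = None
    best_end = None
    for i in range(len(seq) - 1):
        step = cg_score if seq[i:i + 2] == "CG" else other_score
        if start is None:
            if step < 0:
                continue  # assumed: a segment can only open on a CG
            start, running, best = i, 0, 0
        running += step
        if running > best:
            best, best_end = running, i + 2  # remember the peak
        if running <= 0:
            # Score decayed to zero: close the segment at its peak.
            segments.append((start, best_end, best))
            start = None
    if start is not None:
        segments.append((start, best_end, best))
    return segments
```

On this reading, a segment only holds a positive score while CG makes up at 
least about 1 in 18 dinucleotides, and two CG-rich stretches are merged when 
the gap between them is too short to bring the score back down to zero.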
 
Can you explain the logic behind this criterion and how exactly it differs 
from the classic criteria above? 
 
Many studies are now using the "NCBI-strict" definition of CpG islands where 
the minimal length is set to 500bp in order to filter out the false-positive 
results. Do you have any idea how these two alternatives compare?
 
Is there a reference that explains the algorithm you used in more detail?
 
Thanks a lot,
 
Ravid.


      
_______________________________________________
Genome maillist  -  [email protected]
http://www.soe.ucsc.edu/mailman/listinfo/genome
