Hi,

A number of 'clusters' in the UCSC Genes annotation overlap one another on
the same strand (taking a cluster's boundaries to be the minimum txStart
and the maximum txEnd of its transcripts).
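To make that boundary definition concrete, here is a minimal Python sketch of the check. It is not UCSC's clustering code, and the input format is hypothetical: one tuple per transcript of (cluster_id, transcript, chrom, strand, txStart, txEnd), rather than rows read from the actual tables. It assumes that all transcripts in a cluster share one chrom and strand, and it treats coordinates as half-open (txStart 0-based, txEnd exclusive), so clusters that merely abut do not count as overlapping.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical row format: (cluster_id, transcript, chrom, strand, txStart, txEnd)
Row = Tuple[int, str, str, str, int, int]
Bounds = Dict[int, Tuple[str, str, int, int]]


def cluster_bounds(rows: List[Row]) -> Bounds:
    """Map each cluster id to (chrom, strand, min txStart, max txEnd).

    Assumes every transcript in a cluster is on the same chrom and strand.
    """
    bounds: Bounds = {}
    for cid, _name, chrom, strand, start, end in rows:
        if cid in bounds:
            c, s, lo, hi = bounds[cid]
            bounds[cid] = (c, s, min(lo, start), max(hi, end))
        else:
            bounds[cid] = (chrom, strand, start, end)
    return bounds


def same_strand_overlaps(bounds: Bounds) -> List[Tuple[int, int]]:
    """Return sorted (a, b) pairs of clusters whose spans overlap on the same chrom/strand."""
    groups = defaultdict(list)
    for cid, (chrom, strand, start, end) in bounds.items():
        groups[(chrom, strand)].append((start, end, cid))
    pairs = []
    for spans in groups.values():
        spans.sort()  # sort by start so the inner scan can stop early
        for i, (s1, e1, c1) in enumerate(spans):
            for s2, _e2, c2 in spans[i + 1:]:
                if s2 >= e1:  # half-open: later spans start at or after e1, so none overlap
                    break
                pairs.append(tuple(sorted((c1, c2))))
    return sorted(pairs)


if __name__ == "__main__":
    rows = [
        (1, "tA", "chr1", "+", 100, 500),
        (1, "tB", "chr1", "+", 200, 900),
        (2, "tC", "chr1", "+", 300, 400),
        (3, "tD", "chr1", "-", 300, 400),
        (4, "tE", "chr1", "+", 900, 1000),
    ]
    print(same_strand_overlaps(cluster_bounds(rows)))  # [(1, 2)]
```

In the toy data, cluster 1 spans 100 to 900, so it overlaps cluster 2 on the + strand. Cluster 3 is on the other strand, and cluster 4 only abuts cluster 1.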


This was raised in a previous post
(https://lists.soe.ucsc.edu/pipermail/genome/2009-October/020325.html), where
Jennifer/Jim explained that clustering is driven by proteins, and non-coding
transcripts are merged into the cluster with which they share the greatest
overlap.


While this explains some of the cluster overlaps, it doesn't shed light on
the scenario where non-coding transcripts are allowed to exist as standalone
clusters, even though they fall within the boundaries of a larger cluster.


For example, in the hg19 version of UCSC Genes, cluster 8145 contains 8
transcripts (5 coding + 3 non-coding). The annotation also contains 41
smaller non-coding clusters, all of which fall completely within the
boundaries of cluster 8145. These smaller clusters are numbered consecutively
from 8146 to 8186 (inclusive).
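The containment in this example can be tested with a short Python sketch. It is only an illustration, and its input is hypothetical: a mapping from cluster id to (chrom, strand, start, end) that has already been computed from each cluster's min txStart and max txEnd. The function names are made up for this example, and the toy ids below are not real UCSC cluster ids. The sketch lists the clusters that lie entirely within a given cluster on the same chrom and strand, and it also checks whether those ids form one consecutive run, as 8146 to 8186 do.

```python
from typing import Dict, List, Tuple

# Hypothetical input: cluster_id -> (chrom, strand, min txStart, max txEnd)
Bounds = Dict[int, Tuple[str, str, int, int]]


def contained_in(bounds: Bounds, outer: int) -> List[int]:
    """Return sorted ids of other clusters lying entirely inside `outer` (same chrom and strand)."""
    chrom, strand, lo, hi = bounds[outer]
    return sorted(
        cid
        for cid, (c, s, start, end) in bounds.items()
        if cid != outer and c == chrom and s == strand and lo <= start and end <= hi
    )


def is_consecutive_run(ids: List[int]) -> bool:
    """True if the sorted ids have no gaps (e.g. 8146, 8147, ..., 8186)."""
    return bool(ids) and ids == list(range(ids[0], ids[-1] + 1))


if __name__ == "__main__":
    bounds = {
        10: ("chr1", "+", 100, 1000),  # large "outer" cluster
        11: ("chr1", "+", 200, 300),   # nested
        12: ("chr1", "+", 400, 500),   # nested
        13: ("chr1", "+", 900, 1200),  # overlaps but extends past 1000
        14: ("chr1", "-", 200, 300),   # opposite strand
    }
    nested = contained_in(bounds, 10)
    print(nested, is_consecutive_run(nested))  # [11, 12] True
```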


Is this scenario intentional, or is it possibly an unintended artifact of
the clustering process?


Thanks!
_______________________________________________
Genome maillist  -  [email protected]
https://lists.soe.ucsc.edu/mailman/listinfo/genome
