Dear UCSC genome folks:

I've made a "protein-coding homology" track for some of your genomes. I think it might be of general interest: might you consider including it in your browser?
The track shows regions that are homologous to protein-coding DNA. This usually means that the regions are themselves protein-coding, or that they used to be (i.e. they are pseudogenes). It was constructed simply by finding local alignments between the genome and all known protein sequences.

To see why this is useful, look at this example for mouse:

http://genome.ucsc.edu/cgi-bin/hgTracks?db=mm9&position=chr10:57776301-57778100&hgt.customText=http://seq.cbrc.jp/~martin/cds-homology/mm9/cds-homology.psl.gz

This shows a region of chr10 that has high conservation, especially PhastCons Vertebrate conservation. The region has no other annotations: no gene predictions, etc. Without the new track, it looks like it could be an interesting conserved enhancer element or RNA gene. But the new track shows that it has protein-coding homology. The alignment has frame disruptions, so it is a pseudogene. (It would be nice if you could click and see the alignment, but I don't know how to do that.) It appears conserved because the parent gene is conserved and the pseudogene is recent.

In short, this track explains lots of apparently evolutionarily-conserved elements that lack any other annotation.
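Links like the mouse example above follow a simple pattern: a `db` parameter naming the assembly, an optional `position`, and `hgt.customText` pointing at the track file. A minimal Python sketch for building such links is given here. It assumes that each track sits at `<db>/cds-homology.psl.gz` under the same web directory as the example; the function name `track_url` and the two constants are made up for illustration.

```python
from urllib.parse import urlencode

# Assumed from the example link: browser endpoint and track directory.
BROWSER = "http://genome.ucsc.edu/cgi-bin/hgTracks"
TRACK_BASE = "http://seq.cbrc.jp/~martin/cds-homology"


def track_url(db, position=None):
    """Build a Genome Browser link that loads the cds-homology
    custom track for assembly `db`, optionally at `position`."""
    params = {"db": db}
    if position:
        params["position"] = position
    # Assumed file layout: one PSL file per assembly directory.
    params["hgt.customText"] = f"{TRACK_BASE}/{db}/cds-homology.psl.gz"
    # Leave ':' '/' '~' unescaped so the link matches the hand-written form.
    return BROWSER + "?" + urlencode(params, safe=":/~")


if __name__ == "__main__":
    print(track_url("mm9", "chr10:57776301-57778100"))
```

Calling `track_url("mm9", "chr10:57776301-57778100")` reproduces the mouse link above; omitting the position opens the browser at its default location for that assembly.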
Here are tracks for human, rat, dm3, ce6, cb3:

http://genome.ucsc.edu/cgi-bin/hgTracks?db=cb3&hgt.customText=http://seq.cbrc.jp/~martin/cds-homology/cb3/cds-homology.psl.gz
http://genome.ucsc.edu/cgi-bin/hgTracks?db=ce6&hgt.customText=http://seq.cbrc.jp/~martin/cds-homology/ce6/cds-homology.psl.gz
http://genome.ucsc.edu/cgi-bin/hgTracks?db=dm3&hgt.customText=http://seq.cbrc.jp/~martin/cds-homology/dm3/cds-homology.psl.gz
http://genome.ucsc.edu/cgi-bin/hgTracks?db=hg19&hgt.customText=http://seq.cbrc.jp/~martin/cds-homology/hg19/cds-homology.psl.gz
http://genome.ucsc.edu/cgi-bin/hgTracks?db=rn4&hgt.customText=http://seq.cbrc.jp/~martin/cds-homology/rn4/cds-homology.psl.gz

Here are the scripts for making the tracks automatically:

http://seq.cbrc.jp/~martin/cds-homology/cds-homology.zip

The track details page has some more info.

Have a nice weekend,
Martin Frith
http://www.cbrc.jp/~martin/
