Hi,

I would like to cite the current list of genes contained in the UCSC Genome Browser, known as the "UCSC genes". Working from the downloads on your website, I noticed that I got a different number of canonical proteins when I uploaded a file (higher by 3000 protein-coding genes) than when I used the knownCanonical list and parsed through it myself.

I checked the Pumilio proteins and found that, in one case, 2 Pumilio isoforms were listed in the directly downloaded canonical gene file. Although this accession number was not listed in knownCanonical, it was nevertheless found in the FASTA file.

I got the following gene counts:
Total UCSC genes (containing all isoforms): 77614
Number of "canonical" UCSC genes (one isoform per gene locus): 27297
Total protein-coding UCSC genes (containing all isoforms): 62378
Number of "canonical" UCSC protein-coding genes (one isoform per gene locus): 21018

(I generated the protein-coding gene counts by parsing through the files myself.)

Could you please let me know whether these numbers are currently correct?

Thank you very much,
Stefanie

---------------------------------------------------
Stefanie Gerstberger
Graduate student in Chemical Biology
Tri-Institutional Program
Cornell University, Rockefeller University, Memorial Sloan Kettering Cancer Center
