Hi Evan,

The protein sequence for UCSC Genes is actually kept in the table knownGenePep. (This is an odd case; we generally do not store sequence in tables for the Genome Browser.) You can download the table from our downloads server:

http://hgdownload.cse.ucsc.edu/goldenPath/hg19/database/knownGenePep.txt.gz

or from the Table Browser:

http://genome.ucsc.edu/cgi-bin/hgTables
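
Since the download is a table dump rather than a FASTA file, a small conversion step is needed. Here is a minimal Python sketch of that step. It assumes each row of knownGenePep.txt.gz has two tab-separated fields, the UCSC identifier and then the peptide sequence; please check that against the table schema before relying on it. The function names, the 60-column line width, and the output file name are illustrative choices, not part of any UCSC tool.

```python
import gzip


def table_rows_to_fasta(rows, width=60):
    """Yield one FASTA record per 'name<TAB>seq' row.

    Assumption: the table dump has exactly two tab-separated fields per row,
    the UCSC identifier followed by the peptide sequence.
    """
    for row in rows:
        row = row.rstrip("\r\n")
        if not row:
            continue  # skip blank lines
        name, seq = row.split("\t", 1)
        # Wrap the sequence into fixed-width lines.
        chunks = [seq[i:i + width] for i in range(0, len(seq), width)]
        yield "\n".join([">" + name] + chunks) + "\n"


def convert(gz_path, fasta_path, width=60):
    """Read a gzipped table dump and write it out as FASTA."""
    with gzip.open(gz_path, "rt") as src, open(fasta_path, "w") as dst:
        dst.writelines(table_rows_to_fasta(src, width))
```

After downloading the file, something like convert("knownGenePep.txt.gz", "knownGenePep.fa") should produce one record per UCSC identifier. The same sketch would apply to knownGeneTxPep if that dump has the same two-column layout, which is also an assumption.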

Be aware that there is a second table, knownGeneTxPep, that contains a slightly different sequence for some of the peptides. There is a description of the difference on the hg19 UCSC Genes description page (http://genome.ucsc.edu/cgi-bin/hgTrackUi?db=hg19&g=knownGene):

  *knownGenePep* contains the protein sequences derived from the knownGeneMrna transcript sequences. Any protein-level annotations, such as the contents of the knownToPfam table, are based on these sequences.

  *knownGeneTxPep* contains the protein translation (if any) of each mRNA sequence in knownGeneTxMrna.

I see another question from you with the same subject line from last week; I think this response answers both questions. If not, or if you have further questions, please write back to us at [email protected].

--
Brooke Rhead
UCSC Genome Bioinformatics Group

On 7/23/12 10:06 AM, Evan Bai wrote:
> Hi,
>
> I have a question regarding retrieving protein fasta sequences from
> the genome browser.
>
> For example, when I searched for "uc010nyk.2" in the browser, clicked
> on the gene, and then clicked on "Protein (852 aa)", it led me to
> this result:
>
> >uc010nyk.2 (TAS1R3) length=852
> MLGPAVLGLSLWALLHPGTGAPLCLSQQLRMKGDYVLGGLFPLGEAEEAGLRSRTRPSSP
> VCTRFSSNGLLWALAMKMAVEEINNKSDLLPGLRLGYDLFDTCSEPVVAMKPSLMFLAKA
> GSRDIAAYCNYTQYQPRVLAVIGPHSSELAMVTGKFFSFFLMPQVSYGASMELLSARETF
> PSFFRTVPSDRVQLTAAAELLQEFGWNWVAALGSDDEYGRQGLSIFSALAAARGICIAHE
> GLVPLPRADDSRLGKVQDVLHQVNQSSVQVVLLFASVHAAHALFNYSISSRLSPKVWVAS
> EAWLTSDLVMGLPGMAQMGTVLGFLQRGAQLHEFPQYVKTHLALATDPAFCSALGEREQG
> LEEDVVGQRCPQCDCITLQNVSAGLNHHQTFSVYAAVYSVAQALHNTLQCNASGCPAQDP
> VKPWQLLENMYNLTFHVGGLPLRFDSSGNVDMEYDLKLWVWQGSVPRLHDVGRFNGSLRT
> ERLKIRWHTSDNQKPVSRCSRQCQEGQVRRVKGFHSCCYDCVDCEAGSYRQNPDDIACTF
> CGQDEWSPERSTRCFRRRSRFLAWGEPAVLLLLLLLSLALGLVLAALGLFVHHRDSPLVQ
> ASGGPLACFGLVCLGLVCLSVLLFPGQPSPARCLAQQPLSHLPLTGCLSTLFLQAAEIFV
> ESELPLSWADRLSGCLRGPWAWLVVLLAMLVEVALCTWYLVAFPPEVVTDWHMLPTEALV
> HCRTRSWVSFGLAHATNATLAFLCFLGTFLVRSQPGRYNRARGLTFAMLAYFITWVSFVP
> LLANVQVVLRPAVQMGALLLCVLGILAAFHLPRCYLLMRQPGLNTPEFFLGGGPGDAQGQ
> NDGNTGNQGKHE
>
> A fasta file with the protein sequences for uc010nyk.2
>
> And I wonder how I can download the protein fasta file for all hg19
> proteins with UCSC identifier.
>
> thank you!
>
> Sincerely,
> Evan Bai
> Yale University
> _______________________________________________
> Genome maillist - [email protected]
> https://lists.soe.ucsc.edu/mailman/listinfo/genome
