Re: [Genome] UCSC protein fasta database

Brooke Rhead Mon, 23 Jul 2012 16:17:03 -0700

Hi Evan,

The protein sequence for UCSC Genes is actually kept in the table 
knownGenePep.  (This is an odd case; we generally do not store sequence 
in tables for the Genome Browser.)  You can download the table from our 
downloads server:
http://hgdownload.cse.ucsc.edu/goldenPath/hg19/database/knownGenePep.txt.gz
or from the Table Browser: http://genome.ucsc.edu/cgi-bin/hgTables
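
Since the download is a table dump rather than a FASTA file, a small conversion step is needed. Here is a minimal sketch in Python, assuming each row of knownGenePep.txt.gz holds two tab-separated fields (the UCSC identifier, then the peptide sequence). The function names, the output file name, and the 60-residue line width are illustrative choices, not part of any UCSC tool:

```python
import gzip


def table_rows_to_fasta(rows, width=60):
    """Yield FASTA lines for an iterable of 'name<TAB>sequence' rows.

    Assumes two tab-separated columns per row: identifier, then sequence.
    """
    for row in rows:
        row = row.rstrip("\n")
        if not row:
            continue  # skip blank lines
        name, seq = row.split("\t", 1)
        yield ">" + name
        # Wrap the sequence at a fixed width, as FASTA files commonly do.
        for start in range(0, len(seq), width):
            yield seq[start:start + width]


def convert(table_gz, fasta_out):
    """Read a gzipped table dump and write it out as FASTA."""
    with gzip.open(table_gz, "rt") as src, open(fasta_out, "w") as dst:
        for line in table_rows_to_fasta(src):
            dst.write(line + "\n")
```

After downloading the file, something like convert("knownGenePep.txt.gz", "knownGenePep.fa") should produce one FASTA record per row.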

Be aware that there is a second table, knownGeneTxPep, that contains a 
slightly different sequence for some of the peptides.  There is a 
description of the difference on the hg19 UCSC Genes description page 
(http://genome.ucsc.edu/cgi-bin/hgTrackUi?db=hg19&g=knownGene):

*knownGenePep* contains the protein sequences derived from the 
knownGeneMrna transcript sequences. Any protein-level annotations, such 
as the contents of the knownToPfam table, are based on these sequences.

*knownGeneTxPep* contains the protein translation (if any) of each mRNA 
sequence in knownGeneTxMrna.
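
To see which peptides actually differ between the two tables, the dumps can be compared directly. This is a rough sketch, assuming knownGeneTxPep is available as a gzipped dump with the same two-column, tab-separated layout as knownGenePep (identifier, then sequence). The function names and file names are hypothetical:

```python
import gzip


def load_table(path):
    """Load a gzipped 'name<TAB>sequence' table dump into a dict."""
    seqs = {}
    with gzip.open(path, "rt") as src:
        for row in src:
            row = row.rstrip("\n")
            if row:
                name, seq = row.split("\t", 1)
                seqs[name] = seq
    return seqs


def differing_ids(pep, tx_pep):
    """Return sorted identifiers present in both tables whose sequences differ."""
    shared = pep.keys() & tx_pep.keys()
    return sorted(name for name in shared if pep[name] != tx_pep[name])
```

For example, differing_ids(load_table("knownGenePep.txt.gz"), load_table("knownGeneTxPep.txt.gz")) would list the identifiers whose sequence is not identical in the two tables. Identifiers found in only one table are ignored by this sketch.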

I see another question from you with the same subject line from last 
week; I think this response answers both questions.  If not, or if you 
have further questions, please write back to us at [email protected].

--
Brooke Rhead
UCSC Genome Bioinformatics Group

On 7/23/12 10:06 AM, Evan Bai wrote:
> Hi,
>
> I have a question regarding retrieving protein fasta sequences from
> the genome browser.
>
> For example, when I searched for "uc010nyk.2" in the browser, clicked
> on the gene, and then clicked on "Protein (852 aa)", it led me to
> this result:
> >uc010nyk.2 (TAS1R3) length=852
> MLGPAVLGLSLWALLHPGTGAPLCLSQQLRMKGDYVLGGLFPLGEAEEAGLRSRTRPSSP
> VCTRFSSNGLLWALAMKMAVEEINNKSDLLPGLRLGYDLFDTCSEPVVAMKPSLMFLAKA
> GSRDIAAYCNYTQYQPRVLAVIGPHSSELAMVTGKFFSFFLMPQVSYGASMELLSARETF
> PSFFRTVPSDRVQLTAAAELLQEFGWNWVAALGSDDEYGRQGLSIFSALAAARGICIAHE
> GLVPLPRADDSRLGKVQDVLHQVNQSSVQVVLLFASVHAAHALFNYSISSRLSPKVWVAS
> EAWLTSDLVMGLPGMAQMGTVLGFLQRGAQLHEFPQYVKTHLALATDPAFCSALGEREQG
> LEEDVVGQRCPQCDCITLQNVSAGLNHHQTFSVYAAVYSVAQALHNTLQCNASGCPAQDP
> VKPWQLLENMYNLTFHVGGLPLRFDSSGNVDMEYDLKLWVWQGSVPRLHDVGRFNGSLRT
> ERLKIRWHTSDNQKPVSRCSRQCQEGQVRRVKGFHSCCYDCVDCEAGSYRQNPDDIACTF
> CGQDEWSPERSTRCFRRRSRFLAWGEPAVLLLLLLLSLALGLVLAALGLFVHHRDSPLVQ
> ASGGPLACFGLVCLGLVCLSVLLFPGQPSPARCLAQQPLSHLPLTGCLSTLFLQAAEIFV
> ESELPLSWADRLSGCLRGPWAWLVVLLAMLVEVALCTWYLVAFPPEVVTDWHMLPTEALV
> HCRTRSWVSFGLAHATNATLAFLCFLGTFLVRSQPGRYNRARGLTFAMLAYFITWVSFVP
> LLANVQVVLRPAVQMGALLLCVLGILAAFHLPRCYLLMRQPGLNTPEFFLGGGPGDAQGQ
> NDGNTGNQGKHE
>
> A fasta file with the protein sequences for uc010nyk.2
>
> And I wonder how I can download the protein fasta file for all hg19
> proteins with UCSC identifiers.
>
> thank you!
>
> Sincerely,
> Evan Bai
> Yale University
> _______________________________________________
> Genome maillist  -  [email protected]
> https://lists.soe.ucsc.edu/mailman/listinfo/genome
>
