Hello Haiwei,

        You can get this information in a two-step process using the Table
Browser tool on our website ('Tables' in the top blue navigation bar).  You
will need to repeat these two steps for each of the five species that you are
interested in.

        STEP ONE.  To get the gene name, protein ID, gene start and end
positions on the chromosome, and strandedness, follow these steps:

1. Navigate to the Table Browser and choose your organism.
2. Select the RefSeq Genes track (refGene table).
3. As 'output type' choose "selected fields from primary and related tables".
4. Then press the "get output" button.
5. From the next page, choose the following fields from the refGene table:
name, chrom, strand, txStart, txEnd
6. Scroll down and choose the kgXref table and press the "Allow Selection From Checked Tables" button.
7. Scroll down and, from the kgXref table, choose the protAcc field.
8. Press the "get output" button.

        The output will be a list of all of the RefSeq Genes for that
assembly/organism with their name, chrom, strand, transcription start and end,
and protein accession, like so:


#hg18.refGene.name      hg18.refGene.chrom      hg18.refGene.strand     hg18.refGene.txStart    hg18.refGene.txEnd      hg18.kgXref.protAcc

NM_000808       chrX    -       151086289       151370487       NP_000799
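
If you save the step ONE output to a file, it can be read with a few lines of Python.  The following is only a minimal sketch: it assumes the output is tab-separated with a single header line beginning with '#', as in the sample above, and the function name read_table_browser_rows is made up for this illustration.

```python
import csv
import io
from typing import Dict, Iterator, TextIO


def read_table_browser_rows(handle: TextIO) -> Iterator[Dict[str, str]]:
    """Yield one dict per data row of tab-separated Table Browser output.

    Assumption: the first non-blank line is the header and starts with '#'.
    """
    header = None
    for row in csv.reader(handle, delimiter="\t"):
        if not any(cell.strip() for cell in row):
            continue  # skip blank lines
        if header is None:
            # Keep only the last dotted component of each column name,
            # e.g. 'hg18.refGene.name' -> 'name'.  This is safe here because
            # the selected field names are unique across the two tables.
            header = [cell.lstrip("#").strip().split(".")[-1] for cell in row]
            continue
        yield dict(zip(header, (cell.strip() for cell in row)))


# The sample row from above, with tabs written out explicitly.
sample = (
    "#hg18.refGene.name\thg18.refGene.chrom\thg18.refGene.strand\t"
    "hg18.refGene.txStart\thg18.refGene.txEnd\thg18.kgXref.protAcc\n"
    "NM_000808\tchrX\t-\t151086289\t151370487\tNP_000799\n"
)
genes = list(read_table_browser_rows(io.StringIO(sample)))
print(genes[0]["name"], genes[0]["strand"], genes[0]["protAcc"])
```

All values are kept as strings; convert txStart and txEnd with int() if you need to do arithmetic on the coordinates.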


        STEP TWO.  To get the predicted protein for each of the RefSeq Genes,
follow these steps:

1. Navigate to the Table Browser and choose your organism.
2. Select the RefSeq Genes track (refGene table).
3. As 'output type' choose "sequence".
4. Then press the "get output" button.
5. From the next page, choose 'protein', then press the "submit" button.

        The output will be a list of the protein sequences for all RefSeq
Genes, like so:

 >NP_000799.1
MIITQTSHCYMTSLGILFLINILPGTTGQGESRRQEPGDFVKQDIGGLSP
KHAPDIPDDSTDNITIFTRILDRLLDGYDNRLRPGLGDAVTEVKTDIYVT
SFGPVSDTDMEYTIDVFFRQTWHDERLKFDGPMKILPLNNLLASKIWTPD
TFFHNGKKSVAHNMTTPNKLLRLVDNGTLLYTMRLTIHAECPMHLEDFPM
DVHACPLKFGSYAYTTAEVVYSWTLGKNKSVEVAQDGSRLNQYDLLGHVV
GTEIIRSSTGEYVVMTTHFHLKRKIGYFVIQTYLPCIMTVILSQVSFWLN
RESVPARTVFGVTTVLTMTTLSISARNSLPKVAYATAMDWFIAVCYAFVF
SALIEFATVNYFTKRSWAWEGKKVPEALEMKKKTPAAPAKKTSTTFNIVG
TTYPINLAKDTEFSTISKGAAPSASSTPTIIASPKATYVQDSPTETKTYN
SVSKVDKISRIIFPVLFAIFNLVYWATYVNRESAIKGMIRKQ
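
The step TWO output is in FASTA format, so it can be loaded into a dictionary keyed by protein accession.  This is a rough sketch rather than a full FASTA parser: it assumes each record has a '>' header line whose first word is the accession, followed by one or more sequence lines, and the function name read_fasta is chosen just for this example.

```python
import io
from typing import Dict, List, Optional, TextIO


def read_fasta(handle: TextIO) -> Dict[str, str]:
    """Return {record ID: sequence} for a FASTA stream."""
    sequences: Dict[str, str] = {}
    current_id: Optional[str] = None
    chunks: List[str] = []
    for line in handle:
        line = line.strip()  # also tolerates leading spaces before '>'
        if not line:
            continue
        if line.startswith(">"):
            if current_id is not None:
                sequences[current_id] = "".join(chunks)
            # The ID is the first whitespace-delimited word after '>'.
            fields = line[1:].split()
            current_id = fields[0] if fields else ""
            chunks = []
        elif current_id is not None:
            chunks.append(line)
    if current_id is not None:
        sequences[current_id] = "".join(chunks)  # store the last record
    return sequences


# First two sequence lines of the sample record above.
sample = (
    ">NP_000799.1\n"
    "MIITQTSHCYMTSLGILFLINILPGTTGQGESRRQEPGDFVKQDIGGLSP\n"
    "KHAPDIPDDSTDNITIFTRILDRLLDGYDNRLRPGLGDAVTEVKTDIYVT\n"
)
proteins = read_fasta(io.StringIO(sample))
print(list(proteins), len(proteins["NP_000799.1"]))
```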


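        You can match each protein sequence to its RefSeq Gene name by looking
up the protein accession in the output from step ONE.  Note that the sequence
header includes a version suffix (NP_000799.1), while the protAcc column in
step ONE lists the accession without it (NP_000799).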
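
The matching between the two outputs can also be scripted.  In the samples above, the sequence header is NP_000799.1 while step ONE lists NP_000799, so the sketch below drops everything after the first '.' before comparing.  It assumes the step ONE rows are already available as dictionaries with 'name' and 'protAcc' keys and the sequences as a dictionary keyed by header ID; the function names are hypothetical.

```python
from typing import Dict, List, Optional


def strip_version(accession: str) -> str:
    """'NP_000799.1' -> 'NP_000799'; unversioned accessions pass through."""
    return accession.split(".", 1)[0]


def attach_proteins(
    genes: List[Dict[str, str]], proteins: Dict[str, str]
) -> List[Dict[str, Optional[str]]]:
    """Add a 'protein' entry to each gene row (None if no sequence matched)."""
    by_accession = {strip_version(acc): seq for acc, seq in proteins.items()}
    merged = []
    for gene in genes:
        record: Dict[str, Optional[str]] = dict(gene)
        record["protein"] = by_accession.get(gene.get("protAcc", ""))
        merged.append(record)
    return merged


# Tiny illustrative inputs; the sequence is truncated and the second row is
# a made-up placeholder with no protein accession.
genes = [
    {"name": "NM_000808", "protAcc": "NP_000799"},
    {"name": "EXAMPLE_NO_PROTEIN", "protAcc": ""},
]
proteins = {"NP_000799.1": "MIITQTSHCY"}

for row in attach_proteins(genes, proteins):
    print(row["name"], row["protAcc"], row["protein"])
```

Rows with no matching sequence keep a protein value of None, which makes them easy to filter out or inspect afterwards.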

        If you need help getting started with the Table Browser, please visit
the User's Guide: http://genome.ucsc.edu/goldenPath/help/hgTablesHelp.html

        I hope this information is helpful to you.  Please don't hesitate to
contact the mail list again if you require further assistance.


Regards,

----------
Ann Zweig
UCSC Genome Bioinformatics Group
http://genome.ucsc.edu

Please feel free to search the Genome mailing list archives by visiting our
home page, clicking on "Contact Us", then typing a word or phrase into the
search box.  On that same page (http://genome.ucsc.edu/contacts.html), you can
subscribe to the Genome mailing list.


Haiwei Luo wrote:
> Dear colleagues,
> 
> I am a graduate student in University of South Carolina. In my research
> project, I need UCSC genome annotations of the following five species: Homo
> sapiens, Mus musculus, Drosophila melanogaster, Anopheles gambiae, and
> Caenorhabditis elegans. Genome annotations may contain predicted protein
> sequences, protein ID, gene starting, ending position on the chromosome, and
> strandedness. I don't know where I can find those files. Could you kindly
> send me links to locate those files?
> 
> Sincere thanks,
> Haiwei Luo
> _______________________________________________
> Genome maillist  -  [email protected]
> http://www.soe.ucsc.edu/mailman/listinfo/genome