Hi Basel,
If you use the protAcc numbers, then obtaining the protein sequences in
FASTA format is easy with our Table Browser. To do so, follow these
directions:
1. Go to the Table Browser ("Tables" in the top link bar on our home page).
2. Reset cart settings (link below the "get output" button) if UCSC
Genes is not already the track selected.
3. Create a filter by clicking "create" next to "filter".
4. Under hg19.kgXref paste your identifiers into the protAcc value box,
each separated by a space, and click "submit".
5. Select the output format "sequence", type in a file name if desired,
and click "get output".
6. Select the "protein" button and click "submit".
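For a long list, the space-separated value for step 4 can be prepared from a
file with one accession per line. The following is only a minimal Python
sketch and is not part of the Table Browser; the function name
build_filter_value and the accessions in the example are placeholders chosen
for illustration.

```python
def build_filter_value(lines):
    """Join accessions with single spaces for pasting into the protAcc box.

    Blank lines and lines starting with '#' are skipped, and duplicates are
    dropped while keeping the order in which accessions first appear.
    """
    seen = set()
    ordered = []
    for line in lines:
        acc = line.strip()
        if not acc or acc.startswith("#"):
            continue
        if acc not in seen:
            seen.add(acc)
            ordered.append(acc)
    return " ".join(ordered)


if __name__ == "__main__":
    # Placeholder accessions, not real identifiers from your list.
    example = ["ACC_A", "", " ACC_B ", "ACC_A"]
    print(build_filter_value(example))
```

To use it on a real list, pass an open file instead of the example list, for
instance build_filter_value(open("my_accessions.txt")), where the file name is
whatever you saved your list as.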
Unfortunately, there is no way to see the protAcc numbers that are
associated with the UCSC Gene IDs in FASTA format. If you would like to
see how the UCSC IDs compare with the protAcc numbers in a table and
then convert to the FASTA format, you will have to instead select the
output format "selected fields from the primary and related tables" and
click "get output". Scroll down to the Linked Tables, select
knownGenePep, scroll down to the bottom and click "Allow selection from
checked tables". Finally, select the fields desired from all tables and
then click "get output".
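If the table output is saved as tab-separated text, it can be converted to
FASTA locally with the protAcc number in each header. The following is a
minimal Python sketch under stated assumptions: it assumes three columns in
the order UCSC ID, protAcc, protein sequence (adjust the column indexes to
match the fields you actually selected) and that header lines begin with "#".
The function name table_to_fasta is hypothetical.

```python
def table_to_fasta(lines, id_col=0, acc_col=1, seq_col=2, width=60):
    """Convert tab-separated rows into FASTA text.

    Assumed layout (change the *_col arguments if yours differs):
    UCSC ID, protAcc, protein sequence. Lines starting with '#' are
    treated as headers and skipped, as are rows with no sequence.
    """
    records = []
    needed = max(id_col, acc_col, seq_col)
    for line in lines:
        line = line.rstrip("\r\n")
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) <= needed:
            continue  # row is too short to hold all three columns
        ucsc_id, acc, seq = fields[id_col], fields[acc_col], fields[seq_col]
        if not seq:
            continue
        header = ">" + ucsc_id + " protAcc=" + (acc if acc else "n/a")
        # Wrap the sequence at a fixed width, as is conventional for FASTA.
        chunks = [seq[i:i + width] for i in range(0, len(seq), width)]
        records.append(header + "\n" + "\n".join(chunks))
    return "\n".join(records) + ("\n" if records else "")


if __name__ == "__main__":
    # Placeholder row, not real Table Browser output.
    demo = ["#id\tprotAcc\tseq", "ucExample.1\tACC_A\tMKTAYIAK"]
    print(table_to_fasta(demo, width=4), end="")
```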
I hope this information is helpful. Please feel free to contact the
mail list again if you require further assistance.
Best,
Mary
------------------
Mary Goldman
UCSC Bioinformatics Group
On 5/4/10 2:22 PM, Baghal, Basel (NIH/NIAAA) [F] wrote:
> Hi,
>
> I am a research trainee at the NIH/NIAAA. For my current research project, I
> need to find the most efficient way of obtaining a selected list of
> RefSeq Predicted Protein Sequences (in FASTA format) that correspond to my
> list of mrnaAcc numbers (or, alternatively, protAcc numbers). I am
> specifically looking for the UCSC predicted proteins because there are slight
> differences when compared to NCBI.
>
> Previously I have used the genome browser (hg18 and Rhesus) to find the
> protein sequences for a short list of accession numbers one by one; however,
> now I am looking for an efficient way to obtain and possibly export the aa
> sequences for a much longer list. If you could provide me with any pointers
> or advice I would really appreciate it.
>
> Sincere thanks,
> Basel Baghal
> _______________________________________________
> Genome maillist - [email protected]
> https://lists.soe.ucsc.edu/mailman/listinfo/genome
>