Hi Stefanie,

If you are using the Gene Sorter for human, mouse, or rat, the gene set 
that is used is "UCSC Genes" (or "Known Genes" if you are using an older 
assembly).

The UCSC Genes set is now created using data from RefSeq, GenBank, CCDS,
and UniProt, and it is based on more than a simple merging of databases.
You can read about the methods used to create the set on the UCSC
Genes description page.  To get to it from the Gene Sorter, click on a 
link in the "Description" column, scroll to the bottom, and click the 
link that says "Click here for details on how this gene model was made 
and data restrictions if any."  It should take you to this page:

http://genome.ucsc.edu/cgi-bin/hgGene?hgg_do_kgMethod=1

An easy way to get a protein fasta file for all of the UCSC Genes is to 
use the Table Browser (http://genome.ucsc.edu/cgi-bin/hgTables).  Select 
your assembly, then:

group: Genes and Gene Prediction Tracks
track: UCSC Genes
table: knownGene
region: genome
output format: sequence
output file: (enter a name for the file you will download)

Hit "get output" and choose "protein" on the next page.  The output 
should be a protein fasta file of all of the UCSC Genes for your 
assembly.  Note that this will include ALL splice variants.  To get only 
the splice variants that appear by default in the Gene Sorter, you would 
need to first get a list of gene names from the 'knownCanonical' table 
and upload them via the "identifiers (names/accessions)" button in the 
Table Browser.
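
If you have already downloaded the full protein fasta file, another option
is to filter it locally against the list of identifiers. The following is
only a minimal Python sketch, not an official UCSC tool. It assumes that
the identifier of each record is the first whitespace-delimited token after
the ">" in the fasta header, and that your identifier list has one ID per
line. Please check both assumptions against your own files; the function
names and the file names mentioned below are hypothetical.

```python
def read_ids(lines):
    """Collect identifiers from an iterable of lines, one ID per line.

    Blank lines and lines starting with '#' are skipped.
    """
    ids = set()
    for line in lines:
        line = line.strip()
        if line and not line.startswith("#"):
            ids.add(line)
    return ids


def filter_fasta(fasta_lines, keep_ids):
    """Yield the lines of every fasta record whose ID is in keep_ids.

    Assumption: the record ID is the first whitespace-delimited token
    after '>' on the header line. Adjust this if your headers differ.
    """
    keep = False
    for line in fasta_lines:
        if line.startswith(">"):
            fields = line[1:].split()
            keep = bool(fields) and fields[0] in keep_ids
        if keep:
            yield line
```

For example, with a hypothetical ID list "canonical_ids.txt" and a
hypothetical download "all_proteins.fa", you could open both files, pass
the first to read_ids(), and write the lines produced by filter_fasta()
to a new file.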

I hope this helps.  If you need any further assistance, please feel free 
to write back to the mailing list.

--
Brooke Rhead
UCSC Genome Bioinformatics Group


On 09/19/10 09:56, Stefanie Gerstberger wrote:
> Hi,
> I am trying to find the current reference list of genes (and their protein
> fasta formats) used by the gene sorter, the original publication says it used
> a synthesis of refseq, genebank and swissprot. Is this still correct that all
> genes of these databases were simply merged into one file or where can I find 
> the genes (and protein fasta files) for the genes currently displayed on the 
> gene sorter? Is there a link on UCSC genome browser to access the currently 
> updated gene sorter file?
> Thanks a lot,
> Stefanie
> 
> _______________________________________________
> Genome maillist  -  [email protected]
> https://lists.soe.ucsc.edu/mailman/listinfo/genome
