Hi Mathias,

The paper you're citing describes a previous version of the UCSC Genes build 
methods. Please read the current methods on the UCSC Genes description 
page: http://genome.ucsc.edu/cgi-bin/hgTrackUi?g=knownGene

If that doesn't clear up the reason why you're seeing multiple gene ids 
with the same protein ID, please let us know: [email protected]

Fan Hsu also pointed out that you may want to consider using the 
current knownGenePep table, which contains predicted amino acid sequences 
derived directly from the DNA sequences of UCSC Genes, based on the 
reference genome.
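
As a rough sketch only (this is not part of the UCSC Genes build, and the 
helper names parse_coords and cds_length are made up for illustration), you 
can check whether two knownGene rows sharing a proteinID could encode the 
same sequence by summing the exon bases that fall inside cdsStart..cdsEnd. 
It assumes the usual UCSC table convention of 0-based, half-open coordinates 
and uses the two rows from your example:

```python
def parse_coords(field):
    """Turn a knownGene list such as '747431,755878,' into a list of ints."""
    return [int(x) for x in field.split(",") if x]


def cds_length(exon_starts, exon_ends, cds_start, cds_end):
    """Count exon bases inside the coding region (0-based, half-open)."""
    total = 0
    for start, end in zip(parse_coords(exon_starts), parse_coords(exon_ends)):
        # Clip each exon to the coding region; non-coding exons add nothing.
        overlap = min(end, cds_end) - max(start, cds_start)
        if overlap > 0:
            total += overlap
    return total


# exonStarts is identical for both rows in the quoted example.
EXON_STARTS = "747431,755878,758949,760121,763343,763746,764287,764812,"

# name -> (exonStarts, exonEnds, cdsStart, cdsEnd), copied from the example.
transcripts = {
    "uc001lqz.2": (
        EXON_STARTS,
        "747578,756002,759057,760253,763519,763944,764433,765023,",
        747481,
        764845,
    ),
    "uc001lra.2": (
        EXON_STARTS,
        "747578,756002,759057,760253,763519,763940,764433,765023,",
        747481,
        764413,
    ),
}

lengths = {name: cds_length(*row) for name, row in transcripts.items()}

for name, n_bases in lengths.items():
    # A remainder of 0 means the coding bases form whole codons.
    print(name, n_bases, "coding bases, remainder mod 3:", n_bases % 3)
```

For these two rows the coding lengths come out as 1014 and 957 bases, so 
the two transcripts cannot translate to the same full-length sequence even 
though both carry the proteinID P37837. The peptide in knownGenePep for each 
transcript is the more direct thing to compare.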

-
Greg Roe
UCSC Genome Bioinformatics Group


On 8/8/11 12:56 AM, Kuhring, Mathias wrote:
> Hi everybody,
>
> I downloaded the knownGene table with the table browser (default settings) to 
> use the annotations for some protein mapping.
>
> Hsu's paper (The UCSC Known Genes, 2006) says "mRNA with the highest score is 
> selected as the representative mRNA for the protein" and "removing duplicates 
> having identical chromosome number, start and ending positions of coding 
> sequence"
>
> So actually I expected one gene per protein (UniProt ID), but I found a couple 
> of genes coding for the same protein.
> This is causing some trouble, because now I'm not sure which one to take for 
> my protein.
>
> The "redundant" (?) genes almost share the same loci and/or cds, but seem to 
> differ in number of exons and/or splice sites.
> So I'm afraid they don't code for the same amino acid sequence, which I 
> thought usually leads to different proteins (or are there many exceptions?).
>
> Here is an example (I attached some more):
> #name	chrom	strand	txStart	txEnd	cdsStart	cdsEnd	exonCount	exonStarts	exonEnds	proteinID	alignID
> uc001lqz.2	chr11	+	747431	765023	747481	764845	8	747431,755878,758949,760121,763343,763746,764287,764812,	747578,756002,759057,760253,763519,763944,764433,765023,	P37837	uc001lqz.2
> uc001lra.2	chr11	+	747431	765023	747481	764413	8	747431,755878,758949,760121,763343,763746,764287,764812,	747578,756002,759057,760253,763519,763940,764433,765023,	P37837	uc001lra.2
>
> The second gene's cdsEnd is smaller and exon 6 ends 4 positions earlier but 
> is still in the CDS.
> I had a look at the sequences and there is a shift. So I think the 
> probability that they code for the same protein is pretty low.
>
> But how should I handle those genes now? Do they code for protein isoforms, 
> which didn't get a unique protein id yet?
> Maybe I got Hsu's paper wrong? Or did I just miss some information somewhere?
>
> I hope you can help me with this. Thanks a lot.
>
> Greetinz,
> Mathias
>
>
> _______________________________________________
> Genome maillist  -  [email protected]
> https://lists.soe.ucsc.edu/mailman/listinfo/genome