Hi Greg,

Thank you very much for the answer and the hints.

But I'm afraid the current UCSC Genes description doesn't really help me
understand this either. At least it doesn't say that they keep only one mRNA
alignment per protein, so this might just be the reason.

As I had some trouble assigning my peptides to some isoforms, I decided to have
a look at all proteins/genes with the same protein IDs anyway (including the
isoforms). Then I just take the first gene my peptide matches. As I'm only
interested in the peptides' positions in the genome, this should be sufficient
for me.
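
A minimal sketch of this "first matching gene" lookup, assuming the knownGene
rows are available as dictionaries and that an amino acid sequence per gene is
at hand (for instance from knownGenePep). The function names and the sequences
below are hypothetical placeholders, not real data and not part of any UCSC
tool.

```python
from collections import defaultdict


def group_by_protein(rows):
    """Map each proteinID to the gene (transcript) names that share it."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["proteinID"]].append(row["name"])
    return dict(groups)


def first_matching_gene(peptide, protein_id, groups, gene_sequences):
    """Return (gene name, 0-based AA offset) for the first gene whose
    sequence contains the peptide, or None if no gene matches."""
    for gene in groups.get(protein_id, []):
        offset = gene_sequences.get(gene, "").find(peptide)
        if offset >= 0:
            return gene, offset
    return None


# Gene names and protein ID are taken from the example rows in this thread.
rows = [
    {"name": "uc001lqz.2", "proteinID": "P37837"},
    {"name": "uc001lra.2", "proteinID": "P37837"},
]

# Placeholder sequences for illustration only (NOT the real proteins).
gene_sequences = {
    "uc001lqz.2": "MAAAKLLLPEPTIDEK",
    "uc001lra.2": "MAAAKLLLQQQ",
}

groups = group_by_protein(rows)
hit = first_matching_gene("PEPTIDE", "P37837", groups, gene_sequences)
print(hit)
```

The returned amino acid offset would still have to be converted to genomic
coordinates through the coding exons of the matched gene.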

I'll have a look at the knownGenePep table. It might improve my mapping
performance, but I think it won't help me with the protein-gene relationship
problem caused by the identical protein IDs. As my peptides are already
assigned to a protein with an ID, I use this ID to restrict the peptide's
origin in the genome. So the AA sequences alone are not useful in my first
steps. But as written above, I solved this problem, at least sufficiently for
me.

So thank you again for your help.

Greetings, Mathias


-----Original Message-----
From: Greg Roe [mailto:[email protected]]
Sent: Saturday, 13 August 2011 00:53
To: Kuhring, Mathias
Cc: [email protected]
Subject: Re: [Genome] Different genes with same protein id

Hi Mathias,

The paper you're citing describes a previous version of the UCSC Genes build
methods. Please read the current methods on the UCSC Genes description page:
http://genome.ucsc.edu/cgi-bin/hgTrackUi?g=knownGene

If that doesn't clear up the reason why you're seeing multiple gene ids with 
the same protein ID, please let us know: [email protected]

Fan Hsu also pointed out that you may want to consider using the current
knownGenePep table, which contains predicted AA sequences derived directly from
the DNA sequences of UCSC Genes, based on the reference genome.

-
Greg Roe
UCSC Genome Bioinformatics Group 


On 8/8/11 12:56 AM, Kuhring, Mathias wrote: 

        Hi everybody,
        
        I downloaded the knownGene table with the Table Browser (default
        settings) to use the annotations for some protein mapping.
        
        Hsu's paper (The UCSC Known Genes, 2006) says "mRNA with the highest
        score is selected as the representative mRNA for the protein" and
        "removing duplicates having identical chromosome number, start and
        ending positions of coding sequence".
        
        So I actually expected one gene per protein (UniProt ID), but I found
        a couple of genes coding for the same protein.
        This is causing some trouble, because now I'm not sure which one to
        take for my protein.
        
        The "redundant" (?) genes share almost the same loci and/or CDS, but
        seem to differ in the number of exons and/or splice sites.
        So I'm afraid they don't code for the same amino acid sequence, which I
        thought usually leads to different proteins (or are there many
        exceptions?).
        
        Here is an example (I attached some more):
        #name  chrom  strand  txStart  txEnd  cdsStart  cdsEnd  exonCount  exonStarts  exonEnds  proteinID  alignID
        uc001lqz.2  chr11  +  747431  765023  747481  764845  8  747431,755878,758949,760121,763343,763746,764287,764812,  747578,756002,759057,760253,763519,763944,764433,765023,  P37837  uc001lqz.2
        uc001lra.2  chr11  +  747431  765023  747481  764413  8  747431,755878,758949,760121,763343,763746,764287,764812,  747578,756002,759057,760253,763519,763940,764433,765023,  P37837  uc001lra.2
        
        The second gene's cdsEnd is smaller and exon 6 ends 4 positions earlier
        but is still in the CDS.
        I had a look at the sequences and there is a shift. So I think the
        probability that they code for the same protein is pretty low.
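
The difference between the two rows can also be checked from the coordinates
alone. The following is a minimal sketch, assuming the usual UCSC table
convention of 0-based, half-open coordinates; the helper names `parse_coords`
and `cds_blocks` are made up for this illustration.

```python
def parse_coords(field):
    """Parse a comma-separated knownGene field such as '747431,755878,'."""
    return [int(x) for x in field.split(",") if x]


def cds_blocks(exon_starts, exon_ends, cds_start, cds_end):
    """Clip every exon to the CDS and keep the non-empty coding parts."""
    blocks = []
    for start, end in zip(exon_starts, exon_ends):
        lo, hi = max(start, cds_start), min(end, cds_end)
        if lo < hi:
            blocks.append((lo, hi))
    return blocks


def cds_length(blocks):
    """Total number of coding bases over all blocks."""
    return sum(hi - lo for lo, hi in blocks)


# Coordinates copied from the two example rows above.
starts = parse_coords("747431,755878,758949,760121,763343,763746,764287,764812,")
ends_lqz = parse_coords("747578,756002,759057,760253,763519,763944,764433,765023,")
ends_lra = parse_coords("747578,756002,759057,760253,763519,763940,764433,765023,")

blocks_lqz = cds_blocks(starts, ends_lqz, 747481, 764845)  # uc001lqz.2
blocks_lra = cds_blocks(starts, ends_lra, 747481, 764413)  # uc001lra.2

len_lqz = cds_length(blocks_lqz)
len_lra = cds_length(blocks_lra)

print(len(blocks_lqz), len_lqz, len_lqz % 3)
print(len(blocks_lra), len_lra, len_lra % 3)
```

Under that coordinate assumption, uc001lqz.2 has 8 coding blocks with 1014
coding bases and uc001lra.2 has 7 coding blocks with 957 coding bases. Both
lengths are multiples of 3, but they differ, so the two transcripts cannot
encode amino acid sequences of the same length.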
        
        But how should I handle those genes now? Do they code for protein
        isoforms which didn't get a unique protein ID yet?
        Maybe I got Hsu's paper wrong? Or did I just miss some information
        somewhere?
        
        I hope you can help me with this. Thanks a lot.
        
        Greetinz,
        Mathias

         
        
        _______________________________________________
        Genome maillist  -  [email protected]
        https://lists.soe.ucsc.edu/mailman/listinfo/genome

