Hi,

I found some strange lines in the GTF output that can be retrieved from 
the Table Browser.

In a file downloaded at the end of 2010 there were lines with start > end. 
This was reported on the mailing list about a year ago under the subject 
"Table Browser may return GTF lines with start > end", and I couldn't 
find any such lines when downloading the same data today.
However, when looking at one of the same regions that had a CDS with 
start > end before, I noticed that there is now a strange entry for the 
stop codon.

I looked up the following region in the Table Browser:

Mammal-Human-hg18
Genes and Gene Prediction Tracks - RefSeq Genes
table: refGene

position:  chr1:92537110-92626320

output format: GTF

The relevant part is this:

chr1    hg18_refGene    CDS    92618869    92619016    0.000000    +    1    gene_id "NM_024813"; transcript_id "NM_024813";
chr1    hg18_refGene    exon    92618869    92619018    0.000000    +    .    gene_id "NM_024813"; transcript_id "NM_024813";
chr1    hg18_refGene    stop_codon    92619017    92625156    0.000000    +    .    gene_id "NM_024813"; transcript_id "NM_024813";
chr1    hg18_refGene    exon    92625156    92626320    0.000000    +    .    gene_id "NM_024813"; transcript_id "NM_024813";

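Please note that the stop_codon entry spans 6140 bases (92619017 to 
92625156), most of which lie outside the exons of this transcript. 
This might be because the stop codon is split by the splice site: the 
CDS ends at 92619016 and the exon at 92619018, so only two bases of the 
stop codon fit into this exon, and the third is the first base of the 
next exon (92625156).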
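
In case it helps others check their own downloads, here is a minimal Python sketch that flags the two kinds of lines mentioned in this mail: start > end, and start_codon/stop_codon features longer than 3 bases. The function names are my own (hypothetical), and it assumes tab-separated GTF columns as in the GTF2.2 description; it is not anything the Table Browser itself uses.

```python
def parse_gtf_line(line):
    """Split one tab-separated GTF line into the fields needed here."""
    fields = line.rstrip("\n").split("\t")
    return {
        "seqname": fields[0],
        "feature": fields[2],
        "start": int(fields[3]),   # GTF coordinates are 1-based, inclusive
        "end": int(fields[4]),
        "strand": fields[6],
    }


def find_problems(record):
    """Return a list of short problem descriptions for one GTF record."""
    problems = []
    if record["start"] > record["end"]:
        problems.append("start > end")
    length = record["end"] - record["start"] + 1
    # A codon is 3 bases; a longer codon feature most likely spans an intron.
    if record["feature"] in ("start_codon", "stop_codon") and length > 3:
        problems.append("codon feature spans %d bases" % length)
    return problems


if __name__ == "__main__":
    line = ("chr1\thg18_refGene\tstop_codon\t92619017\t92625156\t0.000000"
            "\t+\t.\tgene_id \"NM_024813\"; transcript_id \"NM_024813\";")
    print(find_problems(parse_gtf_line(line)))
```

Running it on the stop_codon line from above reports the codon feature as too long; the CDS and exon lines pass without complaint.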

The GTF format allows spliced stop codons, see here 
(http://mblab.wustl.edu/GTF22.html):

"The "start_codon" and "stop_codon" features are not required to be 
atomic; they may be interrupted by valid splice sites. "

I guess the correct output would be two stop_codon entries 
(92619017-92619018 and 92625156-92625156) instead of one, but there 
may be reasons to keep it as it is.
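
To illustrate what I mean by two entries, here is a rough Python sketch that splits a stop codon across exon boundaries. It is only my assumption of how this could be done, not how the Table Browser works: the function name is hypothetical, it handles the plus strand only, it assumes the CDS end excludes the stop codon (as in the lines above), and it ignores the frame column.

```python
def split_stop_codon(exons, cds_end):
    """Return the (start, end) pieces of the 3-base stop codon that
    follows cds_end, cut at exon boundaries.

    exons   -- list of (start, end) tuples, 1-based inclusive, plus strand
    cds_end -- last CDS base, stop codon not included
    """
    pieces = []
    remaining = 3          # bases of the stop codon still to be placed
    pos = cds_end + 1      # first candidate position for the next base
    for start, end in sorted(exons):
        if remaining == 0:
            break
        if end < pos:
            continue       # exon lies entirely before the stop codon
        piece_start = max(start, pos)
        piece_end = min(end, piece_start + remaining - 1)
        pieces.append((piece_start, piece_end))
        remaining -= piece_end - piece_start + 1
        pos = piece_end + 1
    return pieces


if __name__ == "__main__":
    exons = [(92618869, 92619018), (92625156, 92626320)]
    for start, end in split_stop_codon(exons, 92619016):
        print("stop_codon", start, end)
```

For NM_024813 this gives one piece of two bases at the end of the first exon and one piece of a single base at the start of the next exon.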

Thanks and best regards,
Hilmar


_______________________________________________
Genome maillist  -  [email protected]
https://lists.soe.ucsc.edu/mailman/listinfo/genome
