Hi Hilmar,

Thank you for bringing this error to our attention! It is fixed on our development server and will be fixed on genome.ucsc.edu with the next software release (in about two weeks).
Angie

----- "Hilmar Berger" <[email protected]> wrote:

> From: "Hilmar Berger" <[email protected]>
> To: [email protected]
> Sent: Friday, May 27, 2011 1:43:38 AM GMT -08:00 US/Canada Pacific
> Subject: [Genome] Huge stop codon entry in GTF output
>
> Hi,
>
> I found some strange lines in the GTF output that can be retrieved from
> the Table Browser.
>
> In a file downloaded end of 2010 there were lines with start > end. This
> was reported about a year ago in the mailing list with this subject:
> "Table Browser may return GTF lines with start > end" and I couldn't
> find any such lines when downloading the same data today.
> However, when looking at one of the same regions that had a CDS with
> start > end before, I noticed that there is now a strange entry for the
> stop codon.
>
> I looked up the following region in the table browser:
>
> Mammal-Human-hg18
> Genes and Gene Prediction Tracks - RefSeq Genes
> table: refGene
>
> position: chr1:92537110-92626320
>
> output format: GTF
>
> The relevant part is this:
>
> chr1 hg18_refGene CDS 92618869 92619016 0.000000 + 1 gene_id "NM_024813"; transcript_id "NM_024813";
> chr1 hg18_refGene exon 92618869 92619018 0.000000 + . gene_id "NM_024813"; transcript_id "NM_024813";
> chr1 hg18_refGene stop_codon 92619017 92625156 0.000000 + . gene_id "NM_024813"; transcript_id "NM_024813";
> chr1 hg18_refGene exon 92625156 92626320 0.000000 + . gene_id "NM_024813"; transcript_id "NM_024813";
>
> Please note that the stop_codon entry has a length of almost 6000 bases,
> of which the majority is not within one of the exons of this transcript.
> This might be caused by the fact that the stop codon is divided by the
> splice site.
>
> The GTF annotation allows for spliced stop codon, see here
> (http://mblab.wustl.edu/GTF22.html):
>
> "The "start_codon" and "stop_codon" features are not required to be
> atomic; they may be interrupted by valid splice sites."
>
> I guess that the correct thing would be to insert two stop_codon entries
> instead of one, but there might be reasons to keep it that way.
>
> Thanks and best regards,
> Hilmar
>
> _______________________________________________
> Genome maillist - [email protected]
> https://lists.soe.ucsc.edu/mailman/listinfo/genome

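For illustration only, here is a minimal Python sketch of the split suggested in the quoted report: a stop codon interrupted by a splice site is emitted as one segment per exon instead of a single span across the intron. This is not the Table Browser's actual code. The function name `split_codon` is hypothetical, and the sketch assumes a plus-strand transcript, 1-based inclusive coordinates as in GTF, and a codon that starts inside an exon.

```python
def split_codon(codon_start, exons, codon_len=3):
    """Split a plus-strand codon into per-exon (start, end) segments.

    Hypothetical helper for illustration; coordinates are 1-based and
    inclusive, as in GTF. `exons` is a list of (start, end) tuples.
    """
    segments = []
    remaining = codon_len  # bases of the codon still to be placed
    pos = codon_start      # next genomic position the codon may occupy
    for exon_start, exon_end in sorted(exons):
        if remaining == 0:
            break
        if exon_end < pos:
            continue  # exon lies entirely before the codon
        seg_start = max(pos, exon_start)
        seg_end = min(exon_end, seg_start + remaining - 1)
        segments.append((seg_start, seg_end))
        remaining -= seg_end - seg_start + 1
        pos = seg_end + 1
    if remaining:
        raise ValueError("codon runs past the last exon")
    return segments


# Exons and stop codon start taken from the NM_024813 lines in the report.
exons = [(92618869, 92619018), (92625156, 92626320)]
segments = split_codon(92619017, exons)
print(segments)  # [(92619017, 92619018), (92625156, 92625156)]
```

With the coordinates from the report, the stop codon comes out as two bases at the end of the first exon plus one base at the start of the next, which corresponds to the two stop_codon entries suggested above.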