Dear UCSC Genome Browser Team,

A question from a user of my software (CC'ed) led me to notice a potential bug in the UCSC Genome Table Browser.
According to the GFF specs, the value in the start column of a GFF or GTF file must never be larger than the value in the end column. However, the Table Browser does return such lines.

Steps to reproduce: In the Table Browser, select the "NCBI37/mm9" assembly, the "UCSC Genes" track and the "known genes" table. As region, set "chr1:40547900-40548100", and request the "GTF" output format.

The output contains the following line, describing the last exon of transcript 'uc007aug.1' (gene name Il18r1):

chr1 mm9_knownGene CDS 40547903 40547900 0.000000 + 1 gene_id "uc007aug.1"; transcript_id "uc007aug.1";

In this line, the CDS seems to have negative length: the end lies to the left of the start. The other transcripts of this gene do not have such a strange exon; rather, the exon seems to actually extend to 40548061.

Also note the two lines following the faulty one:

chr1 mm9_knownGene stop_codon 40547901 40547903 0.000000 + . gene_id "uc007aug.1"; transcript_id "uc007aug.1";
chr1 mm9_knownGene exon 40547903 40548425 0.000000 + . gene_id "uc007aug.1"; transcript_id "uc007aug.1";

A stop codon is listed that does not appear in the other transcripts of the same gene that contain this exon. For example, transcript uc007auh.1 (for which this exon is not final) has its open reading frame spanning the place of the erroneous stop codon:

chr1 mm9_knownGene CDS 40547903 40548061 0.000000 + 2 gene_id "uc007auh.1"; transcript_id "uc007auh.1";

Paola (the user who stumbled over this when my script gave an error due to the end being before the start) wrote that she encountered 104 such lines in the entire mm9 GTF file.

Could it be that you have some bug in the treatment of prematurely polyadenylated transcripts?

Best regards
  Simon Anders

+---
| Dr. Simon Anders, Dipl.-Phys.
| European Molecular Biology Laboratory (EMBL), Heidelberg
| office phone +49-6221-387-8632
| preferred (permanent) e-mail: [email protected]
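
For anyone who wants to look for such lines in their own download, here is a minimal sketch of the check described above: it reports every record whose start column (field 4) is larger than its end column (field 5). The function name `find_inverted_features` is made up for this illustration, and the sample records are the three lines quoted above, re-joined with tabs on the assumption that the Table Browser output is tab-separated as the GTF format requires.

```python
def find_inverted_features(lines):
    """Yield (line_number, feature, start, end) for records with start > end.

    `lines` is any iterable of GTF/GFF text lines (for example an open file).
    Comment lines, blank lines and lines with fewer than 9 tab-separated
    fields are skipped rather than reported.
    """
    for line_number, line in enumerate(lines, start=1):
        line = line.rstrip("\r\n")
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < 9:
            continue
        try:
            start, end = int(fields[3]), int(fields[4])
        except ValueError:
            # Non-numeric coordinates: not the problem we are looking for here.
            continue
        if start > end:
            yield line_number, fields[2], start, end


# The three uc007aug.1 records quoted in this mail, rebuilt with tab separators.
attributes = 'gene_id "uc007aug.1"; transcript_id "uc007aug.1";'
sample_lines = [
    "\t".join(["chr1", "mm9_knownGene", "CDS", "40547903", "40547900",
               "0.000000", "+", "1", attributes]),
    "\t".join(["chr1", "mm9_knownGene", "stop_codon", "40547901", "40547903",
               "0.000000", "+", ".", attributes]),
    "\t".join(["chr1", "mm9_knownGene", "exon", "40547903", "40548425",
               "0.000000", "+", ".", attributes]),
]

inverted = list(find_inverted_features(sample_lines))
for line_number, feature, start, end in inverted:
    print(f"line {line_number}: {feature} has start {start} > end {end}")
```

To scan a whole file instead of the sample, pass an open file handle to the function, e.g. `with open(path) as handle: ...`, where `path` points to the downloaded GTF file.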
