Hi, I grabbed some data from MySQL like this:

mysql --user=genome --host=genome-mysql.cse.ucsc.edu -A -D $ORG -P 3306 \
  -e "select chrom,txStart,txEnd,cdsStart,cdsEnd,K.name,X.geneSymbol,proteinID,strand,exonStarts,exonEnds
      from knownGene as K, kgXref as X where X.kgId=K.name"

I have a couple questions about the data. First, a row like this:

chrom       chr17
txStart     46103534
txEnd       46115152
cdsStart    46103793
cdsEnd      46115139
name        uc002imy.2
geneSymbol  COPZ2
proteinID   Q9P299
strand      -
exonStarts  46103534,46105837,46106490,46109521,46110051,46110576,46111228,46114216,46115032,46115092,46115124,
exonEnds    46103841,46105876,46106542,46109599,46110107,46110668,46111310,46114291,46115092,46115122,46115152,

Note that the second-to-last exonStart is the same as the
third-from-last exonEnd: 46115092. Does this mean a 0-length intron?
And what does that mean within a transcript?

Second question: for this same row, is it correct to infer that the
first exon in (0-based) BED format would be:
 start=46103534, end=46103841
and the first intron would be:
 start=46103841, end=46105837?

But then the problem is that start == end for the 0-length intron.
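
To make the inference concrete, here is a minimal Python sketch of how the exon and intron intervals fall out of this row. It assumes exonStarts/exonEnds are 0-based, half-open coordinates as in BED (which is exactly what I am asking about), and it takes "first" to mean first in the order listed in the row. The function names parse_coords and exons_and_introns are my own, not part of any UCSC tool.

```python
def parse_coords(field):
    """Parse a comma-separated coordinate list, ignoring the trailing comma."""
    return [int(x) for x in field.split(",") if x.strip()]


def exons_and_introns(exon_starts, exon_ends):
    """Return (exons, introns) as lists of (start, end) tuples.

    Assumption: coordinates are 0-based, half-open, so an intron runs
    from the end of one exon to the start of the next exon.
    """
    starts = parse_coords(exon_starts)
    ends = parse_coords(exon_ends)
    if len(starts) != len(ends):
        raise ValueError("exonStarts and exonEnds differ in length")
    exons = list(zip(starts, ends))
    introns = [(ends[i], starts[i + 1]) for i in range(len(exons) - 1)]
    return exons, introns


# Values copied from the uc002imy.2 (COPZ2) row above.
exon_starts = ("46103534,46105837,46106490,46109521,46110051,46110576,"
               "46111228,46114216,46115032,46115092,46115124,")
exon_ends = ("46103841,46105876,46106542,46109599,46110107,46110668,"
             "46111310,46114291,46115092,46115122,46115152,")

exons, introns = exons_and_introns(exon_starts, exon_ends)
zero_length = [iv for iv in introns if iv[0] == iv[1]]

print(exons[0])     # (46103534, 46103841)
print(introns[0])   # (46103841, 46105837)
print(zero_length)  # [(46115092, 46115092)]
```

Under that assumption the 0-length "intron" shows up as the single interval where start == end; a downstream script could either skip such intervals or merge the two adjacent exons, depending on what the answer to the first question turns out to be.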

I have seen this: http://genome.ucsc.edu/FAQ/FAQtracks#tracks1
So the internal format matches the BED format, correct?

thanks,
-Brent
