Hi, I have grabbed some data from MySQL like this:
mysql --user=genome --host=genome-mysql.cse.ucsc.edu -A -D $ORG -P 3306 -e "select
chrom,txStart,txEnd,cdsStart,cdsEnd,K.name,X.geneSymbol,proteinID,strand,exonStarts,exonEnds
from knownGene as K,kgXref as X where X.kgId=K.name"
I have a couple of questions about the data. First, consider a row like this:
chrom       chr17
txStart     46103534
txEnd       46115152
cdsStart    46103793
cdsEnd      46115139
name        uc002imy.2
geneSymbol  COPZ2
proteinID   Q9P299
strand      -
exonStarts  46103534,46105837,46106490,46109521,46110051,46110576,46111228,46114216,46115032,46115092,46115124,
exonEnds    46103841,46105876,46106542,46109599,46110107,46110668,46111310,46114291,46115092,46115122,46115152,
Note that the 2nd-to-last exonStart is the same as the 3rd-from-last
exonEnd: 46115092. Does this mean there is a zero-length intron between
those two exons? And what does that mean within a transcript?
Second question: for this same row, is it correct to infer that the
first exon in (0-based) BED format would be:
start=46103534, end=46103841
and the first intron would be:
start=46103841 end=46105837
but then the problem is that start == end for the 0-length intron.
I have seen this: http://genome.ucsc.edu/FAQ/FAQtracks#tracks1 so the
internal format matches the BED format (0-based start, end not
included), correct?
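
To make the question concrete, here is a minimal Python sketch of how I am deriving the intervals. It assumes the table coordinates are 0-based with the end excluded, as in BED, and that intron i is simply the gap between the end of exon i and the start of exon i+1. The helper names (parse_coords, exons_and_introns) are my own and not part of any UCSC tool.

```python
def parse_coords(field):
    """Parse a comma-separated coordinate list, ignoring the trailing comma."""
    return [int(x) for x in field.split(",") if x]


def exons_and_introns(exon_starts, exon_ends):
    """Return (exons, introns) as lists of (start, end) tuples.

    Assumes 0-based starts and exclusive ends, so an intron runs from
    the end of one exon to the start of the next.
    """
    starts = parse_coords(exon_starts)
    ends = parse_coords(exon_ends)
    if len(starts) != len(ends):
        raise ValueError("exonStarts and exonEnds differ in length")
    exons = list(zip(starts, ends))
    introns = list(zip(ends[:-1], starts[1:]))
    return exons, introns


# exonStarts and exonEnds for uc002imy.2 (COPZ2), copied from the row above
exon_starts = ("46103534,46105837,46106490,46109521,46110051,46110576,"
               "46111228,46114216,46115032,46115092,46115124,")
exon_ends = ("46103841,46105876,46106542,46109599,46110107,46110668,"
             "46111310,46114291,46115092,46115122,46115152,")

exons, introns = exons_and_introns(exon_starts, exon_ends)
print(exons[0])    # (46103534, 46103841)
print(introns[0])  # (46103841, 46105837)

# Introns whose start equals their end, i.e. zero-length gaps
zero_length = [(s, e) for s, e in introns if s == e]
print(zero_length)  # [(46115092, 46115092)]
```

With this row, the only interval where start == end is the gap at 46115092, which is the case I am asking about.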
thanks,
-Brent