On Sun, May 22, 2011 at 8:02 AM, Brent Pedersen <[email protected]> wrote:
> hi, I have grabbed some data from mysql like this:
>
> mysql --user=genome --host=genome-mysql.cse.ucsc.edu -A -D $ORG -P 3306 \
>   -e "select chrom,txStart,txEnd,cdsStart,cdsEnd,K.name,X.geneSymbol,proteinID,strand,exonStarts,exonEnds
>   from knownGene as K, kgXref as X where X.kgId=K.name"
>
> I have a couple of questions about the data. First, a row like this:
>
> chrom       chr17
> txStart     46103534
> txEnd       46115152
> cdsStart    46103793
> cdsEnd      46115139
> name        uc002imy.2
> geneSymbol  COPZ2
> proteinID   Q9P299
> strand      -
> exonStarts  46103534,46105837,46106490,46109521,46110051,46110576,46111228,46114216,46115032,46115092,46115124,
> exonEnds    46103841,46105876,46106542,46109599,46110107,46110668,46111310,46114291,46115092,46115122,46115152,
>
> Note that the 2nd-from-last exonStart is the same as the 3rd-from-last
> exonEnd: 46115092. Does this mean a 0-length intron? And what does
> that mean within a transcript?

Can anyone comment on this? Is there some way I can clarify the question?
I find cases like this 185 times in hg19.
Thanks,
-Brent
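
A count like the one above could be reproduced with a sketch along these
lines. It assumes the query output was saved as tab-separated text with a
header row that includes exonStarts and exonEnds columns. The function
names and the sample rows are made up for illustration.

```python
import csv
import io


def has_abutting_exons(exon_starts, exon_ends):
    """True if some exon ends exactly where the next exon starts."""
    # The lists carry a trailing comma, so skip empty pieces.
    starts = [int(x) for x in exon_starts.split(",") if x]
    ends = [int(x) for x in exon_ends.split(",") if x]
    return any(e == s for e, s in zip(ends[:-1], starts[1:]))


def count_abutting(handle):
    """Count rows whose exon lists contain a 0-length gap."""
    reader = csv.DictReader(handle, delimiter="\t")
    return sum(
        1
        for row in reader
        if has_abutting_exons(row["exonStarts"], row["exonEnds"])
    )


# Hypothetical two-row sample: only tx_b has an exon ending at 150
# followed by an exon starting at 150.
sample = (
    "name\texonStarts\texonEnds\n"
    "tx_a\t100,200,\t150,260,\n"
    "tx_b\t100,150,\t150,260,\n"
)
print(count_abutting(io.StringIO(sample)))
```

With a real file, count_abutting(open(path)) would take the place of the
in-memory sample.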


>
> Second question: for this same row, is it correct to infer that the
> first exon in (0-based) bed format would be:
>  start=46103534, end=46103841
> and the first intron would be:
>  start=46103841 end=46105837
>
> but then the problem is that start == end for the 0-length intron.
>
> I have seen this: http://genome.ucsc.edu/FAQ/FAQtracks#tracks1 so the
> internal format matches the BED format, correct?
>
> thanks,
> -Brent
>
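
The exon and intron intervals asked about above can be derived
mechanically. This is a minimal sketch, assuming the table coordinates are
0-based and half-open as in BED (which is what the linked FAQ entry
describes). The helper names are made up for illustration. It says nothing
about what a 0-length gap means biologically.

```python
def parse_coords(field):
    """Parse a comma-separated list such as '1,2,3,' into ints."""
    return [int(x) for x in field.split(",") if x.strip()]


def exons_and_introns(exon_starts, exon_ends):
    """Return (exons, introns) as lists of (start, end) tuples."""
    starts = parse_coords(exon_starts)
    ends = parse_coords(exon_ends)
    if len(starts) != len(ends):
        raise ValueError("exonStarts and exonEnds differ in length")
    exons = list(zip(starts, ends))
    # Intron i runs from the end of exon i to the start of exon i + 1.
    introns = list(zip(ends[:-1], starts[1:]))
    return exons, introns


# The uc002imy.2 (COPZ2) row quoted in this thread.
exon_starts = ("46103534,46105837,46106490,46109521,46110051,46110576,"
               "46111228,46114216,46115032,46115092,46115124,")
exon_ends = ("46103841,46105876,46106542,46109599,46110107,46110668,"
             "46111310,46114291,46115092,46115122,46115152,")

exons, introns = exons_and_introns(exon_starts, exon_ends)
zero_length = [(s, e) for s, e in introns if s == e]

print(exons[0])      # (46103534, 46103841)
print(introns[0])    # (46103841, 46105837)
print(zero_length)   # [(46115092, 46115092)]
```

Under that assumption the first exon and first intron match the values in
the question, and the only start == end interval in this row is the one at
46115092.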

_______________________________________________
Genome maillist  -  [email protected]
https://lists.soe.ucsc.edu/mailman/listinfo/genome
