Hi Bob,

Thanks. Yes, after actually working through the maths, I also realized that it is more complicated than I thought, especially when a UTR contains an intron. The same applies to the coding region if it contains any introns. One thing I also found (and do not quite understand) concerns non-coding genes, for example Mrpl15 (NR_033530) in mouse mm9:
Mrpl15 NR_033530 chr1 - 4763278 4775807 4775807 4775807 4 4763278,4767605,4772648,4775653, 4764597,4767729,4772814,4775807,

I understand that this is a non-coding gene, so it has no coding region. But instead of two empty coordinates at cdsStart and cdsEnd, we have two identical coordinates, 4775807. Does that mean the coding region has size 0 at 4775807, or is it just a convention of the genePred format? In this case, how do I distinguish between the 3' UTR and the 5' UTR? Does that mean the 5' UTR has size 0 and the 3' UTR is (4763278, 4775807), or are both of them the same, (4763278, 4775807)?

Thanks,

D.

On 3/23/11 3:25 AM, robert kuhn wrote:
> Hi, again, Duke,
>
> I would additionally point out that what you have would not work for
> the size of the UTRs if the UTR was split by an intron. In that case,
> you would have to account for the intron as well.
>
> --b0b
>
>
> On 3/22/2011 4:05 PM, robert kuhn wrote:
>> Hello, Duke,
>>
>> It looks as if you understand it correctly, though I would offer that
>> if you actually perform the subtractions you show, then you would
>> get the size, not the coordinates. Though if you interpret the "-"
>> in your message to mean "through", then you have defined the interval
>> properly, though in reverse. E.g., txEnd-cdsEnd should read "cdsEnd
>> through txEnd" if you mean the interval, as the txEnd should always be
>> greater than the cdsEnd.
>>
>> best wishes,
>>
>> --b0b kuhn
>> ucsc genome bioinformatics group
>>
>> On 3/21/2011 7:29 AM, Duke wrote:
>>> Hi folks,
>>>
>>> Please correct me if I am wrong. I am dealing with how to get the
>>> coordinates of different genome regions such as
>>> UTR/intergenic/intragenic etc... and from the genePred format
>>> (http://genome.ucsc.edu/FAQ/FAQformat.html#format9), I think I can
>>> get them as follows:
>>>
>>> If Strand = '+':
>>>
>>> 3UTR = txEnd-cdsEnd
>>> 5UTR = cdsStart-txStart
>>> Intragenic(i) = exonEnds(i)-exonStarts(i)
>>> Intergenic = all regions that do not overlap with gene coordinates
>>> (between txStart and txEnd)
>>>
>>> For Strand = '-', everything should be reversed, such as 3UTR =
>>> cdsStart-txStart etc...
>>>
>>> Thank you very much in advance,
>>>
>>> D.
>>> _______________________________________________
>>> Genome maillist - [email protected]
>>> https://lists.soe.ucsc.edu/mailman/listinfo/genome
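The intron point raised in this thread can be illustrated with a short Python sketch. It is only a sketch under stated assumptions, not UCSC code: the helper names `parse_coords`, `utr_blocks` and `total_size` are hypothetical, coordinates are treated as half-open (start, end) intervals as in genePred tables, and it assumes that cdsStart == cdsEnd marks a transcript with no coding region, in which case neither UTR is reported. The coding transcript used for illustration is made up; only the Mrpl15 record comes from the message above.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open (start, end)


def parse_coords(field: str) -> List[int]:
    """Parse a comma-separated genePred list such as '1,2,3,'."""
    return [int(x) for x in field.split(",") if x]


def utr_blocks(strand: str, cds_start: int, cds_end: int,
               exon_starts: List[int], exon_ends: List[int]
               ) -> Tuple[List[Interval], List[Interval]]:
    """Return (five_prime_utr, three_prime_utr) as lists of exonic blocks.

    Introns are excluded because only the exonic parts outside the
    CDS are kept.
    """
    exons = list(zip(exon_starts, exon_ends))
    if cds_start >= cds_end:
        # Assumption: equal cdsStart/cdsEnd means "no CDS", so no UTRs.
        return [], []
    # Exonic sequence to the left of cdsStart (genomic orientation).
    left = [(s, min(e, cds_start)) for s, e in exons if s < cds_start]
    # Exonic sequence to the right of cdsEnd (genomic orientation).
    right = [(max(s, cds_end), e) for s, e in exons if e > cds_end]
    # On the minus strand the 5' end is on the genomic right.
    return (left, right) if strand == "+" else (right, left)


def total_size(blocks: List[Interval]) -> int:
    """Sum of block lengths in bases."""
    return sum(e - s for s, e in blocks)


# Made-up coding transcript: the 5' UTR is split by an intron (200..240).
five, three = utr_blocks("+", 250, 850, [100, 240, 800], [200, 400, 1000])
naive_five = 250 - 100          # cdsStart - txStart, includes the intron
exonic_five = total_size(five)  # exonic bases only

# Mrpl15 / NR_033530 from the message above (cdsStart == cdsEnd).
mrpl15_starts = parse_coords("4763278,4767605,4772648,4775653,")
mrpl15_ends = parse_coords("4764597,4767729,4772814,4775807,")
mrpl15_utrs = utr_blocks("-", 4775807, 4775807, mrpl15_starts, mrpl15_ends)
mrpl15_exonic = total_size(list(zip(mrpl15_starts, mrpl15_ends)))
```

For the made-up transcript, the naive subtraction counts the 40 intronic bases as UTR, while the block-based version does not. For Mrpl15 the sketch reports no UTRs at all and leaves the exonic length as the only meaningful size, which is one possible reading of the identical cdsStart/cdsEnd values.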
