Hi Bob,

Thanks. Yes, after actually working through the maths, I also realized
that it is more complicated than I thought, especially when an intron
falls inside a UTR region. The same applies to coding regions that
contain introns. One thing I also found (and do not quite understand)
is the case of non-coding genes, for example Mrpl15 - NR_033530 in
mouse mm9:

Mrpl15    NR_033530    chr1    -    4763278    4775807    4775807    4775807    4    4763278,4767605,4772648,4775653,    4764597,4767729,4772814,4775807,

I understand that this is a non-coding gene, so there is no coding region
for it. But instead of two empty coordinates at cdsStart and cdsEnd, we
have two identical coordinates, 4775807. Does that mean the coding region
has size 0 at 4775807, or is it just a convention of the genePred format?
In this case, how do I distinguish between the 3' UTR and the 5' UTR?
Does that mean the 5' UTR has size 0 and the 3' UTR is (4763278, 4775807),
or are both of them the same, (4763278, 4775807)?
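
To make the question concrete, here is a minimal Python sketch of the
intron-aware calculation, following Bob's point that a UTR split by an
intron has to be handled exon by exon. It rests on several assumptions
of mine: the record has the gene symbol as an extra first column (as in
the line above), the coordinates are half-open intervals, and
cdsStart == cdsEnd is only a marker for "no coding region", which is
exactly the part I am asking about. The function names (parse_record,
clip_exons, utr_blocks, block_size) are made up for illustration.

```python
def parse_record(line):
    """Split one whitespace-separated record with a leading gene symbol."""
    f = line.split()
    starts = [int(x) for x in f[9].rstrip(",").split(",")]
    ends = [int(x) for x in f[10].rstrip(",").split(",")]
    return {
        "gene": f[0], "name": f[1], "chrom": f[2], "strand": f[3],
        "txStart": int(f[4]), "txEnd": int(f[5]),
        "cdsStart": int(f[6]), "cdsEnd": int(f[7]),
        "exons": list(zip(starts, ends)),
    }


def clip_exons(exons, lo, hi):
    """Keep only the parts of each exon that fall inside [lo, hi)."""
    clipped = []
    for start, end in exons:
        s, e = max(start, lo), min(end, hi)
        if s < e:
            clipped.append((s, e))
    return clipped


def utr_blocks(rec):
    """Return exon-level 5' and 3' UTR blocks, or None if non-coding.

    Assumption: cdsStart == cdsEnd means there is no coding region,
    so no UTR labels are assigned at all.
    """
    if rec["cdsStart"] == rec["cdsEnd"]:
        return None
    left = clip_exons(rec["exons"], rec["txStart"], rec["cdsStart"])
    right = clip_exons(rec["exons"], rec["cdsEnd"], rec["txEnd"])
    # On the minus strand the 5' end sits at the higher coordinates.
    if rec["strand"] == "+":
        return {"5UTR": left, "3UTR": right}
    return {"5UTR": right, "3UTR": left}


def block_size(blocks):
    """Total number of bases covered by a list of (start, end) blocks."""
    return sum(end - start for start, end in blocks)


line = ("Mrpl15 NR_033530 chr1 - 4763278 4775807 4775807 4775807 4 "
        "4763278,4767605,4772648,4775653, "
        "4764597,4767729,4772814,4775807,")
rec = parse_record(line)
print(utr_blocks(rec))            # None under the assumption above
print(block_size(rec["exons"]))   # total exonic bases of the transcript
```

Under this reading the Mrpl15 record gets no 5' or 3' UTR at all, and
the whole transcript is simply non-coding exon sequence. If the
identical coordinates are instead meant as a zero-length coding region,
the check at the top of utr_blocks would have to go, and everything
would fall on one side of 4775807.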

Thanks,

D.

On 3/23/11 3:25 AM, robert kuhn wrote:
> Hi, again, Duke,
>
> I would additionally point out that what you have would not work for
> the size of the UTRs if the UTR was split by an intron.  In that case,
> you would have to account for the intron as well.
>
>             --b0b
>
>
> On 3/22/2011 4:05 PM, robert kuhn wrote:
>> Hello, Duke,
>>
>> It looks as if you understand it correctly, though I would offer that
>> if you actually perform the subtractions you show, then you would
>> get the size, not the coordinates.  Though if you interpret the "-"
>> in your message to mean "through", then you have defined the
>> interval properly, though in reverse.  E.g., txEnd-cdsEnd should read
>> "cdsEnd through txEnd" if you mean the interval, as the txEnd should
>> always be greater than or equal to the cdsEnd.
>>
>> best wishes,
>>
>>             --b0b kuhn
>>             ucsc genome bioinformatics group
>>
>> On 3/21/2011 7:29 AM, Duke wrote:
>>> Hi folks,
>>>
>>> Please correct me if I am wrong. I am working out how to get the
>>> coordinates of different genome regions such as
>>> UTR/intergenic/intragenic etc., and from the genePred format
>>> (http://genome.ucsc.edu/FAQ/FAQformat.html#format9), I think I can
>>> get them as follows:
>>>
>>> If Strand = '+':
>>>
>>> 3UTR = txEnd-cdsEnd
>>> 5UTR = cdsStart-txStart
>>> Intragenic(i) = exonEnds(i)-exonStarts(i)
>>> Intergenic = all regions that do not overlap with gene coordinates
>>> (between txStart and txEnd)
>>>
>>> For Strand = '-', everything should be reversed, such as 3UTR =
>>> cdsStart-txStart etc...
>>>
>>> Thank you very much in advance,
>>>
>>> D.
>>> _______________________________________________
>>> Genome maillist  -  [email protected]
>>> https://lists.soe.ucsc.edu/mailman/listinfo/genome
>
