Thanks Mary. Your information is very helpful.

Best,

D.

On 3/23/11 7:52 PM, Mary Goldman wrote:
> Hi Duke,
>
> For non-coding genes (which, by definition, have a coding region size of 
> 0), cdsStart will always equal cdsEnd in the genePred format. Since 
> there is no coding region to indicate, it doesn't matter what the 
> actual genome coordinates are for the cdsStart and cdsEnd (just that 
> they are equal to each other). As a convention to help with 
> standardization, we have made the cdsStart equal the txStart for 
> non-coding genes. Likewise, there are no UTRs (UnTranslated Regions) 
> for non-coding genes because there is no translated region (or coding 
> sequence).
>
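To make the convention above concrete, here is a minimal Python sketch that parses the Mrpl15 record quoted further down in this thread and checks whether it is non-coding. The column order is an assumption read off that record (gene name first, then accession, chrom, strand, and so on), and `Transcript`, `parse_record`, and `is_noncoding` are hypothetical helper names, not part of any UCSC tool.

```python
from typing import List, NamedTuple


class Transcript(NamedTuple):
    gene: str
    name: str
    chrom: str
    strand: str
    tx_start: int
    tx_end: int
    cds_start: int
    cds_end: int
    exon_starts: List[int]
    exon_ends: List[int]


def parse_record(line: str) -> Transcript:
    """Parse one whitespace-separated record (assumed column order, see above)."""
    f = line.split()
    exon_count = int(f[8])
    # The exon lists carry a trailing comma, so strip it before splitting.
    starts = [int(x) for x in f[9].rstrip(",").split(",")]
    ends = [int(x) for x in f[10].rstrip(",").split(",")]
    if not (len(starts) == len(ends) == exon_count):
        raise ValueError("exonCount does not match the exon lists")
    return Transcript(f[0], f[1], f[2], f[3],
                      int(f[4]), int(f[5]), int(f[6]), int(f[7]),
                      starts, ends)


def is_noncoding(t: Transcript) -> bool:
    """A record is treated as non-coding when cdsStart equals cdsEnd."""
    return t.cds_start == t.cds_end


record = ("Mrpl15 NR_033530 chr1 - 4763278 4775807 4775807 4775807 4 "
          "4763278,4767605,4772648,4775653, 4764597,4767729,4772814,4775807,")
t = parse_record(record)
print(t.name, is_noncoding(t), t.cds_end - t.cds_start)  # NR_033530 True 0
```

The check relies only on cdsStart being equal to cdsEnd, not on which transcript coordinate the two happen to share.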
> I hope this information is helpful.  Please feel free to contact the 
> mail list again if you require further assistance.
>
> Best,
> Mary
> ------------------
> Mary Goldman
> UCSC Bioinformatics Group
>
> On 3/23/11 6:59 AM, Duke wrote:
>> Hi Bob,
>>
>> Thanks. Yes, after actually doing the math, I also realized that it
>> is more complicated than I thought, especially with UTR introns
>> (introns inside UTR regions). This also applies to coding regions,
>> if they contain any introns. One thing I also found
>> (and do not quite understand) is the case of non-coding genes, for
>> example Mrpl15 - NR_033530 in mouse mm9:
>>
>> Mrpl15    NR_033530    chr1    -    4763278    4775807    4775807
>> 4775807    4    4763278,4767605,4772648,4775653,
>> 4764597,4767729,4772814,4775807,
>>
>> I understand that this is a non-coding gene, so there is no coding region
>> for it. But instead of two empty coordinates at cdsStart and cdsEnd, we
>> have two identical coordinates, 4775807. Does that mean the coding region
>> size = 0 at 4775807, or is it just a convention of the genePred format? In
>> this case, how should I distinguish between the 3' UTR and the 5'
>> UTR? Does that mean the 5' UTR size = 0 and the 3' UTR is (4763278,
>> 4775807), or are both of them the same, (4763278, 4775807)?
>>
>> Thanks,
>>
>> D.
>>
>> On 3/23/11 3:25 AM, robert kuhn wrote:
>>> Hi, again, Duke,
>>>
>>> I would additionally point out that what you have would not work for
>>> the size of the UTRs if the UTR was split by an intron.  In that case,
>>> you would have to account for the intron as well.
>>>
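A minimal Python sketch of the intron-aware calculation described above: instead of subtracting transcript and CDS coordinates directly, it sums only the exon bases that fall on each side of the coding region. It assumes 0-based, half-open coordinates, and the function names and the example transcript are hypothetical, made up purely for illustration.

```python
def overlap(a_start, a_end, b_start, b_end):
    """Number of bases shared by two half-open intervals (0 if disjoint)."""
    return max(0, min(a_end, b_end) - max(a_start, b_start))


def utr_exonic_sizes(tx_start, tx_end, cds_start, cds_end,
                     exon_starts, exon_ends, strand):
    """Sum the exon bases in each UTR, so introns inside a UTR are excluded."""
    if cds_start == cds_end:
        # Non-coding record: there is no coding region, hence no UTRs.
        raise ValueError("non-coding transcript has no UTRs")
    left = right = 0
    for s, e in zip(exon_starts, exon_ends):
        left += overlap(s, e, tx_start, cds_start)   # before the CDS on the genome
        right += overlap(s, e, cds_end, tx_end)      # after the CDS on the genome
    if strand == "+":
        return {"5utr": left, "3utr": right}
    return {"5utr": right, "3utr": left}


# Hypothetical transcript: the first intron (150..250) lies inside the left UTR.
sizes = utr_exonic_sizes(100, 1000, 300, 800,
                         [100, 250, 700], [150, 400, 1000], "+")
print(sizes)  # {'5utr': 100, '3utr': 200}
```

In this made-up example the plain subtraction cdsStart - txStart would give 200 for the 5' UTR, while only 100 of those bases are exonic.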
>>>              --b0b
>>>
>>>
>>> On 3/22/2011 4:05 PM, robert kuhn wrote:
>>>> Hello, Duke,
>>>>
>>>> It looks as if you understand it correctly, though I would offer that
>>>> if you actually perform the subtractions you show, then you would
>>>> get the size, not the coordinates.  Though if you interpret the "-"
>>>> in your message to mean "through", then you have defined the
>>>> interval properly, though in reverse.  E.g., txEnd-cdsEnd should read
>>>> "cdsEnd through txEnd" if you mean the interval, as the txEnd should
>>>> always be greater than the cdsEnd.
>>>>
>>>> best wishes,
>>>>
>>>>              --b0b kuhn
>>>>              ucsc genome bioinformatics group
>>>>
>>>> On 3/21/2011 7:29 AM, Duke wrote:
>>>>> Hi folks,
>>>>>
>>>>> Please correct me if I am wrong. I am working out how to get the
>>>>> coordinates of different genome regions such as
>>>>> UTR/intergenic/intragenic etc., and from the genePred format
>>>>> (http://genome.ucsc.edu/FAQ/FAQformat.html#format9), I think I can
>>>>> get them as follows:
>>>>>
>>>>> If Strand = '+':
>>>>>
>>>>> 3UTR = txEnd-cdsEnd
>>>>> 5UTR = cdsStart-txStart
>>>>> Intragenic(i) = exonEnds(i)-exonStarts(i)
>>>>> Intergenic = all regions that do not overlap with gene coordinates
>>>>> (between txStart and txEnd)
>>>>>
>>>>> For Strand = '-', everything should be reversed, such as 3UTR =
>>>>> cdsStart-txStart etc...
>>>>>
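The strand rules above can be sketched in Python as follows. This is only an illustration under stated assumptions: coordinates are taken to be 0-based and half-open, each UTR is returned as a (start, end) span read as "start through end", introns inside a UTR are ignored, and `utr_spans` and `span_size` are hypothetical names with made-up example values.

```python
def utr_spans(tx_start, tx_end, cds_start, cds_end, strand):
    """Return the genomic span of each UTR as a (start, end) tuple."""
    left = (tx_start, cds_start)   # txStart through cdsStart
    right = (cds_end, tx_end)      # cdsEnd through txEnd
    if strand == "+":
        return {"5utr": left, "3utr": right}
    if strand == "-":
        # On the minus strand the transcript runs right to left,
        # so the two ends swap roles.
        return {"5utr": right, "3utr": left}
    raise ValueError("strand must be '+' or '-'")


def span_size(span):
    """Subtracting the two coordinates gives the size, not the interval."""
    start, end = span
    return end - start


plus = utr_spans(100, 1000, 300, 900, "+")
minus = utr_spans(100, 1000, 300, 900, "-")
print(plus["3utr"], span_size(plus["3utr"]))    # (900, 1000) 100
print(minus["3utr"], span_size(minus["3utr"]))  # (100, 300) 200
```

Keeping the span and its size as separate values avoids mixing up "cdsEnd through txEnd" (an interval) with txEnd - cdsEnd (a length).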
>>>>> Thank you very much in advance,
>>>>>
>>>>> D.
>>>>> _______________________________________________
>>>>> Genome maillist  -  [email protected]
>>>>> https://lists.soe.ucsc.edu/mailman/listinfo/genome
>
