Hi Yuan,

The fourth column of the BED output contains several pieces of 
information separated by underscores. This is the fourth column of the 
first row in your example:

uc009vjk.2_cds_1_0_chr1_324343_f

The information displayed is summed up as follows:
ucscId_sequenceType_sequenceTypeNumber_basesAdded_chromosome_positionOfFirstBaseOfItem_strand

* UCSC ID - our identification for the transcripts in the UCSC Genes track
* Sequence Type - exon, intron, cds, utr5, etc.; since you chose to see 
only coding exons, everything in your output should be cds
* Sequence Type Number - for every transcript, there will be a row for 
each item of the requested sequence type (for example, each cds or 
intron), and this number identifies which one is represented in this 
row; the first is denoted with 0. So, if you requested exons, and a 
particular transcript has 10 exons, you will see a row for each one and 
in this position they will be numbered 0-9.
* Bases Added - this is the number of bases you specified should be 
added to the regions you requested (if 0, you didn't request additional 
bases added)
* Chromosome - this is the chromosome this item is on (same as 1st column)
* Position of First Base of Item (as would be displayed in the browser) -
if you had specified bases added to the requested features (for 
example, exons plus 10 bases on each end), then columns 2 and 3 of the 
output wouldn't be the exact coordinates of the exon; they would start 
10 bases before and end 10 bases after it. This field is therefore an 
easy way to see where the actual feature starts. I say "as displayed in 
the browser" because the coordinates in our tables almost always have 
0-based starts (as they do in columns 2 and 3 of this output) but are 
displayed as 1-based in the browser (for more info see this FAQ: 
http://genome.ucsc.edu/FAQ/FAQtracks.html#tracks1). The start position 
in this field of the 4th column is 1-based, so it is the exact 
coordinate the feature starts on as displayed in the browser. In your 
first row, for example, column 2 is 324342 (0-based) and this field is 
324343 (1-based).
* Strand - forward strand (f) or reverse strand (r) (corresponds to the 
+ or - in the 6th column)
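
If you want to pull these fields apart in a script, here is a minimal 
Python sketch of the layout described above. The names BedName and 
parse_bed_name are hypothetical (they are not part of any UCSC tool), and 
the sketch assumes that the UCSC ID and the sequence type never contain 
underscores, while the chromosome name might (for example, chr1_random).

```python
from typing import NamedTuple


class BedName(NamedTuple):
    """Fields packed into the 4th BED column (hypothetical container)."""
    ucsc_id: str
    seq_type: str
    seq_number: int
    bases_added: int
    chrom: str
    start_1based: int
    strand: str


def parse_bed_name(name: str) -> BedName:
    # Split the first four fields from the left and the last two from the
    # right, so a chromosome name containing underscores stays intact.
    ucsc_id, seq_type, seq_number, bases_added, rest = name.split("_", 4)
    chrom, start, strand = rest.rsplit("_", 2)
    return BedName(ucsc_id, seq_type, int(seq_number), int(bases_added),
                   chrom, int(start), strand)


# First row of the example output (tab-separated BED line).
line = "chr1\t324342\t324345\tuc009vjk.2_cds_1_0_chr1_324343_f\t0\t+"
chrom, chrom_start, chrom_end, name, score, strand = line.split("\t")

parsed = parse_bed_name(name)
print(parsed)

# With 0 bases added, the 1-based start in the name is the 0-based
# chromStart (column 2) plus 1.
print(parsed.start_1based == int(chrom_start) + 1)
```

Splitting from both ends rather than on every underscore is only a 
precaution for chromosome names with underscores; for names like chr1 a 
plain split would give the same result.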

Please don't hesitate to contact the mail list again if you have any 
further questions.

Katrina Learned
UCSC Genome Bioinformatics Group




Yuan Hao wrote, On 03/04/11 15:24:
> Hi Luvina,
>
> May I ask a related question? I've downloaded all the coding
> exons from the UCSC Table Browser, which look like the following:
>
> chr1  324342  324345  uc009vjk.2_cds_1_0_chr1_324343_f        0       +
> chr1  324438  325605  uc009vjk.2_cds_2_0_chr1_324439_f        0       +
> chr1  324342  324345  uc001aau.2_cds_1_0_chr1_324343_f        0       +
> chr1  324438  325605  uc001aau.2_cds_2_0_chr1_324439_f        0       +
> chr1  367658  368594  uc010nxu.1_cds_0_0_chr1_367659_f        0       +
> chr1  621098  622034  uc010nxv.1_cds_0_0_chr1_621099_r        0       -
> chr1  664484  665108  uc001abe.3_cds_0_0_chr1_664485_r        0       -
> chr1  664484  665108  uc009vjm.2_cds_0_0_chr1_664485_r        0       -
>
> I have some difficulty understanding the 4th column of this file.
>
> 1) I presume 'uc*****.*' are the UCSC IDs. Looking at the first four
> records in the Genome Browser, they seem to correspond to the same
> gene. Are they aliases?
> 2) What do the numbers following 'cds_' mean? Do they index the
> exons of a gene?
> 3) After 'chr1', the number obviously represents some position
> information, but I am not sure which position exactly. It looks to be
> 1 bp after the start position in the 2nd column. I interpret the 2nd
> and 3rd columns as the exon coordinates. Am I wrong?
>
> Thank you very much in advance!
>
> Yuan
>
> On 4 Mar 2011, at 21:11, Luvina Guruvadoo wrote:
>
>   
>> Hi Bogdan,
>>
>> Please see these two previously answered mailing list questions:
>>
>> https://lists.soe.ucsc.edu/pipermail/genome/2010-February/021412.html
>>
>> https://lists.soe.ucsc.edu/pipermail/genome/2007-August/014337.html
>>
>> If by non-redundant exons you mean that you don't want exons of splice
>> variants, then you may want to first obtain a list of transcripts from
>> the knownCanonical table using the Table Browser. To do this, select
>> "knownCanonical" from the table drop down menu, then "selected fields
>> from primary related tables" as the output format. Enter a file name in
>> the output file box and click "get output". On the following page,
>> select "transcripts" then "get output". This will provide you with a
>> file containing all transcript names. Then follow instructions on the
>> previous mailing list questions to obtain the exons, with the additional
>> step of uploading the list of transcripts (click "upload list" next to
>> identifiers).
>>
>> I hope this helps. Please contact us again at [email protected] if you
>> have any further questions.
>>
>> Best,
>> Luvina
>>
>> ---
>> Luvina Guruvadoo
>> UCSC Genome Bioinformatics Group
>>
>>
>>
>> Bogdan Tanasa wrote:
>>     
>>> Dear all,
>>>
>>> please could you let me know a way to retrieve the non-redundant set of
>>> exons of UCSC genes of hg18.
>>>
>>> thanks,
>>>
>>> bogdan
>>> _______________________________________________
>>> Genome maillist  -  [email protected]
>>> https://lists.soe.ucsc.edu/mailman/listinfo/genome
>>>
>>>       
>
