Hi Yuan,

The fourth column of the BED output contains several pieces of information separated by underscores. This is the fourth-column value from the first line of your example:

uc009vjk.2_cds_1_0_chr1_324343_f

The information is laid out as follows:

ucscId_sequenceType_sequenceTypeNumber_basesAdded_chromosome_positionOfFirstBaseOfItem_strand

* UCSC ID - our identifier for the transcript in the UCSC Genes track.
* Sequence Type - exon, intron, cds, utr5, etc. Since you chose to see only coding exons, everything in your output should be cds.
* Sequence Type Number - for every transcript there is one row per item of the requested sequence type (e.g. each cds or each intron), and this number identifies which item the row represents; numbering starts at 0. So if you requested exons and a particular transcript has 10 exons, you will see a row for each one, numbered 0-9 in this position.
* Bases Added - the number of bases you specified should be added to the requested regions (0 means you did not request any additional bases).
* Chromosome - the chromosome the item is on (same as the 1st column).
* Position of First Base of Item (as displayed in the browser) - if you specify bases to be added to the requested features (for example, exons plus 10 bases on each end), then columns 2 and 3 of the output are not the exact coordinates of the exon; they start 10 bases before and end 10 bases after it. This part of the name is therefore an easy way to see where the actual feature starts as displayed in the browser. I say "as displayed in the browser" because the coordinates in our tables almost always have 0-based starts (as in columns 2 and 3 of this output), while the browser displays them 1-based (for more info see this FAQ: http://genome.ucsc.edu/FAQ/FAQtracks.html#tracks1). The start position in this part of the 4th column is 1-based, so it is exactly the coordinate at which the feature starts in the browser. In your first row, for example, the 0-based start 324342 in column 2 appears here as 324343.
* Strand - forward strand (f) or reverse strand (r); this matches the + or - in the 6th column.

Please don't hesitate to contact the mailing list again if you have any further questions.

Katrina Learned
UCSC Genome Bioinformatics Group

Yuan Hao wrote, On 03/04/11 15:24:
> Hi Luvian,
>
> May I have a relevant question that I've downloaded all the coding
> exons from UCSC table browser which looks like the followings:
>
> chr1 324342 324345 uc009vjk.2_cds_1_0_chr1_324343_f 0 +
> chr1 324438 325605 uc009vjk.2_cds_2_0_chr1_324439_f 0 +
> chr1 324342 324345 uc001aau.2_cds_1_0_chr1_324343_f 0 +
> chr1 324438 325605 uc001aau.2_cds_2_0_chr1_324439_f 0 +
> chr1 367658 368594 uc010nxu.1_cds_0_0_chr1_367659_f 0 +
> chr1 621098 622034 uc010nxv.1_cds_0_0_chr1_621099_r 0 -
> chr1 664484 665108 uc001abe.3_cds_0_0_chr1_664485_r 0 -
> chr1 664484 665108 uc009vjm.2_cds_0_0_chr1_664485_r 0 -
>
> I have some difficulties to understand the 4th column of this file.
>
> 1) I presume 'uc*****.*' is the UCSC ids. By looking at the first four
> records on the genome browser, they seem corresponding to the same
> gene. Are they alias?
> 2) What do those numbers following 'cds_' mean? Are they indexing
> exons of a gene?
> 3) After 'chr1', the number obviously represents some position
> information, but I am not sure what position exactly? It looks 1bp
> after the start position on the 2nd column. I interpret the 2nd & 3rd
> column as the exon coordinates. Am I wrong?
>
> Thank you very much in advance!
>
> Yuan
>
> On 4 Mar 2011, at 21:11, Luvina Guruvadoo wrote:
>
>> Hi Bogdan,
>>
>> Please see these two previously answered mailing list questions:
>>
>> https://lists.soe.ucsc.edu/pipermail/genome/2010-February/021412.html
>>
>> https://lists.soe.ucsc.edu/pipermail/genome/2007-August/014337.html
>>
>> If by non-redundant exons you mean that you don't want exons of splice
>> variants, then you may want to first obtain a list of transcripts from
>> the knownCanonical table using the Table Browser.
>> To do this, select "knownCanonical" from the table drop down menu, then
>> "selected fields from primary related tables" as the output format.
>> Enter a file name in the output file box and click "get output". On the
>> following page, select "transcripts" then "get output". This will
>> provide you with a file containing all transcript names. Then follow
>> instructions on the previous mailing list questions to obtain the exons,
>> with the additional step of uploading the list of transcripts (click
>> "upload list" next to identifiers).
>>
>> I hope this helps. Please contact us again at [email protected] if
>> you have any further questions.
>>
>> Best,
>> Luvina
>>
>> ---
>> Luvina Guruvadoo
>> UCSC Genome Bioinformatics Group
>>
>> Bogdan Tanasa wrote:
>>
>>> Dear all,
>>>
>>> please could you let me know a way to retrieve the non-redundant
>>> set of exons of UCSC genes of hg18.
>>>
>>> thanks,
>>>
>>> bogdan
>>> _______________________________________________
>>> Genome maillist - [email protected]
>>> https://lists.soe.ucsc.edu/mailman/listinfo/genome
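Katrina's breakdown of the 4th column can be turned into a small parser for anyone post-processing these files. The Python below is only a minimal sketch, not a UCSC-provided tool, and the names NameField, parse_name and check_row are hypothetical. It assumes that the UCSC ID and the sequence type never contain underscores (chromosome names may), that the strand letter is always f or r, and that when bases are added the column 2 start is shifted left by exactly that many bases, so the 1-based position in the name equals column 2 + bases added + 1.

```python
from typing import NamedTuple

# Strand letter used in the name field -> sign used in BED column 6.
STRAND_SIGN = {"f": "+", "r": "-"}


class NameField(NamedTuple):
    """Parts of the Table Browser BED name (4th column), in the order described above."""
    ucsc_id: str            # transcript ID in the UCSC Genes track, e.g. uc009vjk.2
    seq_type: str           # cds, exon, intron, utr5, ...
    seq_type_number: int    # 0-based index of this item within its transcript
    bases_added: int        # padding requested in the Table Browser (0 = none)
    chrom: str              # same as BED column 1
    first_base_1based: int  # start of the unpadded feature, 1-based as in the browser
    strand: str             # 'f' (forward) or 'r' (reverse)


def parse_name(name: str) -> NameField:
    """Split a name such as 'uc009vjk.2_cds_1_0_chr1_324343_f' into its parts.

    Assumption: the ID and sequence type contain no underscores, but the
    chromosome may (e.g. chr1_random), so the last three fields are split
    off from the right.
    """
    head = name.split("_", 4)
    if len(head) != 5:
        raise ValueError(f"unexpected name field: {name!r}")
    ucsc_id, seq_type, number, added, rest = head
    tail = rest.rsplit("_", 2)
    if len(tail) != 3 or tail[2] not in STRAND_SIGN:
        raise ValueError(f"unexpected name field: {name!r}")
    chrom, first_base, strand = tail
    return NameField(ucsc_id, seq_type, int(number), int(added),
                     chrom, int(first_base), strand)


def check_row(line: str) -> NameField:
    """Parse one BED6 line and cross-check its name against columns 1, 2 and 6."""
    chrom, start, _end, name, _score, sign = line.split()[:6]
    field = parse_name(name)
    if field.chrom != chrom:
        raise ValueError(f"chromosome mismatch: {field.chrom} vs {chrom}")
    if STRAND_SIGN[field.strand] != sign:
        raise ValueError(f"strand mismatch: {field.strand} vs {sign}")
    # Column 2 is a 0-based start moved left by the padding (assumption based on
    # the "bases added on each end" description), so the 1-based start of the
    # unpadded feature is start + bases_added + 1.
    if int(start) + field.bases_added + 1 != field.first_base_1based:
        raise ValueError(f"start mismatch in {name!r}")
    return field


if __name__ == "__main__":
    # A row from Yuan's example; columns may be separated by spaces or tabs.
    print(check_row("chr1 621098 622034 uc010nxv.1_cds_0_0_chr1_621099_r 0 -"))
```

Run on the sixth row of Yuan's example, check_row confirms that 621098 + 0 + 1 = 621099 and that r corresponds to - in column 6. Splitting the name from both ends, rather than on every underscore, is what keeps chromosome names such as chr1_random in one piece.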
