Dear Dr. Jackson,

Thank you very much for your detailed answer! It is very helpful to me.
Do you think I need to download the knownGene data and write a program to
identify non-coding genes by selecting those genes where cdsStart == cdsEnd?
I ask because I was unable to do this using the Table browser.

Thank you again!

Best regards,
Chuangye
2009-11-25

2009/11/25, Jennifer Jackson <[email protected]>:
> Hello Chuangye,
>
> The browser tools option for EST data is to extract the associated genomic
> sequence for the intron regions (the genomic region covered by the EST
> alignment). There is not an automated way to extract the actual EST
> sequence. To download these genomic regions, use the Table browser and
> follow these steps:
>
> 1) go to http://genome.ucsc.edu/cgi-bin/hgTables
> 2) set controls to the assembly of interest and track group to "mRna and Est
> tracks"
> 3) choose either intronEST or just EST
> 4) then select output options as sequence, name the file and submit.
> 5) at the next output details page, choose the option "Regions between
> blocks". A block is a technical name we use for an exon - basically any
> contiguous alignment section versus genomic sequence. Whether or not a block
> is actually an exon will depend on the quality of the data being aligned.
> For Est data, this can vary.
> 6) download regions. The set will be a mix of regions bounded by coding or
> non-coding exons.
>
> Ests are not annotated as coding or non-coding, but you can use a gene track
> (UCSC genes or RefSeq genes or other) to extract genomic intron regions.
> Follow the same method above, starting with the gene track, select genomic
> sequence, then Introns.
>
> Non-coding genes can be identified by selecting those genes where the
> cdsStart == cdsEnd. This is how we designate non-coding genes. An example of
> this is the data from the UCSC Genes track (knownGene table) where name ==
> uc001aaa.2. Use the assembly or table browser and search using this
> identifier to view the example.
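
The cdsStart == cdsEnd selection described above could be scripted roughly as
follows. This is only a minimal sketch, not an official UCSC tool. It assumes
a tab-separated knownGene dump whose columns follow the order listed in
KNOWN_GENE_COLUMNS (please check this against the knownGene.sql schema for
your assembly). The function names and the two example rows are invented for
illustration.

```python
import csv
import gzip

# ASSUMPTION: column order of the knownGene table; verify against knownGene.sql.
KNOWN_GENE_COLUMNS = [
    "name", "chrom", "strand", "txStart", "txEnd", "cdsStart", "cdsEnd",
    "exonCount", "exonStarts", "exonEnds", "proteinID", "alignID",
]


def noncoding_genes(lines):
    """Yield each row (as a dict) whose cdsStart equals its cdsEnd."""
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for fields in reader:
        if not fields:
            continue  # skip blank lines
        row = dict(zip(KNOWN_GENE_COLUMNS, fields))
        if int(row["cdsStart"]) == int(row["cdsEnd"]):
            yield row


def noncoding_genes_from_file(path):
    """Same filter, reading a gzip-compressed dump such as knownGene.txt.gz."""
    with gzip.open(path, "rt") as handle:
        yield from noncoding_genes(handle)


def to_bed(row):
    """Format one row as a six-column BED line for use as a custom track."""
    return "\t".join([row["chrom"], row["txStart"], row["txEnd"],
                      row["name"], "0", row["strand"]])


# Invented rows for demonstration only (not real UCSC data).
EXAMPLE_ROWS = [
    "geneA\tchr1\t+\t1000\t5000\t1000\t1000\t2\t1000,4000,\t2000,5000,\t\tid1\n",
    "geneB\tchr1\t-\t7000\t9000\t7200\t8800\t1\t7000,\t9000,\tP00000\tid2\n",
]

for gene in noncoding_genes(EXAMPLE_ROWS):
    print(to_bed(gene))
```

The BED lines printed this way could be pasted into the custom track form, so
that the EST intersection described in the next paragraph can be run against
them.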
>
> To locate ESTs associated with non-coding genes, create a custom track that
> contains only the non-coding genes, then start a Table browser query
> starting with the EST track. Set an intersection (overlap) against this
> custom track and output the data. Only ESTs that align to the same genomic
> region as the non-coding gene will be returned in the result.
>
> Spliced Ests are those that contain verified splice sites at the alignment
> block boundaries and the data is in the Spliced ESTs track (table =
> intronEst). Ests that do not have splice sites are joined together with the
> first set in the EST track (table = all_est). These tables can be quite
> large, so for assemblies with a large number of ESTs either extract the data
> per region or chromosome or consider using the files on Download and your
> own tools to parse out the data from the fasta files based on the block
> coordinates in the tables.
>
> Read the track descriptions for more details about how the data is
> classified. To do this, go into the Assembly browser and click on the track
> name.
>
> More help:
> http://genome.ucsc.edu/goldenPath/help/hgTablesHelp.html
> (general help and example queries)
> http://genome.ucsc.edu/goldenPath/help/hgTracksHelp.html#Download
> (this link has the actual EST sequences - which you could potentially parse
> using your own tools and the coordinate data from the tables in the
> mySQL database).
>
> To find the files associated with any mySQL table, go into Downloads, use
> the links to locate the assembly, then go into the Annotation Database
> directory. All tables are here - the files are named the same as the tables
> with either a .txt.gz at the end for the data and an .sql for the schema.
> Ftp to use locally.
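
When working locally from the block coordinates in the downloaded tables, the
"Regions between blocks" could be computed along these lines. This is a rough
sketch under stated assumptions: each alignment row provides comma-separated
tStarts and blockSizes lists (as in the PSL-style EST tables), the starts are
0-based coordinates on the forward strand of the chromosome, and the function
names are hypothetical. The coordinates in the example are invented.

```python
def parse_int_list(text):
    """Parse a comma-separated list such as '100,250,' into integers."""
    return [int(item) for item in text.split(",") if item]


def regions_between_blocks(t_starts, block_sizes):
    """Return (start, end) pairs for the gaps between consecutive blocks.

    Coordinates are treated as 0-based, half-open intervals.
    """
    starts = parse_int_list(t_starts)
    sizes = parse_int_list(block_sizes)
    if len(starts) != len(sizes):
        raise ValueError("tStarts and blockSizes differ in length")
    gaps = []
    for i in range(len(starts) - 1):
        gap_start = starts[i] + sizes[i]   # end of the current block
        gap_end = starts[i + 1]            # start of the next block
        if gap_end > gap_start:            # ignore directly adjacent blocks
            gaps.append((gap_start, gap_end))
    return gaps


# Invented coordinates: three blocks, so two regions between them.
print(regions_between_blocks("1000,1500,2300,", "200,300,100,"))
```

Whether such a gap is a true intron still depends on the quality of the
alignment, as noted in step 5 of the reply.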
>
> We hope this helps,
> Jennifer Jackson
>
> ------------------------------------------------
> Jennifer Jackson
> UCSC Genome Bioinformatics Group
>
> ----- "Chuangye" <[email protected]> wrote:
>
>> From: "Chuangye" <[email protected]>
>> To: [email protected]
>> Sent: Tuesday, November 24, 2009 6:04:59 PM GMT -08:00 US/Canada Pacific
>> Subject: [Genome] HOW to get noncoding EST and the intron part of
>> protein-coding EST
>>
>> Hello, Sir/Miss,
>>
>> How could I get noncoding EST(ncRNA) and the intron part of
>> protein-coding EST from UCSC Genome Browser? What are the differences in
>> the "all_est" table between the "Spliced Ests" track and the "Human ESTs"
>> track? And are the "intronEST" entries of the "Spliced Ests" track introns
>> of genes?
>>
>> Thanks!
>>
>> Chuangye
>>
>> 2009-11-24
>> _______________________________________________
>> Genome maillist - [email protected]
>> https://lists.soe.ucsc.edu/mailman/listinfo/genome
