Hello,

The files on Downloads are always an option, but I was able to extract just the non-coding genes from the knownGene table by using the filter function and adding this to the free-form query box:
cdsStart = cdsEnd

Try this and see if it works for you.

Jennifer

------------------------------------------------
Jennifer Jackson
UCSC Genome Bioinformatics Group

----- "Chuangye" <[email protected]> wrote:

> From: "Chuangye" <[email protected]>
> To: "Jennifer Jackson" <[email protected]>
> Cc: [email protected]
> Sent: Wednesday, November 25, 2009 7:10:58 AM GMT -08:00 US/Canada Pacific
> Subject: Re: [Genome] HOW to get noncoding EST and the intron part of protein-coding EST
>
> Dear Dr. Jackson,
>
> Thank you very much for your detailed answer! That is very helpful to me.
>
> Do you think I need to download the knownGene data and write a program to identify non-coding genes by selecting those genes where cdsStart == cdsEnd? I failed to do it using the Table Browser.
>
> Thank you again!
>
> Best regards,
>
> Chuangye
> 2009-11-25
>
>
> 2009/11/25, Jennifer Jackson <[email protected]>:
> > Hello Chuangye,
> >
> > With the browser tools, the option for EST data is to extract the associated genomic sequence for the intron regions (the genomic region covered by the EST alignment). There is no automated way to extract the actual EST sequence. To download these genomic regions, use the Table Browser with the following steps:
> >
> > 1) Go to http://genome.ucsc.edu/cgi-bin/hgTables
> > 2) Set the controls to the assembly of interest and the track group to "mRNA and EST Tracks".
> > 3) Choose either the Spliced ESTs track (intronEst) or the EST track (all_est).
> > 4) Set the output format to "sequence", name the output file, and submit.
> > 5) On the next output details page, choose the option "Regions between blocks". A block is the technical name we use for an exon: basically any contiguous section of the alignment against the genomic sequence. Whether or not a block is actually an exon depends on the quality of the data being aligned. For EST data, this can vary.
> > 6) Download the regions. The set will be a mix of regions bounded by coding or non-coding exons.
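To make the "Regions between blocks" option in step 5 concrete, here is a minimal sketch of the idea, assuming PSL-style fields as stored in the all_est and intronEst tables: `tStarts` holds 0-based block starts on the genome and `blockSizes` holds block lengths, both as comma-separated lists with a trailing comma. It is an illustration of the concept, not the Table Browser's own code, and the function names are hypothetical.

```python
from __future__ import annotations


def parse_int_list(field: str) -> list[int]:
    """Parse a UCSC comma-separated integer list such as '100,250,' (trailing comma allowed)."""
    return [int(x) for x in field.split(",") if x]


def regions_between_blocks(t_starts: str, block_sizes: str) -> list[tuple[int, int]]:
    """Return half-open (start, end) genomic intervals lying between consecutive blocks.

    Assumes block i covers [tStarts[i], tStarts[i] + blockSizes[i]) on the genome.
    Zero-length gaps (blocks that abut each other) are skipped.
    """
    starts = parse_int_list(t_starts)
    sizes = parse_int_list(block_sizes)
    if len(starts) != len(sizes):
        raise ValueError("tStarts and blockSizes must have the same number of entries")
    gaps = []
    for i in range(len(starts) - 1):
        gap_start = starts[i] + sizes[i]  # end of block i
        gap_end = starts[i + 1]           # start of block i + 1
        if gap_end > gap_start:
            gaps.append((gap_start, gap_end))
    return gaps


# Hypothetical EST alignment with three blocks (not real data).
print(regions_between_blocks("1000,1500,3000,", "100,200,50,"))  # → [(1100, 1500), (1700, 3000)]
```

As the message notes, not every gap is a true intron: a short gap may simply reflect an indel or a low-quality stretch in the EST alignment.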
> >
> > ESTs are not annotated as coding or non-coding, but you can use a gene track (UCSC Genes, RefSeq Genes, or another) to extract genomic intron regions. Follow the same method as above, starting with the gene track: select genomic sequence, then Introns.
> >
> > Non-coding genes can be identified by selecting those genes where cdsStart == cdsEnd. This is how we designate non-coding genes. An example is the entry in the UCSC Genes track (knownGene table) where name == uc001aaa.2. Search for this identifier in the Assembly browser or the Table Browser to view the example.
> >
> > To locate ESTs associated with non-coding genes, create a custom track that contains only the non-coding genes, then start a Table Browser query with the EST track. Set an intersection (overlap) against this custom track and output the data. Only ESTs that align to the same genomic region as a non-coding gene will be returned.
> >
> > Spliced ESTs are those that contain verified splice sites at the alignment block boundaries; this data is in the Spliced ESTs track (table = intronEst). ESTs that do not have splice sites are combined with the first set in the EST track (table = all_est). These tables can be quite large, so for assemblies with many ESTs, either extract the data per region or chromosome, or consider using the files on Downloads and your own tools to parse the data out of the FASTA files based on the block coordinates in the tables.
> >
> > Read the track descriptions for more details about how the data is classified. To do this, go into the Assembly browser and click on the track name.
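The non-coding selection and the overlap query described above can be sketched in a few lines, for example when working from downloaded tables instead of the Table Browser. This is a minimal sketch under stated assumptions: genes carry knownGene-style txStart/txEnd/cdsStart/cdsEnd values, each EST is reduced to its overall genomic span (PSL tStart/tEnd, 0-based, half-open), and "overlap" means sharing at least one base. The record types, function names, and sample records are hypothetical, and the Table Browser's own intersection options may behave differently.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Gene:
    """Hypothetical subset of a knownGene (genePred-style) row."""
    name: str
    chrom: str
    tx_start: int
    tx_end: int
    cds_start: int
    cds_end: int


@dataclass
class EstAlignment:
    """Hypothetical subset of an all_est / intronEst row: overall span on the genome."""
    name: str
    chrom: str
    t_start: int  # 0-based start
    t_end: int    # half-open end


def non_coding(genes: list[Gene]) -> list[Gene]:
    """Genes with no coding region, using the cdsStart == cdsEnd convention."""
    return [g for g in genes if g.cds_start == g.cds_end]


def overlaps(chrom_a: str, start_a: int, end_a: int,
             chrom_b: str, start_b: int, end_b: int) -> bool:
    """True if two half-open intervals on the same chromosome share at least one base."""
    return chrom_a == chrom_b and start_a < end_b and start_b < end_a


def ests_overlapping_non_coding(ests: list[EstAlignment], genes: list[Gene]) -> list[str]:
    """Names of ESTs whose span overlaps the transcript span of any non-coding gene.

    Naive O(len(ests) * len(genes)) scan: fine for one region, too slow genome-wide.
    """
    targets = non_coding(genes)
    return [
        est.name
        for est in ests
        if any(overlaps(est.chrom, est.t_start, est.t_end, g.chrom, g.tx_start, g.tx_end)
               for g in targets)
    ]


# Hypothetical records, not real UCSC data.
genes = [
    Gene("ncGeneA", "chr1", 1000, 5000, 5000, 5000),  # cdsStart == cdsEnd: non-coding
    Gene("codingGeneB", "chr1", 8000, 12000, 8200, 11500),
]
ests = [
    EstAlignment("estX", "chr1", 4500, 6000),  # overlaps ncGeneA
    EstAlignment("estY", "chr1", 9000, 9500),  # overlaps only the coding gene
    EstAlignment("estZ", "chr2", 1000, 2000),  # different chromosome
]
print(ests_overlapping_non_coding(ests, genes))  # → ['estX']
```

For genome-wide data, sorting both sets by chromosome and start (or simply using the Table Browser intersection as described) avoids the quadratic scan.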
> >
> > More help:
> > http://genome.ucsc.edu/goldenPath/help/hgTablesHelp.html
> > (general help and example queries)
> > http://genome.ucsc.edu/goldenPath/help/hgTracksHelp.html#Download
> > (this link has the actual EST sequences, which you could potentially parse using your own tools and the coordinate data from the tables in the MySQL database)
> >
> > To find the files associated with any MySQL table, go into Downloads, use the links to locate the assembly, then go into the Annotation Database directory. All tables are there: the files are named the same as the tables, with .txt.gz at the end for the data and .sql for the schema. Download them by FTP to use locally.
> >
> > We hope this helps,
> > Jennifer Jackson
> >
> > ------------------------------------------------
> > Jennifer Jackson
> > UCSC Genome Bioinformatics Group
> >
> > ----- "Chuangye" <[email protected]> wrote:
> >
> >> From: "Chuangye" <[email protected]>
> >> To: [email protected]
> >> Sent: Tuesday, November 24, 2009 6:04:59 PM GMT -08:00 US/Canada Pacific
> >> Subject: [Genome] HOW to get noncoding EST and the intron part of protein-coding EST
> >>
> >> Hello, Sir/Miss,
> >>
> >> How could I get noncoding ESTs (ncRNA) and the intron part of protein-coding ESTs from the UCSC Genome Browser? What are the differences in the "all_est" table between the "Spliced ESTs" track and the "Human ESTs" track? And are the "intronEst" entries of the "Spliced ESTs" track introns of genes?
> >>
> >> Thanks!
> >>
> >> Chuangye
> >>
> >> 2009-11-24
> >> _______________________________________________
> >> Genome maillist - [email protected]
> >> https://lists.soe.ucsc.edu/mailman/listinfo/genome
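For Chuangye's question about downloading the knownGene data and writing a program, a minimal sketch of reading a downloaded table dump (for example a local knownGene.txt.gz) and applying the cdsStart == cdsEnd rule might look like this. It assumes the dump is tab-separated with no header line and that its first ten columns follow the genePred layout listed in `KNOWN_GENE_COLUMNS`; that column list is an assumption, so check it against the matching knownGene.sql schema file for your assembly. The function names are hypothetical.

```python
from __future__ import annotations

import gzip
from collections.abc import Iterator

# Assumed leading columns of knownGene (genePred layout). Confirm against the
# knownGene.sql schema file for your assembly; trailing columns vary.
KNOWN_GENE_COLUMNS = [
    "name", "chrom", "strand", "txStart", "txEnd",
    "cdsStart", "cdsEnd", "exonCount", "exonStarts", "exonEnds",
]


def read_table_dump(path: str, columns: list[str]) -> Iterator[dict[str, str]]:
    """Yield one dict per row of a tab-separated .txt.gz table dump (no header line).

    Only the named leading columns are kept; zip() drops any extra trailing fields.
    """
    with gzip.open(path, "rt") as handle:
        for line in handle:
            line = line.rstrip("\r\n")
            if line:
                yield dict(zip(columns, line.split("\t")))


def non_coding_names(path: str) -> list[str]:
    """Names of knownGene rows where cdsStart == cdsEnd (the non-coding convention)."""
    return [
        row["name"]
        for row in read_table_dump(path, KNOWN_GENE_COLUMNS)
        if int(row["cdsStart"]) == int(row["cdsEnd"])
    ]
```

Called as `non_coding_names("knownGene.txt.gz")` on a locally downloaded file (the path is only an example), it returns the names of the non-coding entries, which could then be turned into the custom track described in the thread.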
