Re: [Genome] HOW to get noncoding EST and the intron part of protein-coding EST

Jennifer Jackson Wed, 25 Nov 2009 08:30:32 -0800

Hello,

The files on Downloads are always an option, but I was able to extract just the
non-coding genes from the knownGene table in the Table browser by using the
filter function and entering this in the free-form query box:


cdsStart = cdsEnd

Try this and see if it works for you. 
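
If the Table browser filter still gives trouble, the same test can be run on
the downloaded table dump. What follows is only a rough sketch in Python. It
assumes the knownGene column order name, chrom, strand, txStart, txEnd,
cdsStart, cdsEnd, ... (check the knownGene.sql schema file for your assembly),
and the function name and the two sample rows are made up for illustration.

```python
import io

# Assumed 0-based column positions in the knownGene dump; verify them
# against the knownGene.sql schema file for your assembly.
CDS_START, CDS_END = 5, 6


def noncoding_rows(lines):
    """Yield each row (a list of fields) whose cdsStart equals cdsEnd."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) > CDS_END and fields[CDS_START] == fields[CDS_END]:
            yield fields


# Two made-up rows: the first has cdsStart == cdsEnd, the second does not.
SAMPLE = (
    "ucNC.1\tchr1\t+\t1000\t5000\t1000\t1000\t2\t1000,4000,\t2000,5000,\n"
    "ucPC.1\tchr1\t-\t8000\t9500\t8100\t9400\t2\t8000,9000,\t8500,9500,\n"
)

if __name__ == "__main__":
    for row in noncoding_rows(io.StringIO(SAMPLE)):
        print(row[0])
```

For the real file, the lines could come from gzip.open("knownGene.txt.gz", "rt")
instead of the sample text.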

Jennifer

------------------------------------------------ 
Jennifer Jackson 
UCSC Genome Bioinformatics Group 

----- "Chuangye" <[email protected]> wrote:

> From: "Chuangye" <[email protected]>
> To: "Jennifer Jackson" <[email protected]>
> Cc: [email protected]
> Sent: Wednesday, November 25, 2009 7:10:58 AM GMT -08:00 US/Canada Pacific
> Subject: Re: [Genome] HOW to get noncoding EST and the intron part of  
> protein-coding EST
>
> Dear Dr. Jackson,
> 
> Thank you very much for your detailed answer! That is very helpful to
> me.
> 
> Do you think I need to download the knownGene data and write a program
> to identify non-coding genes by selecting those genes where cdsStart
> == cdsEnd? I ask because I failed to do this using the Table browser.
> 
> Thank you again!
> 
> Best regards,
> 
> Chuangye
> 2009-11-25
> 
> 
> 2009/11/25, Jennifer Jackson <[email protected]>:
> > Hello Chuangye,
> >
> > The option the browser tools offer for EST data is to extract the
> > associated genomic sequence for the intron regions (the genomic
> > region covered by the EST alignment). There is no automated way to
> > extract the actual EST sequence. To download these genomic regions,
> > use the Table browser and follow these steps:
> >
> > 1) go to http://genome.ucsc.edu/cgi-bin/hgTables
> > 2) set the controls to the assembly of interest and the track group
> > to "mRNA and EST tracks"
> > 3) choose either the Spliced ESTs track (table intronEst) or the EST
> > track (table all_est)
> > 4) set the output format to sequence, name the output file, and
> > submit.
> > 5) on the output details page that follows, choose the option
> > "Regions between blocks". A block is the technical name we use for
> > an exon: basically any contiguous section of alignment against the
> > genomic sequence. Whether or not a block is actually an exon will
> > depend on the quality of the data being aligned. For EST data, this
> > can vary.
> > 6) download the regions. The set will be a mix of regions bounded by
> > coding or non-coding exons.
> >
> > ESTs are not annotated as coding or non-coding, but you can use a
> > gene track (UCSC Genes, RefSeq Genes, or another) to extract genomic
> > intron regions. Follow the same method as above, starting with the
> > gene track: select genomic sequence, then Introns.
> >
> > Non-coding genes can be identified by selecting those genes where
> > cdsStart == cdsEnd. This is how we designate non-coding genes. An
> > example is the entry in the UCSC Genes track (knownGene table) where
> > name == uc001aaa.2. Search for this identifier in the assembly
> > browser or the Table browser to view the example.
> >
> > To locate ESTs associated with non-coding genes, create a custom
> > track that contains only the non-coding genes, then start a Table
> > browser query with the EST track. Set an intersection (overlap)
> > against this custom track and output the data. Only ESTs that align
> > to the same genomic region as a non-coding gene will be returned in
> > the result.
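
The intersection step can also be approximated offline once both sets of
coordinates have been downloaded. This is a minimal sketch of an overlap test,
not the Table browser's implementation. It assumes each record is a
(chrom, start, end, name) tuple with 0-based, half-open coordinates, and the
names and sample records are invented.

```python
from collections import defaultdict


def overlaps(a_start, a_end, b_start, b_end):
    """True when two half-open intervals share at least one base."""
    return a_start < b_end and b_start < a_end


def ests_overlapping(ests, genes):
    """Return the ESTs that overlap any gene on the same chromosome."""
    by_chrom = defaultdict(list)
    for chrom, start, end, _name in genes:
        by_chrom[chrom].append((start, end))
    return [
        est
        for est in ests
        if any(overlaps(est[1], est[2], s, e) for s, e in by_chrom.get(est[0], ()))
    ]


if __name__ == "__main__":
    # Invented records for illustration.
    noncoding_genes = [("chr1", 1000, 5000, "ucNC.1")]
    ests = [
        ("chr1", 1500, 1800, "EST_A"),  # inside the gene
        ("chr1", 5000, 5100, "EST_B"),  # starts where the gene ends
        ("chr2", 1500, 1800, "EST_C"),  # different chromosome
    ]
    for est in ests_overlapping(ests, noncoding_genes):
        print(est[3])
```

This checks every gene on a chromosome for each EST, which is fine for one
region or chromosome at a time but slow for a whole large EST table.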
> >
> > Spliced ESTs are those that contain verified splice sites at the
> > alignment block boundaries; that data is in the Spliced ESTs track
> > (table = intronEst). ESTs that do not have splice sites are combined
> > with the spliced set in the EST track (table = all_est). These
> > tables can be quite large, so for assemblies with a large number of
> > ESTs either extract the data per region or chromosome, or consider
> > using the files on Downloads and your own tools to parse the data
> > out of the fasta files based on the block coordinates in the tables.
> >
> > Read the track descriptions for more details about how the data is
> > classified. To do this, go into the Assembly browser and click on
> > the track name.
> >
> > More help:
> > http://genome.ucsc.edu/goldenPath/help/hgTablesHelp.html
> > (general help and example queries)
> > http://genome.ucsc.edu/goldenPath/help/hgTracksHelp.html#Download
> > (this link has the actual EST sequences, which you could potentially
> > parse using your own tools and the coordinate data from the tables
> > in the mySQL database).
> >
> > To find the files associated with any mySQL table, go into
> > Downloads, use the links to locate the assembly, then go into the
> > Annotation Database directory. All tables are here. The files are
> > named the same as the tables, with .txt.gz at the end for the data
> > and .sql for the schema. Download them by FTP to use locally.
> >
> > We hope this helps,
> > Jennifer Jackson
> >
> > ------------------------------------------------
> > Jennifer Jackson
> > UCSC Genome Bioinformatics Group
> >
> > ----- "Chuangye" <[email protected]> wrote:
> >
> >> From: "Chuangye" <[email protected]>
> >> To: [email protected]
> >> Sent: Tuesday, November 24, 2009 6:04:59 PM GMT -08:00 US/Canada Pacific
> >> Subject: [Genome] HOW to get noncoding EST and the intron part of
> >> protein-coding EST
> >>
> >> Hello, Sir/Miss,
> >>
> >> How can I get noncoding ESTs (ncRNA) and the intron part of
> >> protein-coding ESTs from the UCSC Genome Browser? What is the
> >> difference in the "all_est" table between the "Spliced Ests" track
> >> and the "Human ESTs" track? And are the "intronEST" entries of the
> >> "Spliced Ests" track introns of genes?
> >>
> >> Thanks!
> >>
> >> Chuangye
> >>
> >> 2009-11-24
> >> _______________________________________________
> >> Genome maillist  -  [email protected]
> >> https://lists.soe.ucsc.edu/mailman/listinfo/genome
> >