Hi Ann,

(1) Using our public MySQL server, genome-mysql, as suggested by Brent Pedersen is a good option, and it was kind of him to provide the commands. However, please be aware that we did not vet these commands; you'll have to review them and verify that they suit your purposes. For more details about using genome-mysql, see the "Direct MySQL access to data" FAQ:
http://genome-test.cse.ucsc.edu/FAQ/FAQdownloads.html#download29

Another option is to use the Table Browser to get a set of *all* introns in one query and a set of *all* coding exons in a separate query. You'll then have to write a script to parse out the first intron and the first coding exon for each gene.

From the Table Browser, select a gene track and set the output format to "sequence." On the "Select sequence type for UCSC Genes" page, select "genomic." On the "UCSC Genes Genomic Sequence" page, to get all introns, select "Introns" and "One FASTA record per region...". This will give you a list of *all* introns. To get all coding exons, do nearly the same thing, except that on the "UCSC Genes Genomic Sequence" page you select "CDS Exons" instead of "Introns". This will give you a list of *all* coding exons.

Your script then has to extract the first intron and the first coding exon of each gene from those two lists. Each region (intron or exon) in your results starts with ">database_table_geneId_#", where the number at the end is the intron/exon number. For genes on the plus strand, the region numbered 0 is the first intron or exon of the gene (1 is the second, etc.). Here is an example:

>hg19_knownGene_uc001aaa.3_0

For genes on the minus strand, your script will have to determine which of the gene's regions has the highest intron/exon number and pull that one as the first intron or exon.

(2) We don't have a public program specifically for finding tandem repeats within microsatellite loci, but you could do an intersection between the Simple Repeat track and the Microsatellite track via the Table Browser.

Please contact the mail list ([email protected]) again if you have any further questions.

Katrina Learned
UCSC Genome Bioinformatics Group

On 8/25/11 9:07 PM, Ann Eileen Miller Baker wrote:
> Brooke and others,
> (1) Is there an alternative way to learn the first intron, first coding
> exon?
> (2) Does the Bioinformatics group have public programs for finding tandem
> repeats within microsatellite loci?
> Thanks for your help Brooke,
> A
>
> On Thu, Aug 25, 2011 at 1:04 PM, Brooke Rhead<[email protected]> wrote:
>
>> Hello Ann,
>>
>> The Table Browser does not have an option to limit output to only the first
>> intron or first coding exon.
>>
>> --
>> Brooke Rhead
>> UCSC Genome Bioinformatics Group
>>
>> On 08/25/11 10:00, Ann Eileen Miller Baker wrote:
>>
>>> 25Au11
>>> Please answer below. I am aware that the table browser delivers
>>> for mouse DMIT microsatellite loci overlapping introns, exons, coding
>>> exons,
>>> and UTR, but I am asking if there is any way to customize this listing to
>>> include FIRST INTRON; FIRST CODING EXON.
>>> Thanks,
>>> Ann
>>>
>>> ---------- Forwarded message ----------
>>> From: Ann Eileen Miller Baker<[email protected]>
>>> Date: Sun, Aug 21, 2011 at 3:45 PM
>>> Subject: identifying "first intron", "first coding exon"
>>> To: [email protected]
>>>
>>>
>>> 21Au11
>>> Dear UCSC genomics team,
>>> When determining genomic elements (UTR, exons, coding exons, introns)
>>> co-occuring with DMIT microsatellite loci,
>>> <<Is there a way to specify requesting first intron, first coding exon>>?
>>> Thanks,
>>> Ann
>>> _______________________________________________
>>> Genome maillist [email protected]
>>> https://lists.soe.ucsc.edu/mailman/listinfo/genome
>>>
> _______________________________________________
> Genome maillist [email protected]
> https://lists.soe.ucsc.edu/mailman/listinfo/genome
_______________________________________________
Genome maillist - [email protected]
https://lists.soe.ucsc.edu/mailman/listinfo/genome
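
P.S. The script described under (1) above (pull the first intron or first coding exon per gene out of the Table Browser FASTA output) could look roughly like the Python sketch below. It is only an illustration under stated assumptions, not a vetted tool: it assumes each header line carries a "strand=+" or "strand=-" field after the ">database_table_geneId_#" name, so please check that against your own output. The function names (read_fasta, parse_header, first_regions) and the gene IDs in the usage note are made up for the example.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, seq = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(seq)
            header, seq = line, []
        elif header is not None:
            seq.append(line)
    if header is not None:
        yield header, "".join(seq)


def parse_header(header):
    """Split '>db_table_geneId_N ... strand=+' into (gene key, N, strand).

    ASSUMPTION: the header contains a 'strand=+' or 'strand=-' field.
    """
    fields = header[1:].split()
    gene_key, _, number = fields[0].rpartition("_")
    strand = None
    for field in fields[1:]:
        if field.startswith("strand="):
            strand = field[len("strand="):]
    if strand not in ("+", "-"):
        raise ValueError("no strand=+/- field in header: " + header)
    return gene_key, int(number), strand


def first_regions(lines):
    """Return {gene key: (header, sequence)} for the first region of each gene.

    Plus strand: the lowest region number (0) is the first intron/exon.
    Minus strand: the highest region number is the first intron/exon.
    """
    best = {}
    for header, seq in read_fasta(lines):
        gene_key, number, strand = parse_header(header)
        current = best.get(gene_key)
        if current is None:
            best[gene_key] = (number, header, seq)
        elif strand == "-" and number > current[0]:
            best[gene_key] = (number, header, seq)
        elif strand == "+" and number < current[0]:
            best[gene_key] = (number, header, seq)
    return {gene: (hdr, seq) for gene, (_, hdr, seq) in best.items()}
```

You would run it once on the intron file and once on the CDS exon file, for example first_regions(open("introns.fa")), where "introns.fa" is a placeholder name for the file you saved from the Table Browser. Keying on everything before the final "_#" keeps different transcripts of the same locus separate, so the result is per transcript ID rather than per gene symbol.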
