Hi, Martin, thanks for asking. That might add up to an awful lot of queries if you are using a human assembly; there are thousands of tables in there. You might consider parsing the trackDb table first, because the "type" field of each entry there gives you the info you need to figure out which tables have those three fields. See the descriptions of the type fields in the documentation for custom tracks. In short, any of the BED tracks or bedGraphs will have chrom, chromStart and chromEnd, though there is at least one table grandfathered in with "genoName, genoStart, genoEnd" (the rmsk table). And some others use "tName, tStart, tEnd" (for target).
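To make the trackDb idea concrete, here is a minimal Python sketch of how the "type" values could be mapped to coordinate column names. It assumes you have already fetched (tableName, type) pairs from trackDb with a single query; the function names and the sample rows are hypothetical. The bed and bedGraph entries follow the description above, while the "rmsk" and "psl" type keys are my assumptions about how those tracks are typed, so please check them against the real trackDb contents.

```python
# Coordinate columns keyed by the first word of a trackDb "type" value.
# bed/bedGraph follow the message above; the "rmsk" and "psl" keys are
# assumptions and should be verified against the actual trackDb rows.
COORD_COLUMNS = {
    "bed": ("chrom", "chromStart", "chromEnd"),
    "bedGraph": ("chrom", "chromStart", "chromEnd"),
    "rmsk": ("genoName", "genoStart", "genoEnd"),
    "psl": ("tName", "tStart", "tEnd"),
}


def coord_columns(track_type):
    """Return the coordinate column names for a trackDb type string, or None."""
    words = track_type.split()
    if not words:
        return None
    return COORD_COLUMNS.get(words[0])


def tables_with_bed_coords(trackdb_rows):
    """Keep the table names whose type implies chrom/chromStart/chromEnd."""
    wanted = ("chrom", "chromStart", "chromEnd")
    return [
        name
        for name, track_type in trackdb_rows
        if coord_columns(track_type) == wanted
    ]


if __name__ == "__main__":
    # Hypothetical sample rows, shaped like (tableName, type).
    rows = [("snp131", "bed 6 +"), ("rmsk", "rmsk"), ("someWig", "wig 0 1")]
    print(tables_with_bed_coords(rows))
```

Working from one trackDb query this way avoids inspecting the columns of every table individually.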
I believe you should consider putting a delay in your program so that it only queries the db every 15 seconds, to give others a chance to get some resources. And maybe you should have only one member of the class do the main queries to the db, then have the students all share the results locally. You might also consider limiting the number of records you are extracting from the tables to 100 or some such. Several tables have millions of rows, for example:

mysql> SELECT COUNT(*) FROM snp131;
+----------+
| COUNT(*) |
+----------+
| 26033053 |
+----------+

mysql> SELECT COUNT(*) FROM snp132;
+----------+
| COUNT(*) |
+----------+
| 33026121 |
+----------+

On 4/5/2011 12:44 PM, Martin Tompa wrote:
> I am teaching an undergraduate, project-oriented Computational Biology
> Capstone course at the University of Washington this term. The topic of this
> term's project has to do with an analysis of the UCSC human genome browser
> data, and could entail "excessive" queries to genome-mysql.cse.ucsc.edu
> (discussed at http://genome.ucsc.edu/FAQ/FAQdownloads#download29).
>
> Here is the sort of thing we are considering doing initially: executing a
> program that would find every hg19 table that contains the fields chrom,
> chromStart, and chromEnd, and extracting those 3 columns from such tables.
>
> Do you view this as excessive? If so, do you have advice for how we should
> proceed with this (and, later on this term, similar) queries of the database?
>
> I appreciate any help and guidance you can give us.
>
> Sincerely,
> Martin Tompa
> Department of Computer Science & Engineering
> Department of Genome Sciences
> University of Washington.
>
>
> _______________________________________________
> Genome maillist - [email protected]
> https://lists.soe.ucsc.edu/mailman/listinfo/genome
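
As a rough sketch of the delay and row-limit suggestions in the reply above, something like the following Python could work. The names build_query and run_throttled are hypothetical, and `execute` stands in for whatever function your MySQL client library uses to run a query; no particular client is assumed here. The 15-second delay and the limit of 100 rows come from the reply, so adjust them as needed.

```python
import time


def build_query(table, columns, limit=100):
    """Build a SELECT for the given columns, capped at `limit` rows."""
    # Table and column names cannot be bound as query parameters, so only
    # plain identifiers (letters, digits, underscores) are accepted here.
    for name in (table, *columns):
        if not name.replace("_", "").isalnum():
            raise ValueError(f"unexpected identifier: {name!r}")
    return f"SELECT {', '.join(columns)} FROM {table} LIMIT {int(limit)}"


def run_throttled(queries, execute, delay_seconds=15, sleep=time.sleep):
    """Run each query through `execute`, pausing between consecutive queries."""
    results = []
    for i, query in enumerate(queries):
        if i > 0:
            sleep(delay_seconds)  # give other users a chance at the server
        results.append(execute(query))
    return results


if __name__ == "__main__":
    cols = ("chrom", "chromStart", "chromEnd")
    queries = [build_query(t, cols) for t in ("snp131", "snp132")]
    # `print` is a stand-in for a real query function; the short delay is
    # only for this demonstration.
    run_throttled(queries, execute=print, delay_seconds=1)
```

If one member of the class runs this and saves the results to a shared file, the rest of the class can work from that local copy instead of the server.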
