Hi, Martin,

Thanks for asking.  That might add up to an awful lot of queries if
you are using a human assembly; there are thousands of tables in there.
You might consider parsing the trackDb table first, because the "type"
field of each entry there gives you the info you need to figure out
which tables have those three fields.  See the descriptions of the type
fields in the documentation for custom tracks.  In short, any of the BED
tracks or bedGraphs will have chrom, chromStart and chromEnd, though
there is at least one table grandfathered in with "genoName, genoStart,
genoEnd" (the rmsk table).  And some others use "tName, tStart, tEnd"
(for target).
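
As a rough illustration of the trackDb approach, here is a minimal Python
sketch.  It assumes you have already fetched (tableName, type) pairs from
trackDb and that looking at the first word of the type string is enough to
recognise BED and bedGraph tracks.  The function name and the sample rows
are made up for illustration; real values come from the database.

```python
# Track types that, per the note above, carry chrom, chromStart and chromEnd.
# Only the first word of the trackDb "type" field is examined,
# e.g. "bed 6 +" -> "bed".
BED_LIKE_TYPES = {"bed", "bedGraph"}


def tables_with_chrom_fields(trackdb_rows):
    """Return the table names whose trackDb type is BED or bedGraph.

    trackdb_rows: iterable of (tableName, type) pairs, for example the
    result of "SELECT tableName, type FROM trackDb".
    """
    selected = []
    for table_name, track_type in trackdb_rows:
        words = track_type.split()
        if words and words[0] in BED_LIKE_TYPES:
            selected.append(table_name)
    return selected


# Hypothetical sample rows, for illustration only.
sample = [
    ("exampleBedTrack", "bed 6 +"),
    ("rmsk", "rmsk"),
    ("exampleGraph", "bedGraph 4"),
    ("exampleAlignment", "psl"),
]
print(tables_with_chrom_fields(sample))
```

Tables such as rmsk fall through this filter on purpose, since their
coordinate columns are named differently and would need separate handling.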

I believe you should consider putting a delay in your program so that it
only queries the db once every 15 seconds, to give others a chance to get
some resources.  And maybe you should have only one member of the class
run the main queries against the db, then have the students all share the
results locally.
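
One way to get that delay is sketched below in Python.  The Throttle class
and run_queries function are hypothetical names, and the execute argument
stands in for whatever function actually sends SQL to the server; treat
this as a sketch under those assumptions rather than a finished client.

```python
import time


class Throttle:
    """Enforce a minimum interval between successive calls to wait()."""

    def __init__(self, min_interval=15.0, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous query; None before the first

    def wait(self):
        """Sleep just long enough that calls are min_interval seconds apart."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


def run_queries(queries, execute, throttle):
    """Run each SQL string through execute(), pausing between queries.

    execute is a caller-supplied function (an assumption here) that sends
    one query to the database and returns its rows.
    """
    results = {}
    for sql in queries:
        throttle.wait()
        results[sql] = execute(sql)
    return results
```

The one person running the queries could then write the returned results
to a local file for the rest of the class to read, so the server is only
hit once.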

You might also consider limiting the number of records you are
extracting from the tables to 100 or some such.  Several tables
are millions of rows large, including snp131 and snp132:
mysql> SELECT COUNT(*) FROM snp131;
+----------+
| COUNT(*) |
+----------+
| 26033053 |
+----------+

mysql> SELECT COUNT(*) FROM snp132;
+----------+
| COUNT(*) |
+----------+
| 33026121 |
+----------+
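
To cap the rows pulled from tables like these, a LIMIT clause can be added
to each SELECT.  The Python helper below is a minimal sketch; the function
name limited_select is hypothetical, and the default column names assume
the table is one of the BED-style tables that has chrom, chromStart and
chromEnd.

```python
import re

# Accept only plain identifiers so names can be placed in the SQL text safely.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def limited_select(table, columns=("chrom", "chromStart", "chromEnd"), limit=100):
    """Build a SELECT statement that returns at most `limit` rows."""
    for name in (table, *columns):
        if not _IDENTIFIER.fullmatch(name):
            raise ValueError(f"unsafe identifier: {name!r}")
    if not isinstance(limit, int) or limit <= 0:
        raise ValueError("limit must be a positive integer")
    return f"SELECT {', '.join(columns)} FROM {table} LIMIT {limit}"


print(limited_select("snp132"))
# prints: SELECT chrom, chromStart, chromEnd FROM snp132 LIMIT 100
```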

On 4/5/2011 12:44 PM, Martin Tompa wrote:
> I am teaching an undergraduate, project-oriented Computational Biology 
> Capstone course at the University of Washington this term.  The topic of this 
> term's  project has to do with an analysis of the UCSC human genome browser 
> data, and could entail "excessive" queries to genome-mysql.cse.ucsc.edu
> (discussed at http://genome.ucsc.edu/FAQ/FAQdownloads#download29).
> 
> Here is the sort of thing we are considering doing initially: executing a 
> program that would find every hg19 table that contains the fields chrom, 
> chromStart, and chromEnd, and extracting those 3 columns from such tables.
> 
> Do you view this as excessive?  If so, do you have advice for how we should 
> proceed with this (and, later on this term, similar) queries of the database?
> 
> I appreciate any help and guidance you can give us.
> 
> Sincerely,
> Martin Tompa
> Department of Computer Science & Engineering
> Department of Genome Sciences
> University of Washington.
> 
> 
> _______________________________________________
> Genome maillist  -  [email protected]
> https://lists.soe.ucsc.edu/mailman/listinfo/genome
