oops.  accidentally hit the Send button before I was done.  I made
some edits in the body of the included message below.

note:  snp132 is not on the public server yet, so ignore that
part of my message.

We are glad you find our resources useful and thanks for passing
them on to another generation of students.

Please let us know via the mailing list if you have any further questions.

best wishes,

                        --b0b kuhn
                        ucsc genome bioinformatics group

On 4/5/2011 1:43 PM, robert kuhn wrote:
> Hi, Martin,
> 
> thanks for asking.  That might add up to an awful lot of queries if
> you are using a human assembly.  there are 1000s of tables in there.
> You might consider parsing the trackDb table first, because the entries
> there will give you in the "type" field the info you need to figure out
> which tables have those three fields.  see the descriptions of the type
> fields in the documentation for custom tracks.  In short, any of the BED
> tracks or bedGraphs will have chrom, chromStart and chromEnd, though
> there is at least one table grandfathered in with "genoName, genoStart,
> genoEnd" (the rmsk table).  And some others use "tName, tStart, tEnd"
> (for target).
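
A minimal sketch of the trackDb idea above, in Python.  It assumes you
have already fetched (tableName, type) pairs from trackDb and that the
first word of the type field names the format.  The function names are
hypothetical, and mapping "psl" to tName/tStart/tEnd is an assumption
about which tables use those fields.  Not every trackDb entry is
guaranteed to have a matching table, so treat the result as a list of
candidates.

```python
# Coordinate columns used by BED and bedGraph tables (per the message above).
BED_COLUMNS = ("chrom", "chromStart", "chromEnd")


def coord_columns(track_type):
    """Guess the coordinate column names from a trackDb "type" value.

    Returns a 3-tuple of column names, or None if the type is not one
    this sketch knows about.
    """
    words = track_type.split()
    if not words:
        return None
    kind = words[0]  # e.g. "bed" in "bed 6 +"
    if kind in ("bed", "bedGraph"):
        return BED_COLUMNS
    if kind == "rmsk":
        # The grandfathered RepeatMasker naming.
        return ("genoName", "genoStart", "genoEnd")
    if kind == "psl":
        # Assumption: alignment tables use the "target" naming.
        return ("tName", "tStart", "tEnd")
    return None


def bed_like_tables(rows):
    """Keep the table names whose type implies chrom/chromStart/chromEnd."""
    return [name for name, track_type in rows
            if coord_columns(track_type) == BED_COLUMNS]
```

For example, bed_like_tables([("snp131", "bed 6 +"), ("rmsk", "rmsk")])
keeps only "snp131".
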
> 
> I believe you should consider putting a delay in your program so that it
> only queries the db every 15 seconds to give others a chance to get
> some resources.  And maybe you should have only one member of the class
> do the main queries to the db, then have the students all share the
> results locally.
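
One way to build in the delay, sketched in Python.  The Throttle class
is hypothetical, not part of any UCSC tool; the 15-second default
comes from the suggestion above.  The clock and sleep arguments exist
only so the class can be exercised without actually waiting.

```python
import time


class Throttle:
    """Enforce a minimum interval between successive queries."""

    def __init__(self, min_interval=15.0, clock=time.monotonic,
                 sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous query, if any

    def wait(self):
        """Block until min_interval has passed since the last call."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

Call throttle.wait() immediately before each query.  The first call
returns at once; later calls sleep only for whatever part of the
interval has not already elapsed.
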
> 
> You might also consider limiting the number of records you are
> extracting from the tables to 100 or 1000 or some such.  Several tables
> have tens of millions of rows, including:
>
> mysql> SELECT COUNT(*) FROM snp131;
> +----------+
> | COUNT(*) |
> +----------+
> | 26033053 |
> +----------+
> 
> mysql> SELECT COUNT(*) FROM snp132;
> +----------+
> | COUNT(*) |
> +----------+
> | 33026121 |
> +----------+
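
Given row counts like these, a capped query is a reasonable default.
A small Python sketch that builds one; limited_query is a hypothetical
helper, and the default cap of 1000 is just one of the values
suggested above.  It only builds the SQL string, so it works with
whatever MySQL client you use.

```python
import re

# Plain identifiers only, so table and column names cannot carry extra SQL.
_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def limited_query(table, columns, limit=1000):
    """Build a SELECT for the given columns, capped at `limit` rows."""
    for name in (table, *columns):
        if not _IDENT.match(name):
            raise ValueError("not a plain identifier: %r" % (name,))
    if not isinstance(limit, int) or limit <= 0:
        raise ValueError("limit must be a positive integer")
    return "SELECT {} FROM {} LIMIT {}".format(
        ", ".join(columns), table, limit)
```

limited_query("snp131", ("chrom", "chromStart", "chromEnd")) gives
"SELECT chrom, chromStart, chromEnd FROM snp131 LIMIT 1000".
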
> 
> 
> On 4/5/2011 12:44 PM, Martin Tompa wrote:
>> I am teaching an undergraduate, project-oriented Computational Biology 
>> Capstone course at the University of Washington this term.  The topic 
>> of this term's project has to do with an analysis of the UCSC human 
>> genome browser data, and could entail "excessive" queries to 
>> genome-mysql.cse.ucsc.edu
>> (discussed at http://genome.ucsc.edu/FAQ/FAQdownloads#download29).
>>
>> Here is the sort of thing we are considering doing initially: 
>> executing a program that would find every hg19 table that contains the 
>> fields chrom, chromStart, and chromEnd, and extracting those 3 columns 
>> from such tables.
>>
>> Do you view this as excessive?  If so, do you have advice for how we 
>> should proceed with this (and, later on this term, similar) queries of 
>> the database?
>>
>> I appreciate any help and guidance you can give us.
>>
>> Sincerely,
>> Martin Tompa
>> Department of Computer Science & Engineering
>> Department of Genome Sciences
>> University of Washington.
>>
>>
>> _______________________________________________
>> Genome maillist  -  [email protected]
>> https://lists.soe.ucsc.edu/mailman/listinfo/genome