To the UCSC Genome Bioinformatics Group:

My name is Nicholas Lee, and I am a Harvard undergraduate student interning this summer on a team in Dr. Chaochun Wei's lab (lab group cc'ed on this e-mail) at Shanghai Jiao Tong University. Our research results take the form of two hundred regions for each autosome and sex chromosome, where a region is the entire span between a starting position and an ending position on a given chromosome. We would now like to compile a database of annotations and other information about, and contained within, those regions. We have two questions:

1. For finding data on our result regions, your public MySQL server looks like the best option. Because we have two hundred regions for each chromosome, our number of queries will be very large; in addition, we will likely write a script that reads starting and ending positions from our results file and queries your MySQL server. How do we obtain permission for our computers to perform these operations at this volume?

2. We would like to find information on the following tracks, in a way similar to the attached Genome Browser graphic:

- Mapping and Sequencing Tracks: Base Position
- Phenotype and Disease Associations: OMIM Genes
- Genes and Gene Prediction Tracks: UCSC Genes, GENCODE Gene Annotation, MGC Genes, sno/miRNA
- mRNA and EST Tracks: Human mRNAs
- Regulation: all tracks
- Comparative Genomics: Conservation, Primate Chain/Net, Vertebrate Chain/Net
- Variation and Repeats: All SNPs, Simple Repeats, RepeatMasker

We envision the resulting database as a structured numerical representation of the data available in the Genome Browser. Similar to the attached Genome Browser graphic, we would like to build a database that, for each region, has:

- a list of all known genes (from GENCODE Gene Annotation)
- the annotated chromosome positions of those genes within the region (from GENCODE Gene Annotation)
- a list of all known mRNAs (from Human mRNAs)
- the chromosome positions of those mRNAs within the region (from Human mRNAs)
- percent identity within the region with the chimpanzee genome (from Primate Chain/Net)
- positions of base differences with the chimpanzee genome (from Primate Chain/Net)
- et cetera

However, each track in the Genome Browser has many associated tables in the MySQL database, and each table contains different information. We have found the description page at <http://genome.ucsc.edu/goldenPath/gbdDescriptionsOld.html>.
Is there an updated webpage or data file that details the contents of each table in a manner similar to the outdated description page? In addition, which particular tables would you suggest we access in order to get the kind of results we are looking for?

Thank you very much in advance for your help!

Sincerely,
Nicholas Lee
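
P.S. To make question 1 more concrete, a script of the kind described there might be sketched roughly as follows. This is only an illustration under stated assumptions, not a finished tool: the tab-separated results-file layout (chromosome, start, end), the names Region, parse_regions and overlap_query, and the default column names are all hypothetical, and column names differ from table to table. The sketch only builds a parameterized overlap query for each region; it does not connect to any server.

```python
from typing import Iterable, Iterator, NamedTuple, Tuple


class Region(NamedTuple):
    """One result region (assumed 0-based start, exclusive end)."""
    chrom: str
    start: int
    end: int


def parse_regions(lines: Iterable[str]) -> Iterator[Region]:
    """Yield regions from tab-separated lines: chrom, start, end.

    The file layout is an assumption; blank lines and lines starting
    with '#' are skipped.
    """
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        chrom, start, end = line.split("\t")[:3]
        yield Region(chrom, int(start), int(end))


def overlap_query(
    table: str,
    region: Region,
    chrom_col: str = "chrom",
    start_col: str = "chromStart",
    end_col: str = "chromEnd",
) -> Tuple[str, Tuple[str, int, int]]:
    """Build a parameterized SQL query for rows overlapping the region.

    Table and column names cannot be bound as parameters, so they are
    checked to be plain identifiers before being placed in the SQL text.
    The region values themselves are returned separately as parameters.
    """
    for name in (table, chrom_col, start_col, end_col):
        if not name.isidentifier():
            raise ValueError(f"not a plain identifier: {name!r}")
    sql = (
        f"SELECT * FROM {table} "
        f"WHERE {chrom_col} = %s AND {start_col} < %s AND {end_col} > %s"
    )
    # A row overlaps when it starts before the region ends
    # and ends after the region starts.
    return sql, (region.chrom, region.end, region.start)


if __name__ == "__main__":
    example_lines = ["chr1\t1000000\t1050000", "chrX\t250000\t300000"]
    for region in parse_regions(example_lines):
        sql, params = overlap_query(
            "knownGene", region, start_col="txStart", end_col="txEnd"
        )
        print(sql, params)
```

The (sql, params) pair is meant to be handed to the execute() method of a MySQL client cursor that uses %s placeholders. Gene tables such as knownGene use txStart and txEnd rather than chromStart and chromEnd, which is why the column names are left as arguments.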
_______________________________________________ Genome maillist - [email protected] https://lists.soe.ucsc.edu/mailman/listinfo/genome
