Hello Jirong
I am glad that you wrote back to clarify this, as one of our developers
brought up this possible score confusion earlier in the week with
regard to my initial answer. The data in the hapmapAlleles* tables
you specified represent a data quality metric. For the other hapmapSNP*
tables, this field is also a score, but perhaps not the score you are
interested in. If I understand your question correctly, you are
interested in the score as represented in the Conservation track
(phastCons*Way, multiple species alignments).
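To check which field a given column of the downloaded file holds, each row can be paired with the column order from the attached `hapmapAllelesSummary` schema. This is a minimal sketch, assuming the .txt (or .txt.gz) dump is tab-separated with no header row and lists the fields in the same order as the CREATE TABLE statement; `parse_rows` and `read_file` are hypothetical helper names. Counting from 1, the first column is `bin`, so the 6th column is the `score` field, i.e. the quality metric described above, not a conservation score.

```python
import gzip
from typing import Dict, Iterable, Iterator

# Column order copied from the CREATE TABLE `hapmapAllelesSummary`
# statement in the attached hg18 schema dump.
HAPMAP_ALLELES_SUMMARY_COLUMNS = [
    "bin", "chrom", "chromStart", "chromEnd", "name", "score", "strand",
    "observed", "allele1", "allele2", "popCount", "isMixed",
    "majorAlleleCEU", "majorAlleleCountCEU", "totalAlleleCountCEU",
    "majorAlleleCHB", "majorAlleleCountCHB", "totalAlleleCountCHB",
    "majorAlleleJPT", "majorAlleleCountJPT", "totalAlleleCountJPT",
    "majorAlleleYRI", "majorAlleleCountYRI", "totalAlleleCountYRI",
    "chimpAllele", "chimpAlleleQuality",
    "macaqueAllele", "macaqueAlleleQuality",
]


def parse_rows(lines: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Yield one dict per tab-separated row, keyed by schema column name."""
    expected = len(HAPMAP_ALLELES_SUMMARY_COLUMNS)
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        fields = line.split("\t")
        if len(fields) != expected:
            raise ValueError(f"expected {expected} fields, got {len(fields)}")
        yield dict(zip(HAPMAP_ALLELES_SUMMARY_COLUMNS, fields))


def read_file(path: str) -> Iterator[Dict[str, str]]:
    """Read a plain or gzip-compressed dump such as hapmapAllelesSummary.txt.gz."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        yield from parse_rows(handle)


# The 6th column (1-based) is the `score` field.
print(HAPMAP_ALLELES_SUMMARY_COLUMNS[5])  # score
```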
I am unable to view your attachments, but I can provide some help for
obtaining the Conservation score for HapMap SNP locations.
There are a few methods for obtaining this data (Table Browser, Galaxy,
and Flat File). If you are interested in a specific score for a
particular HapMap SNP location, the best choice is to access the flat
files directly.
The files are located in our downloads area ("Downloads" in the blue menu
bar http://genome.ucsc.edu/, left side of main browser window). From the
downloads page, select "Human" and the track "Conservation scores for
alignments of 43 vertebrate genomes with Human". The ftp path and a
description of the files are in the README document.
Instructions for ftp: http://genome.ucsc.edu/FAQ/FAQdownloads#download1
File format help: http://genome.ucsc.edu/FAQ/FAQformat
The idea would be to extract the score at each HapMap SNP base position
from the .wig formatted files. You would need to develop simple tools to
search, match, and extract the data points.
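A minimal sketch of such a tool, assuming the conservation files use the wiggle fixedStep layout (a `fixedStep chrom=... start=... step=...` header followed by one value per line, with a 1-based start) and that HapMap `chromStart` values are 0-based, as in other UCSC tables. The function names are hypothetical. The file is streamed line by line, so whole-chromosome files do not need to fit in memory; variableStep sections are not handled.

```python
from typing import Dict, Iterable, Set, Tuple

Position = Tuple[str, int]  # (chrom, 1-based position)


def snp_to_wig_position(chrom: str, chrom_start: int) -> Position:
    """Convert a 0-based HapMap chromStart to the 1-based wiggle coordinate."""
    return (chrom, chrom_start + 1)


def scores_at_positions(wig_lines: Iterable[str],
                        targets: Set[Position]) -> Dict[Position, float]:
    """Stream fixedStep wiggle lines and keep the values at the target positions."""
    found: Dict[Position, float] = {}
    chrom = None
    pos = step = span = 0
    for raw in wig_lines:
        line = raw.strip()
        if not line or line.startswith(("track", "browser", "#")):
            continue  # skip blank, track/browser and comment lines
        if line.startswith("variableStep"):
            raise ValueError("this sketch only handles fixedStep sections")
        if line.startswith("fixedStep"):
            # e.g. "fixedStep chrom=chr1 start=100 step=1"
            params = dict(tok.split("=", 1) for tok in line.split()[1:])
            chrom = params["chrom"]
            pos = int(params["start"])  # 1-based
            step = int(params.get("step", "1"))
            span = int(params.get("span", "1"))
            continue
        if chrom is None:
            raise ValueError("data line found before any fixedStep header")
        value = float(line)
        # This value covers positions pos .. pos + span - 1.
        for p in range(pos, pos + span):
            if (chrom, p) in targets:
                found[(chrom, p)] = value
        pos += step
    return found
```

For gzip-compressed downloads, the lines could come from `gzip.open(path, "rt")`, with the targets built from each SNP's `chrom` and `chromStart` via `snp_to_wig_position`.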
If you decide to use the Table Browser, it is possible to start with the
Conservation track and the phastCons* table of your choice and perform an
intersection against the entire HapMap track, or against a subset of it by
limiting the results to a genomic region or to a custom track containing a
subset of the HapMap track. When doing this intersection, you will retrieve
complete "blocks" of data from the original base table that have any
overlap with the data you used as a filter in the genomic region or track
intersection. This means the data will not correlate 1-to-1: any blocks
will be returned in their entirety, and the original HapMap data point
name will not be annotated in the output. This is why I advise against
this method.
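The toy model below illustrates the "any overlap" behavior described above; it is not the Table Browser's implementation. It assumes 0-based, half-open coordinates, and the block and SNP values (including the names `rsA` and `rsB`) are made up for illustration. Two SNPs falling in the same block produce a single output row, and neither SNP name survives.

```python
from typing import List, NamedTuple


class Block(NamedTuple):
    """One row of a scored base table, e.g. a conservation block (0-based, half-open)."""
    chrom: str
    start: int
    end: int
    score: float


class Snp(NamedTuple):
    """One HapMap SNP row used as the filter (0-based, half-open)."""
    chrom: str
    start: int
    end: int
    name: str


def overlaps(b: Block, s: Snp) -> bool:
    """True if the two half-open intervals share at least one base."""
    return b.chrom == s.chrom and b.start < s.end and s.start < b.end


def blocks_overlapping_any(blocks: List[Block], snps: List[Snp]) -> List[Block]:
    """Keep every block overlapping at least one SNP, returned whole and without SNP names."""
    return [b for b in blocks if any(overlaps(b, s) for s in snps)]


blocks = [Block("chr1", 100, 200, 0.5), Block("chr1", 300, 400, 0.9)]
snps = [Snp("chr1", 120, 121, "rsA"), Snp("chr1", 150, 151, "rsB")]
print(blocks_overlapping_any(blocks, snps))
# [Block(chrom='chr1', start=100, end=200, score=0.5)]
```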
If you decide to use Galaxy, the data from the two tables can also be
intersected, with the added advantage that data from both the base and
the intersection table will be retained in the output. Send the tables to
Galaxy, format them as necessary, and join the data.
Suggested tables for conservation score data in the latest human assembly
are phastCons44way (newest data, one score for a block of data) and
multiz28waySummary (conservation scores per species). Galaxy can convert
the file formats to interval or BED to provide "one row per base
position", which will make comparing the data easier.
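Once both datasets have one row per base position, the join mentioned above amounts to matching rows on chromosome and position. The sketch below mimics that step outside Galaxy; it is an illustration, not Galaxy's own tool. It assumes both inputs already use the same coordinate convention (for example, both 0-based starts), and the function name and sample values are hypothetical. Unlike the Table Browser intersection, each output row keeps the SNP name next to its score.

```python
from typing import Dict, Iterable, List, Tuple


def join_on_position(
    snps: Iterable[Tuple[str, int, str]],          # (chrom, start, name)
    per_base: Iterable[Tuple[str, int, float]],    # (chrom, start, score)
) -> List[Tuple[str, str, int, float]]:
    """Return (name, chrom, start, score) for every SNP with a per-base score."""
    score_by_pos: Dict[Tuple[str, int], float] = {
        (chrom, start): score for chrom, start, score in per_base
    }
    joined = []
    for chrom, start, name in snps:
        key = (chrom, start)
        if key in score_by_pos:  # SNPs without a score row are dropped
            joined.append((name, chrom, start, score_by_pos[key]))
    return joined


print(join_on_position([("chr1", 120, "rsA"), ("chr2", 5, "rsC")],
                       [("chr1", 120, 0.4)]))
# [('rsA', 'chr1', 120, 0.4)]
```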
I hope this helps,
Jennifer Jackson
UCSC Genome Bioinformatics Group
Long, Jirong wrote:
> Many thanks, Jennifer.
>
> We downloaded hapmapAllelesSummary.txt. We just want to make sure the
> 6th column is for the conservation score, such as 0, 0, 65, 0, 4 in the
> first 5 rows of the hapmapAllelesSummary.txt. Please see attachments.
> Appreciate your kind help.
>
> Warmest regards,
>
> Jirong
>
> -----Original Message-----
> From: Jennifer Jackson [mailto:[email protected]]
> Sent: Tuesday, February 17, 2009 5:40 PM
> To: Long, Jirong
> Cc: [email protected]
> Subject: Re: [Genome] conservate score for HapMap SNPs
>
> Hello,
> You can ftp the files from our downloads server or save the files from
> the Table Browser.
>
> For ftp, go to the main browser web site http://genome.ucsc.edu/ and
> click on "Downloads" in the left blue navigation bar. For the most
> recent data for human (hg18), click on Human, then click into the
> Annotation Database directory. Files named like hapmapSnps*.txt.gz (and
> maybe hapmapAlleles*.txt.gz) are the files related to the HapMap SNPs
> track.
> Instructions for ftp access:
>
> For saving from the Table browser, go to the main browser web site
> http://genome.ucsc.edu/ and click on "Table Browser" in the left blue
> navigation bar. Select the clade, genome, and assembly for the latest human.
>
> Set group: Variation and Repeats and track: HapMap SNPs. The associated
> tables will be in the tables pull-down menu. Use the "View table schema"
> button to view table contents (it may be a useful tool anyway, even if
> you use ftp). Make sure that region is set to genome and output format
> to all fields from selected tables. Name the output file and it will
> save to your computer.
>
> For more info about the track, go into the Human assembly browser and
> click on the track name for a full description of methods, sources, etc.
>
> Thanks!
> Jennifer Jackson
> UCSC Genome Bioinformatics Group
>
>
> Long, Jirong wrote:
>
>> Dear Sir/Madam,
>>
>> We are wondering whether you have an ftp address that we can use to
>> download the conservation score for each of the HapMap SNPs. Thanks.
>>
>> Best,
>>
>> Jirong
>>
>> *******************************************
>> Jirong Long, PhD
>> Assistant Professor
>> Vanderbilt Epidemiology Center
>> Vanderbilt Ingram Cancer Center
>> Eighth floor, Suite 800
>> 2525 West End Avenue
>> Nashville, TN 37203-1738
>> Tel: 615-343-6741
>> Fax: 615-322-0502
>> E-mail: [email protected]
>>
>> _______________________________________________
>> Genome maillist - [email protected]
>> http://www.soe.ucsc.edu/mailman/listinfo/genome
>>
>>
>>
>> ------------------------------------------------------------------------
>>
>> -- MySQL dump 10.10
>> --
>> -- Host: localhost Database: hg18
>> -- ------------------------------------------------------
>> -- Server version 5.0.21
>>
>> /*!40101 SET @OLD_CHARACTER_SET_CLIENT=@@CHARACTER_SET_CLIENT */;
>> /*!40101 SET @OLD_CHARACTER_SET_RESULTS=@@CHARACTER_SET_RESULTS */;
>> /*!40101 SET @OLD_COLLATION_CONNECTION=@@COLLATION_CONNECTION */;
>> /*!40101 SET NAMES utf8 */;
>> /*!40103 SET @OLD_TIME_ZONE=@@TIME_ZONE */;
>> /*!40103 SET TIME_ZONE='+00:00' */;
>> /*!40101 SET @OLD_SQL_MODE=@@SQL_MODE, SQL_MODE='' */;
>> /*!40111 SET @OLD_SQL_NOTES=@@SQL_NOTES, SQL_NOTES=0 */;
>>
>> --
>> -- Table structure for table `hapmapAllelesSummary`
>> --
>>
>> DROP TABLE IF EXISTS `hapmapAllelesSummary`;
>> CREATE TABLE `hapmapAllelesSummary` (
>> `bin` int(10) unsigned NOT NULL default '0',
>> `chrom` varchar(255) NOT NULL default '',
>> `chromStart` int(10) unsigned NOT NULL default '0',
>> `chromEnd` int(10) unsigned NOT NULL default '0',
>> `name` varchar(255) NOT NULL default '',
>> `score` int(10) unsigned NOT NULL default '0',
>> `strand` enum('+','-','?') NOT NULL default '?',
>> `observed` varchar(255) NOT NULL default '',
>> `allele1` enum('A','C','G','T') NOT NULL default 'A',
>> `allele2` enum('C','G','T','none') NOT NULL default 'C',
>> `popCount` int(10) unsigned NOT NULL default '0',
>> `isMixed` varchar(255) NOT NULL default '',
>> `majorAlleleCEU` enum('A','C','G','T','none') NOT NULL default 'A',
>> `majorAlleleCountCEU` int(10) unsigned NOT NULL default '0',
>> `totalAlleleCountCEU` int(10) unsigned NOT NULL default '0',
>> `majorAlleleCHB` enum('A','C','G','T','none') NOT NULL default 'A',
>> `majorAlleleCountCHB` int(10) unsigned NOT NULL default '0',
>> `totalAlleleCountCHB` int(10) unsigned NOT NULL default '0',
>> `majorAlleleJPT` enum('A','C','G','T','none') NOT NULL default 'A',
>> `majorAlleleCountJPT` int(10) unsigned NOT NULL default '0',
>> `totalAlleleCountJPT` int(10) unsigned NOT NULL default '0',
>> `majorAlleleYRI` enum('A','C','G','T','none') NOT NULL default 'A',
>> `majorAlleleCountYRI` int(10) unsigned NOT NULL default '0',
>> `totalAlleleCountYRI` int(10) unsigned NOT NULL default '0',
>> `chimpAllele` enum('A','C','G','N','T','none') NOT NULL default 'A',
>> `chimpAlleleQuality` int(10) unsigned NOT NULL default '0',
>> `macaqueAllele` enum('A','C','G','N','T','none') NOT NULL default 'A',
>> `macaqueAlleleQuality` int(10) unsigned NOT NULL default '0',
>> KEY `name` (`name`),
>> KEY `chrom` (`chrom`,`bin`)
>> ) ENGINE=MyISAM DEFAULT CHARSET=latin1;
>>
>> /*!40103 SET time_zo...@old_time_zone */;
>>
>> /*!40101 SET sql_mo...@old_sql_mode */;
>> /*!40101 SET character_set_clie...@old_character_set_client */;
>> /*!40101 SET character_set_resul...@old_character_set_results */;
>> /*!40101 SET collation_connecti...@old_collation_connection */;
>> /*!40111 SET sql_not...@old_sql_notes */;
>>
>> -- Dump completed on 2007-07-11 17:35:45
>>