Hello,

The original publication can address most of your questions.

Full article/publication:
http://genome.cshlp.org/content/15/8/1034.full.pdf+html

Link to abstract/footnotes:
http://genome.cshlp.org/content/15/8/1034

Redirected link to supplemental information from footnote:
http://compgen.bscb.cornell.edu/~acs/conservation/

Link to supplemental information posted with publication at Genome Research:
http://genome.cshlp.org/content/suppl/2005/07/18/gr.3715005.DC1.html

In particular, these sections in the supplemental information at genome.cshlp.org explain why not every base is included and what the base mouse assembly was for the publication:

2.7 Missing Data and Alignment Gaps
2.8 Synteny Filtering
Table S1: Summary of genomes and assemblies
"M. musculus vertebrate mm5 Mouse Genome Sequencing Consortium, 2002"

In summary, the first 3M bases of the mouse chromosome are reserved for the telomere region. The remaining bases missing from the conservation data (original and current) are due to gaps and filtering, as explained above.

The mm5 database can be accessed on the archive server here (with limited functionality):
http://genome-archive.cse.ucsc.edu/

That said, we recommend using the most current version of the genome assembly and track unless there is a specific reason to go back.

Help for data formats (the data file you reference is in "wiggle" WIG format):
http://genome.ucsc.edu/FAQ/FAQformat.html

We hope this helps to explain the data,
Jennifer

---------------------------------
Jennifer Jackson
UCSC Genome Bioinformatics Group
http://genome.ucsc.edu/

On 3/1/10 3:24 AM, Arnoldo Jose Muller Molina wrote:
> Hello,
>
> I am very interested in using the conservation data you published in the
> paper:
>
> Siepel, A., Bejerano, G., Pedersen, J.S., Hinrichs, A., Hou, M.,
> Rosenbloom, K., Clawson, H., Spieth, J., Hillier, L.W., Richards, S.,
> Weinstock, G.M., Wilson, R. K., Gibbs, R.A., Kent, W.J., Miller, W.,
> and Haussler, D. Evolutionarily conserved elements in vertebrate,
> insect, worm, and yeast genomes.
> Genome Res. 15, 1034-1050 (2005).
>
> The conservation data that I downloaded is from:
>
> http://hgdownload.cse.ucsc.edu/goldenPath/mm9/phastCons30way/euarchontoglires/
>
> Since in the paper you mentioned that the sequence data was downloaded
> from the UCSC repository, I downloaded the mm9 genome from:
>
> http://hgdownload.cse.ucsc.edu/goldenPath/mm9/chromosomes/
>
> When checking the data for chromosome 12 (mm9), I realized that its size
> (number of bases) is 121257550. The number of conservation predictions
> for chromosome 12 that I downloaded has only 88940836 bases.
>
> The header of the conservation file has something like:
>
> "fixedStep chrom=chr12 start=3000534 step=1"
>
> So even if I assume that the predictions start at position 3000534, the
> number of bases of the prediction and the ucsc chromosome 12 files do
> not match.
>
> Am I missing something? What would be the way of matching conservation
> predictions to the genome? or where can I download a genome sequence
> that matches the conservation dataset?
>
> Best regards,
>
> Arnoldo Jose Muller Molina
>
> Max-Planck-Institute for Molecular Biomedicine
> Computational Biology and Bioinformatics Group
> Röntgenstrasse 20
> 48149 Münster
> NRW, Germany
>
> _______________________________________________
> Genome maillist - [email protected]
> https://lists.soe.ucsc.edu/mailman/listinfo/genome
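
A note on matching the wiggle scores to genome coordinates, as asked in the thread above. A fixedStep WIG file can contain many "fixedStep" declaration lines, not just the first one, and each declaration resets the position to its own start value. Bases that fall between blocks have no score, which is one way the count of scores can come out smaller than the chromosome length. The following is a minimal Python sketch of reading such a file, not the method used to build the track. The function name parse_fixed_step, the second block's start coordinate, and all score values are made up for illustration; only the first header line is taken from the message.

```python
from typing import Iterable, Iterator, Tuple


def parse_fixed_step(lines: Iterable[str]) -> Iterator[Tuple[str, int, float]]:
    """Yield (chrom, position, score) for every data value in fixedStep WIG text.

    Positions are 1-based, as in the wiggle format. Each "fixedStep" line
    starts a new block and resets the position, so bases that fall between
    blocks simply have no score.
    """
    chrom, pos, step = None, 0, 1
    for raw in lines:
        line = raw.strip()
        # Skip blank lines, comments, and track/browser lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        if line.startswith("fixedStep"):
            # Parse the key=value fields of the declaration line.
            fields = dict(item.split("=", 1) for item in line.split()[1:])
            chrom = fields["chrom"]
            pos = int(fields["start"])
            step = int(fields.get("step", 1))
            continue
        if line.startswith("variableStep"):
            raise ValueError("variableStep blocks are not handled in this sketch")
        if chrom is None:
            raise ValueError("data line before any fixedStep declaration")
        yield chrom, pos, float(line)
        pos += step


# Hypothetical two-block sample; the scores and the second start are invented.
sample = """\
fixedStep chrom=chr12 start=3000534 step=1
0.1
0.2
0.3
fixedStep chrom=chr12 start=3000600 step=1
0.9
0.8
"""

scores = {pos: score for _, pos, score in parse_fixed_step(sample.splitlines())}
```

Because wiggle positions are 1-based, a score at position p lines up with index p - 1 of the chromosome sequence when that sequence is held as a Python string. Positions absent from the dictionary (here, 3000537 through 3000599) are the bases with no conservation score.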
