Hello,

The original publication can address most of your questions.

Full article/publication:
http://genome.cshlp.org/content/15/8/1034.full.pdf+html

Link to abstract/footnotes:
http://genome.cshlp.org/content/15/8/1034

Redirected link to supplemental information from footnote:
http://compgen.bscb.cornell.edu/~acs/conservation/

Link to supplemental information posted with publication at Genome Research:
http://genome.cshlp.org/content/suppl/2005/07/18/gr.3715005.DC1.html

In particular, these sections in the supplemental information at 
genome.cshlp.org explain why not every base is included and what the base 
mouse assembly was for the publication:
2.7 Missing Data and Alignment Gaps
2.8 Synteny Filtering
Table S1: Summary of genomes and assemblies "M. musculus vertebrate mm5 
Mouse Genome Sequencing Consortium, 2002"

In summary, the first 3M bases of the mouse chromosome are reserved for 
the telomere region. The remaining bases missing from the conservation 
data (original and current) are due to gaps and filtering, as explained 
above.

The mm5 database can be accessed on the archive server here (with 
limited functionality):
http://genome-archive.cse.ucsc.edu/

That stated, it is recommended to use the most current version of the 
genome assembly and track unless there is a specific reason to go back.

Help for data formats (the data file you reference is in "wiggle" WIG 
format): http://genome.ucsc.edu/FAQ/FAQformat.html
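
To show how the fixedStep declaration lines tie scores back to chromosome 
positions, here is a minimal Python sketch. It is an illustration, not a 
UCSC tool. The function names (parse_fixed_step, scored_bases_per_chrom) 
and the sample scores are made up for this example. It assumes the file 
is plain fixedStep wiggle with span=1, and that a new "fixedStep" line 
appears wherever scored bases resume after a gap. The start value is 
1-based.

```python
import io
from collections import Counter


def parse_fixed_step(lines):
    """Yield (chrom, position, score) for every value in a fixedStep stream.

    Assumes span=1. Each "fixedStep" declaration resets the current
    position, so gaps between blocks are simply never emitted.
    """
    chrom, pos, step = None, None, 1
    for raw in lines:
        line = raw.strip()
        # Skip blank lines and optional track/browser/comment lines.
        if not line or line.startswith(("track", "browser", "#")):
            continue
        if line.startswith("fixedStep"):
            # Declaration line, e.g. "fixedStep chrom=chr12 start=3000534 step=1"
            fields = dict(f.split("=", 1) for f in line.split()[1:])
            chrom = fields["chrom"]
            pos = int(fields["start"])  # 1-based position of the first score
            step = int(fields.get("step", 1))
            continue
        if chrom is None:
            raise ValueError("score line found before any fixedStep declaration")
        yield chrom, pos, float(line)
        pos += step


def scored_bases_per_chrom(lines):
    """Count how many positions carry a score on each chromosome."""
    counts = Counter()
    for chrom, _pos, _score in parse_fixed_step(lines):
        counts[chrom] += 1
    return counts


# Hypothetical two-block sample; the scores are invented for illustration.
sample = """fixedStep chrom=chr12 start=3000534 step=1
0.1
0.2
0.3
fixedStep chrom=chr12 start=3000600 step=1
0.5
0.4
"""

records = list(parse_fixed_step(io.StringIO(sample)))
counts = scored_bases_per_chrom(io.StringIO(sample))
```

In this sample, positions 3000537 through 3000599 have no score because 
the second block restarts at 3000600. In the same way, the number of 
scores in a real file can be smaller than the chromosome length: 
positions that fall between blocks have no conservation value.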

We hope this helps to explain the data,
Jennifer

---------------------------------
Jennifer Jackson
UCSC Genome Bioinformatics Group
http://genome.ucsc.edu/

On 3/1/10 3:24 AM, Arnoldo Jose Muller Molina wrote:
> Hello,
>
> I am very interested in using the conservation data you published in the
> paper:
>
> Siepel, A., Bejerano, G., Pedersen, J.S., Hinrichs, A., Hou, M.,
> Rosenbloom, K., Clawson, H., Spieth, J., Hillier, L.W., Richards, S.,
> Weinstock, G.M., Wilson, R. K., Gibbs, R.A., Kent, W.J., Miller, W.,
> and Haussler, D.  Evolutionarily conserved elements in vertebrate,
> insect, worm, and yeast genomes. Genome Res. 15, 1034-1050 (2005).
>
> The conservation data that I downloaded is from:
>
> http://hgdownload.cse.ucsc.edu/goldenPath/mm9/phastCons30way/euarchontoglires/
>
> Since in the paper you mentioned that the sequence data was downloaded
> from the UCSC repository, I downloaded the mm9 genome from:
>
> http://hgdownload.cse.ucsc.edu/goldenPath/mm9/chromosomes/
>
> When checking the data for chromosome 12 (mm9), I realized that its size
> (number of bases) is 121257550. The number of conservation predictions
> for chromosome 12 that I downloaded has only 88940836 bases.
>
> The header of the conservation file has something like:
>
> "fixedStep chrom=chr12 start=3000534 step=1"
>
> So even if I assume that the predictions start at position 3000534, the
> number of bases of the prediction and the ucsc chromosome 12 files do
> not match.
>
> Am I missing something? What would be the way of matching conservation
> predictions to the genome? or where can I download a genome sequence
> that matches the conservation dataset?
>
> Best regards,
>
> Arnoldo Jose Muller Molina
>
> Max-Planck-Institute for Molecular Biomedicine
> Computational Biology and Bioinformatics Group
> Röntgenstrasse 20
> 48149 Münster
> NRW, Germany
>
>
>
> _______________________________________________
> Genome maillist  -  [email protected]
> https://lists.soe.ucsc.edu/mailman/listinfo/genome