Hi Bibek, I must correct part of my answer below:
One of our engineers let me know that talking about the 'mean' of phastCons or phyloP scores is misleading. He says, "These are not numbers that can be combined like that. There is a log odds score in the 'Most Conserved' regions that is an accurate summary of these numbers."

Adam Siepel has discussed this mathematical problem on the mailing list. That discussion shows the complications of using these numbers in ordinary statistical calculations:
https://lists.soe.ucsc.edu/pipermail/genome/2005-October/008744.html
and follow-up information:
https://lists.soe.ucsc.edu/pipermail/genome/2005-October/008748.html

For more information about the 'Most Conserved' track referred to by our engineer, see the track description here:
http://genome.ucsc.edu/cgi-bin/hgTrackUi?db=mm9&g=mostConserved30way

Thank you,
Katrina Learned
UCSC Genome Bioinformatics Group

Katrina Learned wrote, On 11/05/10 10:04:
> Hi Bibek,
>
> For information about step (and PhastCons files in general), see
> http://genome.ucsc.edu/goldenPath/help/phastCons.html. Also, on that
> page is this information about gaps, "A new declaration line is inserted
> in the file when the /chrom/ value changes, when a gap is encountered
> (requiring a new /start/ value), or when the /step/ interval changes."
> There are many reasons why there are gaps, one being that where there
> is no conservation, there is no score (this is the nature of the HMM
> models that are making these numbers).
>
> For calculating the conservation score for a small region, you can use
> the mean. See this previously answered question for more information:
> https://lists.soe.ucsc.edu/pipermail/genome/2009-July/019616.html
>
> For your last question, the easiest way to do this is using our Table
> Browser tool. Click on "Tables" on the blue bar on the top of the main
> page.
> In the Table Browser, select mm9 as well as the track and table
> you are using (Conservation; phastCons30way), and then create an
> intersection with the genes track of your choice (click "create" next to
> "intersection:"). For more information about using the Table Browser see
> "Using the Table Browser" by scrolling down past the Table Browser form
> or the "User's Guide" at
> http://genome.ucsc.edu/goldenPath/help/hgTablesHelp.html. For specific
> information about creating intersections, see
> http://genome.ucsc.edu/goldenPath/help/hgTablesHelp.html#Intersection.
>
> Please don't hesitate to contact the mail list again if you have any
> further questions.
>
> Katrina Learned
> UCSC Genome Bioinformatics Group
>
> Bibek wrote, On 11/05/10 07:01:
>
>> Hi,
>> I downloaded phastcons score of mouse (mm9) through FTP
>> (hgdownload.cse.ucsc.edu) from
>> /goldenPath/mm9/phastCons30way/vertebrate. I checked file, chr1.data &
>> found that first block starts at 3000306, whereas next block starts at
>> 3002512. See below-
>> fixedStep chrom=chr1 start=3000306 step=1
>> fixedStep chrom=chr1 start=3002512 step=1
>>
>> I believe each line on this file represents each position of chromosome,
>> being step=1. Please clarify if i am wrong.
>>
>> The difference between beginning of above 2 blocks is 2207, but i found
>> only 2175 lines for phastcons score. This indicate that phastcons scores
>> of few bases (about 32) are not given in the file. Does it mean 32
>> nucleotides before start of next block (i.e. 3002512) do not show
>> alignment & thus no score. Please clarify.
>>
>> Another query, If i wish to calculate conservation score of a stretch of
>> 7-8 nucleotides, what should i adopt.
>>
>> Lastly, if i wish to convert chromosomal coordinates of these files into
>> positions within the gene, which method or track file of mm9 should i use.
>>
>> Thank you in advance.
>>
>> regards,
>> Bibek
>>
_______________________________________________
Genome maillist - [email protected]
https://lists.soe.ucsc.edu/mailman/listinfo/genome
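
For readers who want to check where scores are missing in a fixedStep file like the one discussed in this thread, here is a minimal Python sketch. It assumes the layout quoted above (a "fixedStep chrom=... start=... step=..." declaration line followed by one score per line). The function names parse_fixed_step and gaps_between are hypothetical helpers written for this illustration, not part of any UCSC tool, and the score values in the demo are placeholders rather than real phastCons data.

```python
def parse_fixed_step(lines):
    """Group fixedStep lines into blocks with chrom, start, step and value count."""
    blocks = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith("fixedStep"):
            # Declaration line: key=value tokens after the "fixedStep" keyword.
            fields = dict(tok.split("=", 1) for tok in line.split()[1:])
            blocks.append({
                "chrom": fields["chrom"],
                "start": int(fields["start"]),
                "step": int(fields.get("step", 1)),
                "count": 0,
            })
        elif blocks:
            # Any other non-empty line is one score for the current block.
            blocks[-1]["count"] += 1
    return blocks


def gaps_between(blocks):
    """Return (chrom, first_missing, last_missing, n_missing) for adjacent blocks."""
    gaps = []
    for prev, nxt in zip(blocks, blocks[1:]):
        if prev["chrom"] != nxt["chrom"]:
            continue
        # First position after the last score of the previous block.
        next_expected = prev["start"] + prev["count"] * prev["step"]
        missing = nxt["start"] - next_expected
        if missing > 0:
            gaps.append((prev["chrom"], next_expected, nxt["start"] - 1, missing))
    return gaps


# Demo using the two declaration lines and the 2175-line count reported in
# the question; "0.1" and "0.2" are placeholder scores, not real data.
demo = (
    ["fixedStep chrom=chr1 start=3000306 step=1"]
    + ["0.1"] * 2175
    + ["fixedStep chrom=chr1 start=3002512 step=1", "0.2"]
)
print(gaps_between(parse_fixed_step(demo)))
```

With those figures the first block covers positions 3000306 through 3002480, so the sketch reports 31 unscored positions (3002481 through 3002511) before the next block starts. The gap size depends on the line count taken from the question, so treat it as an illustration rather than a verified property of the file.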
