Hello Michael,
To answer your questions:

1. Unfortunately, there is currently no industry standard for representing missing data in quality scores. Some assemblers use 98, some use 99, and so on. The best I can suggest is to check with the assemblers of the individual assemblies you are interested in.

2. I've passed this suggestion on to our developers. I will let you know if there is interest in this change.

3. If you look at the details page for the panTro2 quality score track, you'll find the following text:

"This track includes quality scores for all chromosomes assembled from whole genome shotgun contigs. Quality scores for chromosomes generated from finished clones (chr21, chrY, and chrY_random) were not available at the time this track was constructed; all scores for these chromosomes were manually set at 98. Chromosome 7 was constructed from both contigs and clones; the quality track for this chromosome contains the contig quality scores, with the 1MB finished region indicated by the manually assigned score of 98. All other scores in this track range between 0 and 97 inclusive."

I hope this information is helpful to you. Please don't hesitate to contact us again if you require further assistance.

Kayla Smith
UCSC Genome Bioinformatics Group

----- "Michael Hiller" <[email protected]> wrote:

> Dear UCSC team,
>
> I have a question concerning the "manually set" quality score 98 that
> represents missing quality scores.
> The chimp browser for chr21 or chrY does not show quality scores, which
> is fine, since there are no qual scores.
> However, the hg18 44-way alignment contains for chimp chr21 or chrY the
> qual score 0, which comes from mafAddQRows that encodes scores 0 .. <5
> and 98 as 0.
>
> s panTro2.chr21 13793045 70 + 46489110
> CTTGTGTGCCACCATCCCTGACTTTGTTGATAAGGGCATCAGGCTACATCCCTCTGGTACTCAGTGGTAA
> q panTro2.chr21
> 0000000000000000000000000000000000000000000000000000000000000000000000
>
> That means that any attempt to filter out bad quality from a maf will
> fail for chr21 and Y because one cannot distinguish between a real
> quality score of say 3 and missing data (98) because both end up as 0.
>
> I have the following questions/suggestions:
> 1. Is there any species where 98 represents a real quality score (I mean
> 97 < 98 < 99) or is 98 always missing data?
> 2. Would it make sense to encode score 98 in the maf as '.' like it is
> done for gaps? Then one can distinguish between bad qual and missing data.
> 3. For chimp: chrY, chrY_random and chr21 have no quality scores in the
> browser display and in the quality wib table. However, the region
> chr7:87674857-92389096 has quality score 98. And these regions in the
> hg18 44-way maf contain a 0 in the q lines. Is the region
> chr7:87674857-92389096 different from chrY or chr21? And why is it
> treated differently?
>
> I think the quality score annotation of mafs is very useful, especially
> because of many low coverage genomes.
>
> Thanks a lot for your help.
> - Michael
>
>
> _______________________________________________
> Genome maillist - [email protected]
> http://www.soe.ucsc.edu/mailman/listinfo/genome
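
For readers following this thread, the ambiguity Michael describes, and the effect of his suggestion 2, can be illustrated with a small Python sketch. This is not the mafAddQRows implementation. The function names are hypothetical, and the binning rule min(raw // 5, 9) is an assumption chosen only because it is consistent with "scores 0 .. <5 and 98 as 0" as stated above.

```python
# Illustrative sketch only; NOT the actual mafAddQRows code.
# Assumption: raw scores are binned as min(raw // 5, 9), which matches the
# statement in this thread that raw scores 0 .. <5 become 0.

MISSING_RAW = 98  # manually assigned score for panTro2, per the track description


def encode_quality(raw, missing_char="0"):
    """Map one raw quality score to a single MAF q-line character.

    With missing_char="0" this reproduces the behaviour described in the
    thread (98 becomes 0). With missing_char="." it follows the proposed
    alternative, so missing data stays distinguishable from low quality.
    """
    if raw == MISSING_RAW:
        return missing_char
    return str(min(raw // 5, 9))


def low_quality_columns(q_line, threshold=1):
    """Return indexes of columns whose digit is below threshold.

    Non-digit characters (for example '.' or '-') are skipped, so they are
    never reported as low quality.
    """
    return [i for i, c in enumerate(q_line) if c.isdigit() and int(c) < threshold]


if __name__ == "__main__":
    raw_scores = [3, 98, 97, 50]
    current = "".join(encode_quality(r) for r in raw_scores)
    proposed = "".join(encode_quality(r, ".") for r in raw_scores)
    print(current, low_quality_columns(current))
    print(proposed, low_quality_columns(proposed))
```

Under the described encoding, a raw score of 3 and the placeholder 98 both come out as 0, so a filter cannot tell them apart. With '.' for the placeholder, only the true low score is flagged.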
