Dear UCSC team,

I have a question concerning the "manually set" quality score 98 that represents missing quality scores. The chimp browser does not show quality scores for chr21 or chrY, which is fine, since no quality scores exist for them. However, in the hg18 44-way alignment the q lines for chimp chr21 and chrY contain the quality score 0. This comes from mafAddQRows, which encodes both the scores from 0 to <5 and the score 98 as 0.
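
To make the collision concrete, here is a minimal Python sketch of the encoding as I understand it. It is not the mafAddQRows source. The bucket rule min(floor(score / 5), 9) and the 'F' symbol for score 99 are assumptions taken from the public MAF format description, the mapping of 98 to 0 is only what I observe in the maf, and the names encode_q, encode_q_line and missing_symbol are made up for illustration.

```python
MISSING = 98   # "manually set" score that stands for missing quality data
FINISHED = 99  # assumption: finished sequence, written as 'F' in q lines


def encode_q(score, missing_symbol="0"):
    """Map one quality score to a single q-line character.

    With the default missing_symbol="0" this mimics the observed behaviour,
    where 98 is written as 0. Passing another symbol (for example ".")
    keeps missing data separate from low quality.
    """
    if score == MISSING:
        return missing_symbol
    if score == FINISHED:
        return "F"
    # Assumed bucket rule: 0-4 -> 0, 5-9 -> 1, ..., 45 and above -> 9
    return str(min(score // 5, 9))


def encode_q_line(scores, missing_symbol="0"):
    """Encode a list of per-base scores into the text of a q line."""
    return "".join(encode_q(s, missing_symbol) for s in scores)


# A real score of 3 and the missing-data score 98 collapse to the same symbol:
print(encode_q_line([3, 98, 40]))        # 008
# With a separate symbol for 98 they stay distinguishable:
print(encode_q_line([3, 98, 40], "."))   # 0.8
```

With the default behaviour a filter that drops every column with symbol 0 cannot tell the first two positions apart. With a separate symbol it could treat them differently.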

s panTro2.chr21 13793045 70 + 46489110 CTTGTGTGCCACCATCCCTGACTTTGTTGATAAGGGCATCAGGCTACATCCCTCTGGTACTCAGTGGTAA
q panTro2.chr21 0000000000000000000000000000000000000000000000000000000000000000000000

That means that any attempt to filter out bad quality from a maf will fail for chr21 and chrY, because one cannot distinguish between a real quality score of, say, 3 and missing data (98): both end up as 0.

I have the following questions/suggestions:

1. Is there any species where 98 represents a real quality score (I mean 97 < 98 < 99), or is 98 always missing data?

2. Would it make sense to encode score 98 in the maf as '.', like it is done for gaps? Then one could distinguish between bad quality and missing data.

3. For chimp: chrY, chrY_random and chr21 have no quality scores in the browser display or in the quality wib table. However, the region chr7:87674857-92389096 has quality score 98. And all of these regions contain a 0 in the q lines of the hg18 44-way maf. Is the region chr7:87674857-92389096 different from chrY or chr21? And why is it treated differently?

I think the quality score annotation of mafs is very useful, especially because of the many low-coverage genomes.

Thanks a lot for your help.

- Michael
