Dear UCSC team,

I have a question concerning the "manually set" quality score 98, which 
represents missing quality scores.
The chimp browser does not show quality scores for chr21 or chrY, which 
is fine, since there are no quality scores for those chromosomes.
However, in the hg18 44-way alignment the q lines for chimp chr21 and chrY 
contain the quality score 0, because mafAddQRows encodes both the raw 
scores 0-4 and the score 98 as 0.

s panTro2.chr21 13793045 70 +  46489110 CTTGTGTGCCACCATCCCTGACTTTGTTGATAAGGGCATCAGGCTACATCCCTCTGGTACTCAGTGGTAA
q panTro2.chr21                         0000000000000000000000000000000000000000000000000000000000000000000000

That means that any attempt to filter out low-quality bases from a maf 
will fail for chr21 and chrY, because one cannot distinguish between a 
real quality score of, say, 3 and missing data (98): both end up as 0.
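
To make the collision concrete, here is a minimal Python sketch of the 
mapping as I understand it. It is not taken from the mafAddQRows source: 
the function name is hypothetical, and the min(floor(raw / 5), 9) rule and 
the 'F' character for score 99 are my reading of the MAF format 
description, so please treat them as assumptions.

```python
def maf_quality_char(raw_score: int) -> str:
    """Compress a raw quality score (0-99) into one q-line character.

    Assumed rule (my reading of the MAF format description):
      99   -> 'F'  (finished sequence)
      98   -> '0'  (manually assigned, i.e. missing data)
      0-97 -> min(raw_score // 5, 9)
    """
    if raw_score == 99:
        return "F"
    if raw_score == 98:
        return "0"
    return str(min(raw_score // 5, 9))


# A real low score and the "missing" score give the same character.
print(maf_quality_char(3), maf_quality_char(98))
```

Once the q line is written, nothing is left to tell the two cases apart.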

I have the following questions/suggestions:
1. Is there any species where 98 represents a real quality score (I mean 
97 < 98 < 99) or is 98 always missing data?
2. Would it make sense to encode score 98 in the maf as '.', similar to 
what is done for gaps? Then one could distinguish between bad quality and 
missing data.
3. For chimp: chrY, chrY_random and chr21 have no quality scores in the 
browser display and in the quality wib table. However, the region 
chr7:87674857-92389096 has quality score 98. All of these regions contain 
a 0 in the q lines of the hg18 44-way maf. Is the region 
chr7:87674857-92389096 different from chrY or chr21? If so, why is it 
treated differently?
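
To illustrate suggestion 2, here is a rough Python sketch of how a filter 
could behave if score 98 were written as '.'. Everything in it is 
hypothetical: the '.' marker is not part of the current maf format, the 
function names and the min_q threshold are made up for the example, and 
the digit mapping is the same assumed min(floor(raw / 5), 9) rule.

```python
MISSING = "."  # hypothetical marker for raw score 98; not in the current format


def proposed_quality_char(raw_score: int) -> str:
    """Like the existing encoding, but keep 98 distinct from real low scores."""
    if raw_score == 98:
        return MISSING
    if raw_score == 99:
        return "F"
    return str(min(raw_score // 5, 9))


def mask_low_quality(seq: str, qual: str, min_q: int = 2) -> str:
    """Replace bases whose q character is a digit below min_q with 'N'.

    Non-digit characters (the hypothetical '.', 'F', gap symbols) are
    left untouched, so missing data is not confused with bad quality.
    """
    masked = []
    for base, q in zip(seq, qual):
        if q.isdigit() and int(q) < min_q:
            masked.append("N")
        else:
            masked.append(base)
    return "".join(masked)


print(mask_low_quality("ACGT", "0.9F"))
```

With such a marker, a filter could mask truly low-quality bases and still 
leave it to the user whether to keep or drop positions with no quality data.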

I think the quality score annotation of mafs is very useful, especially 
given the many low-coverage genomes.

Thanks a lot for your help.
- Michael


