Hey Karen,

This is very much an active area of research, but there are some common ways to characterize assemblies. You can find these statistics for most assemblies on the gateway page for the organism. Here is that page for chicken: http://genome.ucsc.edu/cgi-bin/hgGateway?db=galGal3
On this page you can see the N50 numbers, where N50 is the length such that 50% of the assembled genome lies in blocks (contigs or scaffolds) of the N50 size or longer. N50 is the most commonly used single-number measurement of assembly quality: the bigger, the better. Another commonly quoted number is the X coverage, where the number before the X is the average read coverage over the whole genome. Again, more is usually better.

In general, you should not assume that a particular organism is missing sequence in its actual genome just because nothing from its assembly aligns to human or mouse. The only way you can be fairly sure that sequence is missing in the distant animal's genome is if you have very solid alignments on either side of the putatively deleted sequence, and these alignments lie within a single contig/scaffold in both species.

I hope this answers your question. If you have follow-up questions, please address them to this list.

Brian

On Wed, May 18, 2011 at 11:19 AM, Karen Lawrence <[email protected]> wrote:
>
> Is there a quantitative measure for how complete a given genome assembly
> is? With increasing distance from human, there are some unaligned or
> unannotated regions, for instance, in lizard or chicken. This number would
> be helpful to me, as I am doing many cross-species comparisons and
> 'LiftOvers' of datasets generated in the mouse but then being Lifted to
> other more distant vertebrates.
> Thanks for your help.
> _______________________________________________
> Genome maillist - [email protected]
> https://lists.soe.ucsc.edu/mailman/listinfo/genome
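
For anyone who wants to compute N50 themselves, the definition in the reply above can be sketched in a few lines of Python. This is only a minimal illustration, assuming you already have a list of contig or scaffold lengths. The function name `n50` and the example lengths are made up for this sketch and do not come from any real assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of contig/scaffold lengths.

    N50 is the length L such that blocks of length >= L together
    contain at least 50% of the total assembled bases.
    """
    if not lengths:
        raise ValueError("need at least one block length")

    total = sum(lengths)
    running = 0
    # Walk from the longest block to the shortest, accumulating bases.
    for length in sorted(lengths, reverse=True):
        running += length
        # The first block that brings the running total to half of the
        # assembled bases defines the N50.
        if 2 * running >= total:
            return length


# Hypothetical block lengths, for illustration only.
example_lengths = [80, 70, 50, 40, 30, 20, 10]
print(n50(example_lengths))
```

Note that the 50% here is taken relative to the total assembled length, not the true genome size, so N50 values are only comparable between assemblies of similar total size.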
