Steve Christey (CVE Editor) wrote an open letter to several mailing lists
regarding the nature of vulnerability statistics. What he said is spot on,
and most of what I would have pointed out had my previous rant been more
broad, and not a direct attack on a specific group. I am posting his entire
letter here, because it needs to be said, read, understood, and drilled into
the heads of so many people. I am reformatting this for the blog, you can
read an original copy via a mail list.
Open Letter on the Interpretation of ³Vulnerability Statistics²
Author: Steve Christey, CVE Editor
Date: January 4, 2006
As the new year begins, there will be many temptations to generate, comment,
or report on vulnerability statistics based on totals from 2005. The
original reports will likely come from publicly available Refined
Vulnerability Information (RVI) sources - that is, vulnerability databases
(including CVE/NVD), notification services, and periodic summary producers.
RVI sources collect unstructured vulnerability information from Raw Sources.
Then, they refine, correlate, and redistribute the information to others.
Raw sources include mailing lists like Bugtraq, Vulnwatch, and
Full-Disclosure, web sites like PacketStorm and Securiteam, blogs,
conferences, newsgroups, direct emails, etc.
In my opinion, RVI sources are still a year or two away from being able to
produce reliable, repeatable, and COMPARABLE statistics. In general,
consumers should treat current statistics as suggestive, not conclusive.
Vulnerability statistics are difficult to interpret due to several factors:
* - VARIATIONS IN EDITORIAL POLICY. An RVI source¹s editorial policy
dictates HOW MANY vulnerabilities are reported, and WHICH vulnerabilities
are reported. RVIs have widely varying policies. You can¹t even compare an
RVI against itself, unless you can be sure that its editorial policy has not
changed within the relevant data set. The editorial policies of RVIs seem to
take a few years before they stabilize, and there is evidence that they can
* - FRACTURED VULNERABILITY INFORMATION. Each RVI source collects its
information from its own list of raw sources - web sites, mailing lists,
blogs, etc. RVIs can also use other RVIs as sources. Apparently for
competitive reasons, some RVIs might not identify the raw source that was
used for a vulnerability item, which is one aspect of what I refer to as the
provenance problem. Long gone are the days when a couple mailing lists or
newsgroups were the raw source for 90% of widely available vulnerability
information. Based on what I have seen, the provenance problem is only going
to get worse.
* - LACK OF COMPLETE CROSS-REFERENCING BETWEEN RVI SOURCES. No RVI has
an exhaustive set of cross-references, so no RVI can be sure that it is 100%
comprehensive, even with respect to its own editorial policy. Some RVIs
compete with each other directly, so they don¹t cross-reference each other.
Some sources could theoretically support all public cross-references - most
notably OSVDB and CVE - but they do not, due to resource limitations or
* - UNMEASURABLE RESEARCH COMMUNITY BIAS. Vulnerability researchers vary
widely in skill sets, thoroughness, preference for certain vulnerability
types or product classes, and so on. This collectively produces a bias that
is not currently measurable against the number of latent vulnerabilities
that actually exist. Example: web browser vulnerabilities were once thought
to belong to Internet Explorer only, until people actually started
researching other browsers; many elite researchers concentrate on a small
number of operating systems or product classes; basic SQL injection and XSS
are very easy to find manually; etc.
* - UNMEASURABLE DISCLOSURE BIAS. Vendors and researchers vary widely in
their disclosure models, which creates an unmeasurable bias. For example,
one vendor might hire an independent auditor and patch all reported
vulnerabilities without publicly announcing any of them, or a different
vendor might publish advisories even for very low-risk issues. One
researcher might disclose without coordinating with the vendor at all,
whereas another researcher might never disclose an issue until a patch is
provided, even if the vendor takes an inordinate amount of time to respond.
Note that many large-scale comparisons, such as ³Linux vs. Windows,² can not
be verified due to unmeasurable bias, and/or editorial policy of the core
RVI that was used to conduct the comparison.
EDITORIAL POLICY VARIATIONS
This is just a sample of variations in editorial policy. There are
legitimate reasons for each variation, usually due to audience needs or
availability of analytical resources.
COMPLETENESS (what is included):
1. SEVERITY. Some RVIs do not include very low-risk items such as a bug
that causes path disclosure in an error message in certain non-operational
configurations. Secunia and SecurityFocus do not do this, although they
might note this when other issues are identified. Others include low-risk
issues, such as CVE, ISS X-Force, US-CERT Security Bulletins, and OSVDB.
2. VERACITY. Some RVIs will only publish vulnerabilities when they are
confident that the original, raw report is legitimate - or if they¹re
verified it themselves. Others will publish reports when they are first
detected from the raw sources. Still others will only publish reports when
they are included in other RVIs, which makes them subject to the editorial
policies of those RVIs unless care is taken. For example, US-CERT¹s
Vulnerability Notes have a high veracity requirement before they are
published; OSVDB and CVE have a lower requirement for veracity, although
they have correction mechanisms in place if veracity is questioned, and CVE
has a two-stage approach (candidates and entries).
3. PRODUCT SPACE. Some RVIs might omit certain products that have very
limited distribution, are in the beta development stage, or are not
applicable to the intended audience. For example, version 0.0.1 of a
low-distribution package might be omitted, or if the RVI is intended for a
business audience, video game vulnerabilities might be excluded. On the
other hand, some ³beta² products have extremely wide distribution.
4. OTHER VARIATIONS. Other variations exist but have not been studied or
categorized at this time. One example, though, is historical completeness.
Most RVIs do not cover vulnerabilities before the RVI was first launched,
whereas others - such as CVE and OSVDB - can include issues that are older
than the RVI itself. As another example: a few years ago, Neohapsis made an
editorial decision to omit most PHP application vulnerabilities from their
summaries, if they were obscure products, or if the
vulnerability was not exploitable in a typical operational
ABSTRACTION (how vulnerabilities are ³counted²):
5. VULNERABILITY TYPE. Some RVIs distinguish between types of
vulnerabilities (e.g. buffer overflow, format string, symlink, XSS, SQL
injection). CVE, OSVDB, ISS X-Force, and US-CERT Vulnerability Notes perform
this distinction; Secunia, FrSIRT, and US-CERT Cyber Security Bulletins do
not. Bugtraq IDs vary. As vulnerability classification becomes more
detailed, there is more room for variation (e.g. integer overflows and
off-by-ones might be separated from ³classic² overflows).
6. REPLICATION. Some RVIs will produce multiple records for the same core
vulnerability, even based on the RVI¹s own definition. Usually this is done
when the same vulnerability affects multiple vendors, or if important
information is released at a later date. Secunia and US-CERT Security
Bulletins use replication; so might vendor advisories (for each supported
distribution). OSVDB, Bugtraq ID, CVE, US-CERT Vulnerability Notes, and ISS
X-Force do not - or, they use different replication than others.
Replication¹s impact on statistics is not well understood.
7. OTHER VARIATIONS. Other abstraction variations exist but have not been
studied or categorized at this time. As one example, if an SQL injection
vulnerability affects multiple executables in the same product, OSVDB will
create one record for each affected program, whereas CVE will combine them.
8. RVIs differ in how quickly they must release vulnerability
information. While this used to vary significantly in the past, these days
most public RVIs have very short timelines, from the hour of release to
within a few days. Vulnerability information can be volatile in the early
stages, so an RVI¹s requirements for timeliness directly affects its
veracity and completeness.
9. All RVIs deal with limited resources or time, which significantly
affects completeness, especially with respect to veracity, or timeliness
(which is strongly associated with the ability to achieve completeness).
Abstraction might also be affected, although usually to a lesser degree,
except in the case of large-scale disclosures.
In my opinion:
You should not interpret any RVI¹s statistics without considering its
editorial policy. For example, the US-CERT Cyber Security Bulletin Summary
for 2005 uses statistics that include replication. (As a side note, a causal
glance at the bulletin¹s contents makes it clear that it cannot be used to
compare Windows to Linux as operating systems.)
In addition, you should not compare statistics from different RVIs until (a)
the RVIs are clear about their editorial policy and (b) the differences in
editorial policy can be normalized. Example: based on my PRELIMINARY
investigations of a few hours¹ work, OSVDB would have about 50% more records
than CVE, even though it has the same underlying number of vulnerabilities
and the same completeness policy for recent issues.
Third, for the sake of more knowledgeable analysis, RVIs should consider
developing and publishing their own editorial policies.
(Note that based on CVE¹s experience, this can be difficult to do.)
Consumers should be aware that some RVIs might not be open about their raw
sources, veracity analysis, and/or completeness.
Finally: while RVIs are not yet ready to provide usable, conclusive
statistics, there is a solid chance that they will be able to do so in the
near future. Then, the only problem will be whether the statistics are
properly interpreted. But that is beyond the scope of this letter.
P.S. This post was written for the purpose of timely technical exchange.
Members of the press are politely requested to consult me before directly
attributing quotes from this article, especially with respect to stated
You are a subscribed member of the infowarrior list. Visit
www.infowarrior.org for list information or to unsubscribe. This message
may be redistributed freely in its entirety. Any and all copyrights
appearing in list messages are maintained by their respective owners.