Folks -
Perhaps it would be helpful to confirm that we have common goals in the network
operator community regarding RPKI, and then work from those goals on the
necessary plans to achieve them.
It appears that many network operators would like to improve the integrity of
their network routing via RPKI deployment. The Regional Internet Registries
(RIRs) have all worked to support RPKI services, and while there are different
opinions among operators regarding the cost/benefit tradeoffs of RPKI Route
Origin Validation (ROV), it is clear that we have to work together
now if we are ever to have overall RPKI deployment sufficient to create the
network effects that will ensure compelling long-term value for its deployment.
Let’s presume that we’ve achieved that very outcome at some point in the future;
i.e., we have an Internet where nearly all network operators are publishing
Route Origin Authorizations (ROAs) via RIR RPKI services and are using RPKI
data for route validation. It is reasonable to presume that over the next
decade the Internet will become even more pervasive in everyday life, including
being essential for many connected devices to function, and relied upon for
everything from daily personal communication and conducting business to even
more innovative uses such as payment & sale systems, delivery of medical care,
etc.
Recognizing that the purpose of RPKI is to improve the integrity of routing, and not to add
undue fragility to the network, it is reasonable to expect that many network
operators will take due care with the introduction of route validation into
their network routing, including best practices such as falling back
successfully in the event of unavailability of an RIR RPKI Certificate
Authority (CA) and resulting cache timeouts. It is also reasonable to expect that
RIR RPKI CA services are provisioned with appropriate robustness of systems and
controls that befit the highly network-critical nature of these services.
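To make the fallback behavior concrete, here is a minimal sketch in Python of route origin validation in the spirit of RFC 6811. It is a simplification for illustration only: the names `VRP`, `validate`, and `accept` are hypothetical, the prefixes and AS numbers are documentation values, and real routers and validators handle many more details. The point it shows is that when the validated cache expires and no ROA data remains, every route evaluates to "not-found" rather than "invalid", and a policy that accepts "not-found" routes keeps forwarding traffic.

```python
from dataclasses import dataclass
from ipaddress import ip_network


@dataclass(frozen=True)
class VRP:
    """A validated ROA payload: prefix, maximum length, origin AS (hypothetical type)."""
    prefix: str
    max_length: int
    asn: int


def validate(route_prefix: str, origin_asn: int, vrps: list[VRP]) -> str:
    """Return 'valid', 'invalid', or 'not-found' for a route (simplified)."""
    route = ip_network(route_prefix)
    covered = False
    for vrp in vrps:
        vrp_net = ip_network(vrp.prefix)
        if route.version != vrp_net.version:
            continue  # never compare IPv4 against IPv6
        if route.subnet_of(vrp_net):
            covered = True  # at least one VRP covers this route
            if route.prefixlen <= vrp.max_length and origin_asn == vrp.asn:
                return "valid"
    # Covered but unmatched is invalid; no covering VRP at all is not-found.
    return "invalid" if covered else "not-found"


def accept(state: str) -> bool:
    """Assumed local policy: drop only invalid routes, keep valid and not-found."""
    return state != "invalid"


if __name__ == "__main__":
    vrps = [VRP("192.0.2.0/24", 24, 64500)]
    print(validate("192.0.2.0/24", 64500, vrps))  # matching ROA
    print(validate("192.0.2.0/24", 64501, vrps))  # wrong origin AS
    print(validate("192.0.2.0/24", 64500, []))    # cache expired, no data
```

An operator whose policy instead rejects everything that is not "valid" would drop all routes once the cache empties, which is the kind of misconfiguration that turns a CA outage into a network outage.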
Presuming we all share this common goal, the question that arises is whether we
have a common vision regarding what should happen when something goes wrong in
this wonderful RPKI-rich Internet of the future… More than anyone, network
operators realize that even with excellent systems, procedures, and redundancy,
outages can (and do) still occur. Hopefully, these are quite rare, and limited
to occasions where Murphy’s Law has somehow resulted in nearly unimaginable
patterns of coincident failures, but it would be irresponsible to not consider the
“what if” scenarios for RPKI failure and whether there is a shared vision of the
resulting consequences.
In particular, it would be good to consider the case of an RIR RPKI CA system
failure, one sufficient to result in widespread cache expirations for relying
parties. Ideally, we will never have to see this scenario when RPKI is widely
deployed, but it is also not completely inconceivable that an RIR RPKI CA
could experience such an outage [1]. For network operators following reasonable
deployment practices, an RIR RPKI CA outage should result in a fallback to
unvalidated network routing data and no significant network impacts. However,
it’s likely not a reasonable assumption that all network operators will have
properly designed and implemented best practices in this regard, so there will
very likely be some networks that experience significant impacts consequential
to any RIR RPKI CA outage. Even if this is only 1 or 2 percent of network
operators with such configuration issues, it will mean hundreds of ISP outages
occurring simultaneously throughout the Internet and millions of customers
(individuals and businesses) affected globally. While the Internet is the
world’s largest cooperative endeavor, there inevitably will be many folks
impacted by an RIR RPKI outage, including some asking (appropriately) the
question of “who should bear responsibility” for the harm that they suffered.
It is worth understanding what the network community believes is the most
appropriate answer to this question, since a common outlook on this question
can be used to guide implementation details to match. Additionally, a common
understanding on this question will provide real insight into how the network
community intends the risk of the system to be distributed among the participants.
There are several possible options worth considering:
A) The most obvious answer for the party that should be held liable for
the impacts that result from an RPKI CA failure would be the respective RIR
that experienced the outage. This seems rather straightforward until one
considers that the RIRs are providing these services while specifically noting that
they may not be (despite all precautions) available 100 percent of the time,
and with clearly documented expectations that those relying on RPKI CA information
for route origin validation should fall back to routing with a not-validated
state [2]. The impacted parties are those customers of ISPs that improperly
handled the unavailability of RPKI data, thus escalating the situation into a
network-affecting outage. Under these circumstances, directing the claims from
customers of all the improperly configured ISPs to the RIR completely ignores
the responsibility of these ISPs to prepare for this precise eventuality, as
was done by their fellow network operators.
B) One of the more interesting theories on who should be held liable is
that those who are publishing ROAs are the appropriate responsible parties in
the event of RPKI CA failure; one can arrive at such a position on the logic that
they consciously decided to use RPKI CA services and thus asserted globally
that they would henceforth have validated routes – an RPKI CA failure is a case
of their “vendor” (the RIR) letting them down on the publication. This also has
equity issues, since those publishing ROA information don’t have a clear
contributory role, and the damages accruing to them are coming from customers
of those operators who failed in their duty.
C) Another potential answer for the party that should be responsible is
that each of the ISPs that failed to appropriately configure their route
validation and thus experienced a network outage should be responsible for their
own customers impacted as a result. In addition to keeping the liability
proportional to the customers served, this encourages each such ISP to consider
appropriate corrective measures.
It is possible to architect the various legalities surrounding RPKI to support
any of the above outcomes, but it first requires a shared understanding of what
the network community believes is the correct outcome. There are likely some
on the NANOG mailing list who have a view on this matter, so I pose the
question of "who should be responsible" for the consequences of an RIR RPKI CA failure
to this list for further discussion.
Thanks!
/John
John Curran
President and CEO
American Registry for Internet Numbers (ARIN)
[1] https://www.ietf.org/mail-archive/web/sidr/current/msg05621.html
[2] https://www.rfc-editor.org/rfc/rfc7115.txt