Towards an RPKI-rich Internet (and the appropriate allocation of responsibility in the event of an RIR RPKI CA outage)

John Curran Sun, 30 Sep 2018 16:23:19 -0700

Folks -

Perhaps it would be helpful to confirm that we have common goals in the network 
operator community regarding RPKI, and then work from those goals on the 
necessary plans to achieve them.


It appears that many network operators would like to improve the integrity of 
their network routing via RPKI deployment.  The Regional Internet Registries 
(RIRs) have all worked to support RPKI services, and while there are different 
opinions among operators regarding the cost/benefit tradeoffs of RPKI Route 
Origin Validation (ROV), it is clear that we have to work together now if we 
are ever to achieve RPKI deployment broad enough to create the network effects 
that will ensure compelling long-term value for its deployment.

Let’s presume that we’ve achieved that very outcome at some point in the future; 
i.e. we have an Internet where nearly all network operators are publishing 
Route Origin Authorizations (ROAs) via RIR RPKI services and are using RPKI 
data for route validation.  It is reasonable to presume that over the next 
decade the Internet will become even more pervasive in everyday life, including 
being essential for many connected devices to function, and relied upon for 
everything from daily personal communication and conducting business to even 
more innovative uses such as payment & sale systems, delivery of medical care, 
etc.

Recognizing that the purpose of RPKI is to improve the integrity of routing, not 
to add undue fragility to the network, it is reasonable to expect that many 
network operators will take due care when introducing route validation into 
their network routing, including best practices such as falling back gracefully 
when an RIR RPKI Certificate Authority (CA) becomes unavailable and relying-party 
caches subsequently time out.  It is also reasonable to expect that RIR RPKI CA 
services are provisioned with systems and controls robust enough to befit the 
highly network-critical nature of these services.
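To make the fallback described above concrete, here is a minimal Python sketch, assuming RFC 6811 origin-validation semantics (Valid / Invalid / NotFound) and a hypothetical relying-party cache that discards its Validated ROA Payloads (VRPs) once they expire. The names `VRP`, `VRPCache`, `accept_unless_invalid`, and `accept_only_valid` are illustrative only and do not correspond to any particular router or validator implementation.

```python
import ipaddress
from dataclasses import dataclass
from enum import Enum
from typing import FrozenSet, Union

Network = Union[ipaddress.IPv4Network, ipaddress.IPv6Network]


class State(Enum):
    VALID = "Valid"
    INVALID = "Invalid"
    NOT_FOUND = "NotFound"


@dataclass(frozen=True)
class VRP:
    """A Validated ROA Payload: prefix, max length, and authorized origin AS."""
    prefix: Network
    max_length: int
    asn: int


def validate(route: Network, origin_asn: int, vrps: FrozenSet[VRP]) -> State:
    """Classify one route using RFC 6811-style origin validation."""
    covered = False
    for vrp in vrps:
        # Only VRPs of the same address family whose prefix covers the route matter.
        if route.version != vrp.prefix.version or not route.subnet_of(vrp.prefix):
            continue
        covered = True
        if origin_asn == vrp.asn and route.prefixlen <= vrp.max_length:
            return State.VALID
    # Covered but never matched -> Invalid; not covered at all -> NotFound.
    return State.INVALID if covered else State.NOT_FOUND


@dataclass(frozen=True)
class VRPCache:
    """Hypothetical relying-party cache whose data is discarded after expiry."""
    vrps: FrozenSet[VRP]
    expires_at: float  # hypothetical timestamp after which the data is not trusted

    def current(self, now: float) -> FrozenSet[VRP]:
        # Expired data yields an empty VRP set, so every route becomes NotFound.
        return self.vrps if now < self.expires_at else frozenset()


def accept_unless_invalid(route: Network, origin_asn: int,
                          cache: VRPCache, now: float) -> bool:
    """Robust policy: drop only Invalid routes; NotFound is still accepted."""
    return validate(route, origin_asn, cache.current(now)) is not State.INVALID


def accept_only_valid(route: Network, origin_asn: int,
                      cache: VRPCache, now: float) -> bool:
    """Fragile policy: accept only Valid routes; fails closed once the cache expires."""
    return validate(route, origin_asn, cache.current(now)) is State.VALID


if __name__ == "__main__":
    net = ipaddress.ip_network
    cache = VRPCache(frozenset({VRP(net("192.0.2.0/24"), 24, 64500)}), expires_at=1000.0)
    route = net("192.0.2.0/24")
    print(accept_unless_invalid(route, 64500, cache, now=500.0))   # True (Valid)
    print(accept_unless_invalid(route, 64500, cache, now=2000.0))  # True (NotFound after expiry)
    print(accept_only_valid(route, 64500, cache, now=2000.0))      # False (fails closed)
```

Under the common "drop Invalid only" policy, an expired cache turns every route into NotFound, so routing falls back to its unvalidated state; a policy that accepts only Valid routes instead rejects everything once the cache expires, which is the kind of misconfiguration that can turn an RIR RPKI CA outage into a network outage.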

Presuming we all share this common goal, the question that arises is whether we 
have a common vision regarding what should happen when something goes wrong in 
this wonderful RPKI-rich Internet of the future…   More than anyone, network 
operators realize that even with excellent systems, procedures, and redundancy, 
outages can (and do) still occur.  Hopefully, these are quite rare and limited 
to occasions where Murphy’s Law has somehow produced nearly unimaginable 
patterns of coincident failures, but it would be irresponsible not to consider 
the “what if” scenarios for RPKI failure and whether there is a shared vision of 
the resulting consequences.

In particular, it would be good to consider the case of an RIR RPKI CA system 
failure severe enough to cause widespread cache expirations for relying 
parties.  Ideally, we will never see this scenario once RPKI is widely 
deployed, but it is also not inconceivable that an RIR RPKI CA could 
experience such an outage [1].  For network operators following reasonable 
deployment practices, an RIR RPKI CA outage should result in a fallback to 
unvalidated network routing data and no significant network impact.  However, 
it is likely not reasonable to assume that all network operators will have 
properly designed and implemented best practices in this regard, so there will 
very likely be some networks that experience significant impacts as a result of 
any RIR RPKI CA outage.  Even if only 1 or 2 percent of network operators have 
such configuration issues, that would mean hundreds of ISP outages occurring 
simultaneously throughout the Internet and millions of customers (individuals 
and businesses) affected globally.  While the Internet is the world’s largest 
cooperative endeavor, there inevitably will be many folks impacted by an RIR 
RPKI outage, including some asking (appropriately) the question of “who should 
bear responsibility” for the harm that they suffered.

It is worth understanding what the network community believes is the most 
appropriate answer to this question, since a common outlook on this question 
can be used to guide implementation details to match.   Additionally, a common 
understanding on this question will provide real insight into how the network 
community intends the risks of the system to be distributed among participants.

There are several possible options worth considering:

     A) The most obvious answer for the party that should be held liable for 
the impacts that result from an RPKI CA failure would be the respective RIR 
that experienced the outage.  This seems rather straightforward until one 
considers that the RIRs are providing these services while specifically noting 
that they may not (despite all precautions) be available 100% of the time, and 
have clearly documented the expectation that those relying on RPKI CA 
information for route origin validation should fall back to routing with an 
unvalidated state [2].  The impacted parties are the customers of ISPs that 
improperly handled the unavailability of RPKI data, thus escalating the 
situation into a network-affecting outage.  Under these circumstances, directing 
the claims from customers of all the improperly configured ISPs to the RIR 
completely ignores the responsibility of these ISPs to prepare for this precise 
eventuality, as their fellow network operators did.

     B) One of the more interesting theories on who should be held liable is 
that those who publish ROAs are the appropriate responsible parties in the 
event of RPKI CA failure; one can arrive at such a position on the logic that 
they consciously decided to use RPKI CA services and thus asserted globally 
that they would henceforth have validated routes, so an RPKI CA failure is a 
case of their “vendor” (the RIR) letting them down on the publication.  This 
also has equity issues, since those publishing ROA information don’t have a 
clear contributory role, and the damages accruing to them are suffered by 
customers of those operators who failed in their duty.

     C) Another potential answer for the party that should be responsible is 
that each of the ISPs that failed to appropriately configure their route 
validation, and thus experienced a network outage, should be responsible for 
their own customers impacted as a result.  In addition to keeping the liability 
proportional to the customers served, this encourages each such ISP to consider 
appropriate corrective measures.

It is possible to architect the various legalities surrounding RPKI to support 
any of the above outcomes, but it first requires a shared understanding of what 
the network community believes is the correct outcome.  There are likely some 
on the NANOG mailing list who have a view on this matter, so I pose the 
question of "who should be responsible" for the consequences of an RIR RPKI CA 
failure to this list for further discussion.

Thanks!
/John

John Curran
President and CEO
American Registry for Internet Numbers (ARIN)

[1] https://www.ietf.org/mail-archive/web/sidr/current/msg05621.html
[2] https://www.rfc-editor.org/rfc/rfc7115.txt
