Ah, now I see the problem. Here is the outage report with corrected dates.

Summary

On Nov 20 at 2:30PM EST (UTC-5), ARIN updated the software that generates the 
RPKI repository. On Nov 21 at 9:48PM EST (UTC-5), we were notified by a third 
party that validators were no longer fetching ROAs from organizations that had 
selected the delegated option. Upon review, ARIN Engineering discovered that a 
certificate was not included in the manifest for each delegated organization. 
The fix, which was to include that certificate in the manifest for each 
delegated organization, was deployed at 1:20AM EST (UTC-5) on Nov 22. From that 
time, ROAs from the affected delegated repositories could again be fetched and 
validated.

ARIN's hosted RPKI customers were not affected by this outage in any way. 

Root Cause

The root cause of this failure was a software bug that was introduced by the 
update to the RPKI repository generator. 

Scope of Issue

This bug meant that validators would not fetch information from the delegated 
repositories during the affected period. ARIN has nine delegated 
organizations, and the bug affected approximately 180 ROAs, which may have 
disappeared from the global RPKI system for approximately 35 hours and 40 
minutes starting on Nov 20 at 2:30PM EST (UTC-5). Depending on how validation 
is set up by the ISPs who use RPKI, the route origins associated with these 180 
ROAs may have remained in the secure state or become unsecure during this 
period.
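
For anyone lining these timestamps up against validator logs kept in UTC, 
here is a minimal Python sketch of the conversion and of the window between 
the two stated events. The year is assumed from the message dates in this 
thread, and the helper name format_window is made up for illustration. Taken 
at face value, the two stated timestamps are 34 hours and 50 minutes apart, so 
the duration given above should be read as approximate.

```python
from datetime import datetime, timedelta, timezone

# Fixed UTC-5 offset, matching the "EST (UTC-5)" label used in the report.
EST = timezone(timedelta(hours=-5), name="EST")

# Timestamps as stated in the summary; the year 2020 is an assumption
# taken from the message dates in this thread.
deployed = datetime(2020, 11, 20, 14, 30, tzinfo=EST)  # software update
fixed = datetime(2020, 11, 22, 1, 20, tzinfo=EST)      # fix deployed


def format_window(start, end):
    """Return the elapsed time between two aware datetimes as 'HhMMm'."""
    total_minutes = int((end - start).total_seconds() // 60)
    hours, minutes = divmod(total_minutes, 60)
    return f"{hours}h{minutes:02d}m"


print(deployed.astimezone(timezone.utc).isoformat())  # 2020-11-20T19:30:00+00:00
print(fixed.astimezone(timezone.utc).isoformat())     # 2020-11-22T06:20:00+00:00
print(format_window(deployed, fixed))                 # 34h50m
```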

After Action Items

ARIN will add additional delegated repository tests to prevent this type of 
operational issue from happening again. Additionally, as planned, ARIN will 
add further improvements to its external monitoring, which uses various 
validators to ensure that the repository is working as intended.
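
As a rough illustration of the kind of check such a test could make (this is 
not ARIN's actual test suite), the Python sketch below compares the files 
published at a publication point with the entries listed on its manifest and 
reports anything published but unlisted. The function name manifest_gaps and 
all file names are hypothetical, and it assumes the manifest's file list has 
already been extracted; parsing the signed manifest itself is out of scope 
here.

```python
def manifest_gaps(published_files, manifest_entries):
    """Compare a publication point's files with its manifest's file list.

    Both arguments are iterables of file names. The manifest file itself
    (.mft) is skipped, since a manifest does not list itself.
    """
    published = {name for name in published_files if not name.endswith(".mft")}
    listed = set(manifest_entries)
    return {
        "unlisted": sorted(published - listed),  # published, not on manifest
        "missing": sorted(listed - published),   # on manifest, not published
    }


# Hypothetical publication point where a delegated organization's
# certificate was published but left off the manifest.
published_files = ["delegated-org.cer", "example.roa", "example.crl", "example.mft"]
manifest_entries = ["example.roa", "example.crl"]

gaps = manifest_gaps(published_files, manifest_entries)
print(gaps)  # {'unlisted': ['delegated-org.cer'], 'missing': []}
```

A test along these lines would fail whenever "unlisted" or "missing" is 
non-empty, before the repository is pushed out.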

Regards,
Mark



On 11/24/20, 8:57 AM, "Mark Kosters" <[email protected]> wrote:

    
    
    On 11/24/20, 6:30 AM, "Job Snijders" <[email protected]> wrote:
    
        Dear Mark,
        
        On Mon, Nov 23, 2020 at 09:32:53PM +0000, Mark Kosters wrote:
        > On Nov 19 at 2:30PM EST (UTC-5), ARIN updated the software that
        > generates the RPKI repository.
        > On Nov 20 at 9:48PM EST (UTC-5), we were notified by a 3rd party that
        > validators no longer were fetching ROAs from organizations that had
        > selected the delegated option.
        
        Can you elaborate on why it appears there was a delay between the
        software update having taken place, and the problem becoming visible?
        
        From my measurements the problem became visible at 19:22 UTC on November
        20th. The RPKI stack from an end-to-end perspective is an interesting
        waterfall of timers; the above question is for my own edification on how
        this all works.
    
    If I got my timing right, it looks like you must have received the updated
    repository as we were pushing out the software updates.
        
        > Upon review, ARIN Engineering discovered that a certificate was not
        > included in the manifest for each delegated organization.
        > The fix was to include that certificate in the manifest for each
        > delegated organization was deployed at 1:20AM EST (UTC-5) on Nov 21.
        
        A fix was deployed on November ***22nd***, right?
    
    Good catch.
        
    Thanks,
    Mark 
    
    

_______________________________________________
arin-tech-discuss mailing list
[email protected]
https://lists.arin.net/mailman/listinfo/arin-tech-discuss
