Ah, now I see the problem. Here is the outage report with the corrected dates.
Summary

On Nov 20 at 2:30PM EST (UTC-5), ARIN updated the software that generates the RPKI repository. On Nov 21 at 9:48PM EST (UTC-5), we were notified by a third party that validators were no longer fetching ROAs from organizations that had selected the delegated option. Upon review, ARIN Engineering discovered that a certificate was not included in the manifest for each delegated organization. The fix, which added that certificate to the manifest for each delegated organization, was deployed at 1:20AM EST (UTC-5) on Nov 22. From that point on, ROAs from the affected delegated repositories could again be fetched and validated. ARIN's hosted RPKI customers were not affected by this outage in any way.

Root Cause

The root cause of this failure was a software bug introduced into the RPKI repository generator.

Scope of Issue

Because of this bug, validators did not fetch information from the delegated repositories during the affected period. ARIN has nine delegated organizations, and approximately 180 of their ROAs may have disappeared from the global RPKI system for approximately 35 hours, from Nov 20 at 2:30PM EST (UTC-5) until the fix was deployed at 1:20AM EST on Nov 22. Depending on how validation is set up by the ISPs who use RPKI, the route origins associated with these 180 ROAs may have remained in the secure state or become unsecured during this period.

After Action Items

ARIN will add more delegated repository tests to prevent this type of operational issue from happening again. Additionally, as previously planned, ARIN will improve its external monitoring, which uses various validators to ensure that the repository is working as intended.

Regards,
Mark

On 11/24/20, 8:57 AM, "Mark Kosters" wrote:

    On 11/24/20, 6:30 AM, "Job Snijders" wrote:

        Dear Mark,

        On Mon, Nov 23, 2020 at 09:32:53PM +0000, Mark Kosters wrote:
        > On Nov 19 at 2:30PM EST (UTC-5), ARIN updated the software that generates the RPKI repository.
        > On Nov 20 at 9:48PM EST (UTC-5), we were notified by a 3rd party that validators no longer were fetching ROAs from organizations that had selected the delegated option.

        Can you elaborate on why it appears there was a delay between the software update having taken place and the problem becoming visible? From my measurements, the problem became visible at 19:22 UTC on November 20th.

        The RPKI stack from an end-to-end perspective is an interesting waterfall of timers; the above question is for my own edification on how this all works.

    If I got my timing right, it looks like you must have received the updated repository as we were pushing out the software updates.

        > Upon review, ARIN Engineering discovered that a certificate was not included in the manifest for each delegated organization.
        > The fix was to include that certificate in the manifest for each delegated organization was deployed at 1:20AM EST (UTC-5) on Nov 21.

        A fix was deployed on November ***22nd***, right?

    Good catch.

    Thanks,
    Mark

_______________________________________________
arin-tech-discuss mailing list
https://lists.arin.net/mailman/listinfo/arin-tech-discuss
