Re: [sidr] Scaling properties of caching in a globally deployed RPKI / BGPSEC system

Christopher Morrow Fri, 07 Dec 2012 13:54:40 -0800

On Fri, Dec 7, 2012 at 12:35 PM, Montgomery, Douglas <do...@nist.gov> wrote:


> suggesting/discussing loading a RIB from DNS queries.   I thought we
> were discussing information systems that might allow me to validate the
> origin of a router's RIB.   That problem is O(500K) at time zero.

backing up a bit in the thread, and I hope/think setting some things
up a bit better for the conversation... or attempting to :)

If we look at the whole system (or a bunch of it) in SIDR/BGPSEC/RPKI,
there are likely these moving parts:

  o RPKI repositories (some number, let's say 1/ASN for simple numbers)
  o RPKI TAL/TA/'Root' bits (say 5 today, hopefully 1 tomorrow which then
    lets you walk down the tree to find all the actors)
  o network operators running networks (again 1/ASN)
  o gathering hosts/systems at each of the above, which talk to all
    repositories and gather the content for local use/distribution
    (again 1/ASN at least, probably safe to assume 2/ASN at least)
  o cache systems inside each ASN, more than 1, less than 1/router seems sane?

In the end, the last item is completely up to the AS operator in
question. They may choose to run 1 cache/router, or one for their whole
ASN. They are responsible (according to the docs) for keeping their
caches semi-coherent, or as close to coherent as they can.

So, looking at timing information, there's a base time for: "Make a
ROA/EE/etc change to the local repository". That timing is almost
entirely up to the local operator; things beyond that are more or less
automatic... first gatherers get the data, then local caches get sync'd
and distribute the required updates to the routers. It's important to
note that a smart solution would only pass updates to the caches, or
rather the caches would update from some point-in-time that the
gatherers kept. (Again, this is likely dependent upon the local
operator's timing requirements/design.)
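
To make the "only pass updates" idea concrete, here is a minimal sketch
of a cache syncing from a point-in-time that the gatherer keeps. The
class names, the serial-number scheme and the (prefix, origin-ASN)
tuples are all made up for illustration; this is not the wire protocol
from any of the docs, just the shape of the idea.

```python
from dataclasses import dataclass, field


@dataclass
class Gatherer:
    """Hypothetical gatherer that keeps a serial-numbered change log."""
    serial: int = 0
    log: list = field(default_factory=list)  # entries: (serial, op, roa)

    def record(self, op, roa):
        # Every change bumps the serial, so a serial names a point in time.
        self.serial += 1
        self.log.append((self.serial, op, roa))

    def changes_since(self, serial):
        # Return only the changes made after the caller's last-seen serial.
        return [(op, roa) for s, op, roa in self.log if s > serial]


@dataclass
class Cache:
    """Hypothetical local cache that applies deltas instead of refetching."""
    serial: int = 0
    roas: set = field(default_factory=set)

    def sync(self, gatherer):
        changes = gatherer.changes_since(self.serial)
        for op, roa in changes:
            if op == "add":
                self.roas.add(roa)
            else:  # any other op is treated as a removal in this sketch
                self.roas.discard(roa)
        self.serial = gatherer.serial
        return len(changes)  # number of deltas applied
```

A cache at serial 0 is the cold-start case and pulls the whole log; a
cache that is already current pulls nothing. A real gatherer would also
have to trim the log and force a full reload for caches that fall too
far behind, which this sketch ignores.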

One point that's brought up a bunch on the thread so far is 'cold
start'. There are many forms of this:
  o for a router
  o for a cache
  o for a gatherer
  o for an ASN
I think for some the answer is 'easy', and the timings are 'fast'. For
others though the timing is longer.

For instance, a router cold-start (presuming no special knob request) is:
  1) boot
  2) load-os/config
  3) prefer internal/igp
  4) load cache-data
  5) bring up bgp (e and i) sessions
(it's probably harder to determine if 3 and/or 5 happen before 4
today, but that seems like a vendor tweak to me)
Loading the cache data should be essentially LAN-speed limited... or
at least limited by the cache deployment that the operator picks.
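
As a rough back-of-the-envelope for "LAN-speed limited", something like
the following works. The function and its parameters are hypothetical,
and the numbers you feed it (record count, bytes per record, link
speed) are assumptions to be replaced with measured values; it also
ignores protocol overhead and the router's own processing time.

```python
def cache_load_seconds(num_records, bytes_per_record, link_bits_per_second):
    """Lower bound on the time to pull a full data set from a local cache.

    Pure transfer time only: no handshakes, no protocol overhead, no
    time for the router to digest what it received.
    """
    if num_records < 0 or bytes_per_record < 0:
        raise ValueError("record count and record size must be non-negative")
    if link_bits_per_second <= 0:
        raise ValueError("link speed must be positive")
    total_bits = num_records * bytes_per_record * 8
    return total_bits / link_bits_per_second
```

For example, cache_load_seconds(500_000, 20, 1_000_000_000) models
500K records of 20 bytes each (both placeholder figures) over a 1 Gb/s
link. The transfer itself comes out well under a second, so in practice
the cache deployment and the router's processing are more likely to
dominate.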

A gatherer cold-start is: "Fetch all objects from all remote
repositories" and is likely bounded by the times calculated in Eric's
paper... or similar: 1/ASN links, each with X average time to
connect/download/digest...
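
The "1/ASN links with X average time" estimate can be written down as a
small calculation. This is only a sketch: the function name and the
parallel_fetches parameter are my own additions, and no particular
value of X is taken from the paper.

```python
import math


def gatherer_cold_start_seconds(num_repos, avg_fetch_seconds, parallel_fetches=1):
    """Rough cold-start time for one gatherer walking every repository.

    Assumes each repository takes the same average time to
    connect/download/digest, and that fetches run in fixed-size
    parallel batches.
    """
    if num_repos < 0 or avg_fetch_seconds < 0:
        raise ValueError("counts and times must be non-negative")
    if parallel_fetches < 1:
        raise ValueError("parallel_fetches must be at least 1")
    rounds = math.ceil(num_repos / parallel_fetches)
    return rounds * avg_fetch_seconds
```

With one repository per ASN, the serial case (parallel_fetches=1) is
simply N * X, and fetching k repositories at a time divides that by
roughly k. Slow or unreachable repositories would push the real number
above this estimate, since the sketch uses a flat average.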

An ASN cold-start is ... actually pretty simple, except that they rely
upon everyone else finding them, so they are likely bounded on start
by the time it takes all remote ASNs to walk the system and find them
(call it 4hrs based on the timings in the ops-docs?).

I think the discussion so far has centered around 'all' of the system,
but has variously talked about only 1-2 parts (really) when it comes
to timings. Could we think about the problem-space and timings in the
above framework? Or alter the above to something we can all agree
upon?

-Chris