While I will admit to some astonishment that the following explanation
could possibly be news to long-time participants in this WG (given how
much time I've spent whining about this issue over the last five years
or so both in public and in private), let me quote from the slides:

    * How efficient [fetching RPKI repositories using rsync] is
      depends heavily on how the publication repositories are
      organized.

    * In an efficiently organized repository, filesystem hierarchy
      follows X.509 certificate hierarchy, so that one can pick up
      significant subtrees with a single rsync connection.

    * To date, the RIRs have chosen to deploy flat hierarchies where
      there is no relationship at all between filesystem hierarchy
      within the repository and certificate hierarchy.

To make that more concrete, here's an example.  Let's assume we have
the following trivial hierarchy: Bob and Betty are issued by Alice,
Carol and Carl are issued by Bob, Dave and Dana are issued by Carol,
Dara is issued by Carl, and all of these are hosted in a single
repository.

In an inefficient, "flat" repository, the publication points for
objects issued by these entities would look something like this:

    rsync://example.org/rpki/Alice/
    rsync://example.org/rpki/Betty/
    rsync://example.org/rpki/Bob/
    rsync://example.org/rpki/Carl/
    rsync://example.org/rpki/Carol/
    rsync://example.org/rpki/Dana/
    rsync://example.org/rpki/Dara/
    rsync://example.org/rpki/Dave/

In a hierarchical repository, the same publication points would look
more like this:

    rsync://example.org/rpki/Alice/
    rsync://example.org/rpki/Alice/Betty/
    rsync://example.org/rpki/Alice/Bob/
    rsync://example.org/rpki/Alice/Bob/Carl/
    rsync://example.org/rpki/Alice/Bob/Carl/Dara/
    rsync://example.org/rpki/Alice/Bob/Carol/
    rsync://example.org/rpki/Alice/Bob/Carol/Dana/
    rsync://example.org/rpki/Alice/Bob/Carol/Dave/

Assuming a top-down tree walk (the normal case), retrieving objects
issued by this set of entities takes eight rsync connections with the
flat repository, as opposed to one rsync connection with the
hierarchical repository.
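
For anyone who wants to check the arithmetic, here is a minimal sketch
of the counting argument.  It assumes a relying party that walks the
publication points in issuer-before-subject order and skips any URI
that already sits underneath a URI it has fetched recursively.  The
function name and the prefix test are my own illustration, not how any
particular validator is implemented.

```python
# Publication points from the example, listed so that every issuer
# appears before its subjects (top-down walk order).
FLAT = [
    "rsync://example.org/rpki/Alice/",
    "rsync://example.org/rpki/Betty/",
    "rsync://example.org/rpki/Bob/",
    "rsync://example.org/rpki/Carl/",
    "rsync://example.org/rpki/Carol/",
    "rsync://example.org/rpki/Dana/",
    "rsync://example.org/rpki/Dara/",
    "rsync://example.org/rpki/Dave/",
]

HIERARCHICAL = [
    "rsync://example.org/rpki/Alice/",
    "rsync://example.org/rpki/Alice/Betty/",
    "rsync://example.org/rpki/Alice/Bob/",
    "rsync://example.org/rpki/Alice/Bob/Carl/",
    "rsync://example.org/rpki/Alice/Bob/Carl/Dara/",
    "rsync://example.org/rpki/Alice/Bob/Carol/",
    "rsync://example.org/rpki/Alice/Bob/Carol/Dana/",
    "rsync://example.org/rpki/Alice/Bob/Carol/Dave/",
]


def count_rsync_connections(publication_points):
    """Count recursive fetches needed to cover every publication point.

    Hypothetical model: one rsync connection per URI that is not
    already inside a previously fetched directory tree.
    """
    fetched_roots = []
    for uri in publication_points:
        # Normalize to a trailing slash so the prefix test only matches
        # on directory boundaries ("/Bob/" does not cover "/Bobby/").
        uri = uri if uri.endswith("/") else uri + "/"
        if any(uri.startswith(root) for root in fetched_roots):
            continue  # already picked up by an earlier recursive fetch
        fetched_roots.append(uri)
    return len(fetched_roots)


print(count_rsync_connections(FLAT))          # 8
print(count_rsync_connections(HIERARCHICAL))  # 1
```

In the flat layout no URI is a prefix of any other, so every
publication point costs its own connection; in the hierarchical layout
fetching Alice's directory recursively covers everything below it.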

In practice one might want a slightly more complex structure to limit
the size of individual directories, but it doesn't matter so long as
the filesystem hierarchy is organized in such a way that picking up
an issuer's publication point picks up a non-trivial number of its
subjects' publication points automatically.  It doesn't have to be
perfect; it just has to beat the flat model by enough to amortize
the cost of setting up and tearing down the rsync connection over a
significantly larger number of files.
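
The amortization point can be put as a back-of-the-envelope model.
This sketch assumes a simple linear cost (a fixed setup/teardown cost
per connection plus a fixed cost per file); both the model and the
parameter names are assumptions for illustration, and any numbers you
plug in are placeholders, not measurements.

```python
def fetch_cost(connections, files, setup_cost, per_file_cost):
    """Total cost under an assumed linear model.

    setup_cost:    cost of setting up and tearing down one connection
    per_file_cost: cost of transferring or checking one file
    Units are arbitrary; only the comparison between layouts matters.
    """
    return connections * setup_cost + files * per_file_cost


def setup_overhead_per_file(connections, files, setup_cost):
    """Connection setup cost attributed to each file fetched."""
    if files <= 0:
        raise ValueError("files must be positive")
    return connections * setup_cost / files
```

With the same number of files, the per-file overhead scales directly
with the number of connections, so a layout that needs one connection
instead of eight carries one eighth of the setup overhead per file in
this model, whatever the actual setup cost turns out to be.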

This is not about PKI; it's purely an rsync efficiency issue.

Presumably there are scaling limitations to the hierarchical approach,
but anecdotal evidence among the people I've asked ("I tried ... and
it worked") suggests that, if the underlying networks and filesystems
are in good shape, a single rsync connection ought to be able to
handle up to at least 10,000 small files, perhaps a lot more than
that.  Note that this is just talking about rsync itself: mileage
might vary significantly if the underlying networks or filesystems are
seriously broken.  Also note that these anecdotal estimates have not
been tested in any rigorous fashion as far as I know, so that's
another entry on my list of things we ought to be measuring.

Hope this helps to clarify the change I've been suggesting.