While I will admit to some astonishment that the following explanation could possibly be news to long-time participants in this WG (given how much time I've spent whining about this issue over the last five years or so both in public and in private), let me quote from the slides:
* How efficient [fetching RPKI repositories using rsync] is depends
  heavily on how the publication repositories are organized.

* In an efficiently organized repository, filesystem hierarchy follows
  X.509 certificate hierarchy, so that one can pick up significant
  subtrees with a single rsync connection.

* To date, the RIRs have chosen to deploy flat hierarchies where there
  is no relationship at all between filesystem hierarchy within the
  repository and certificate hierarchy.

To make that more concrete, here's an example. Let's assume we have the following trivial hierarchy: Bob and Betty are issued by Alice, Carol and Carl are issued by Bob, Dave and Dana are issued by Carol, Dara is issued by Carl, and all of these are hosted in a single repository.

In an inefficient, "flat" repository, the publication points for objects issued by these entities would look something like this:

  rsync://example.org/rpki/Alice/
  rsync://example.org/rpki/Betty/
  rsync://example.org/rpki/Bob/
  rsync://example.org/rpki/Carl/
  rsync://example.org/rpki/Carol/
  rsync://example.org/rpki/Dana/
  rsync://example.org/rpki/Dara/
  rsync://example.org/rpki/Dave/

In a hierarchical repository, the same publication points would look more like this:

  rsync://example.org/rpki/Alice/
  rsync://example.org/rpki/Alice/Betty/
  rsync://example.org/rpki/Alice/Bob/
  rsync://example.org/rpki/Alice/Bob/Carl/
  rsync://example.org/rpki/Alice/Bob/Carl/Dara/
  rsync://example.org/rpki/Alice/Bob/Carol/
  rsync://example.org/rpki/Alice/Bob/Carol/Dana/
  rsync://example.org/rpki/Alice/Bob/Carol/Dave/

Assuming top-down tree walk (the normal case), retrieving objects issued by this set of entities takes eight rsync connections with the flat repository, as opposed to one rsync connection with the hierarchical repository.
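For anyone who wants to check the arithmetic, here is a rough Python sketch of the connection count for the example above. It is only an illustration, not code from any real validator: the function names are made up, and it assumes that one recursive rsync fetch of a publication point picks up everything beneath it and that the walker skips any URI already covered by an earlier fetch.

```python
# Issuer relationships from the example: subject -> issuer (Alice is the root).
ISSUER = {
    "Bob": "Alice", "Betty": "Alice",
    "Carol": "Bob", "Carl": "Bob",
    "Dave": "Carol", "Dana": "Carol",
    "Dara": "Carl",
}
BASE = "rsync://example.org/rpki/"


def flat_uri(name):
    """Publication point in the flat layout: one top-level directory each."""
    return f"{BASE}{name}/"


def hierarchical_uri(name):
    """Publication point in the hierarchical layout: nested under the issuer chain."""
    path = [name]
    while path[0] in ISSUER:
        path.insert(0, ISSUER[path[0]])
    return BASE + "/".join(path) + "/"


def count_connections(uri_for, root="Alice"):
    """Top-down walk; open a new connection only for URIs not already fetched.

    Assumption (hypothetical walker): a fetch of URI X is recursive, so any
    URI that starts with X is treated as already retrieved.
    """
    children = {}
    for subject, issuer in ISSUER.items():
        children.setdefault(issuer, []).append(subject)

    fetched = []      # URIs we opened an rsync connection for
    queue = [root]    # breadth-first, issuer before subjects
    while queue:
        entity = queue.pop(0)
        uri = uri_for(entity)
        if not any(uri.startswith(prefix) for prefix in fetched):
            fetched.append(uri)
        queue.extend(sorted(children.get(entity, [])))
    return len(fetched)


if __name__ == "__main__":
    print("flat:        ", count_connections(flat_uri))
    print("hierarchical:", count_connections(hierarchical_uri))
```

Under those assumptions the flat layout needs a separate connection per entity, while in the hierarchical layout the first fetch of Alice's publication point covers all the rest.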
In practice one might want a slightly more complex structure to limit the size of individual directories, but it doesn't matter so long as the filesystem hierarchy is organized in such a way that picking up an issuer's publication point picks up a non-trivial number of its subjects' publication points automatically. It doesn't have to be perfect, just has to do enough better than the flat model to amortize the cost of setting up and tearing down the rsync connection over a significantly larger number of files. This is not about PKI, it's purely an rsync efficiency issue.

Presumably there are scaling limitations to the hierarchical approach, but anecdotal evidence among the people I've asked ("I tried ... and it worked") suggests that, if the underlying networks and filesystems are in good shape, a single rsync connection ought to be able to handle up to at least 10,000 small files, perhaps a lot more than that. Note that this is just talking about rsync itself: mileage might vary significantly if the underlying networks or filesystems are seriously broken. Also note that these anecdotal estimates have not been tested in any rigorous fashion as far as I know, so that's another entry on my list of things we ought to be measuring.

Hope this helps to clarify the change I've been suggesting.