I think Andreas's idea of treating these mirrors as a database is a good one.
Replaying a logical log from a checkpoint scales better than a plain rsync when
there are large numbers of files.

The replication problem for databases is well understood, and open-source code
for it is available from PostgreSQL, at least.

Grab the current log and any logs you have missed since your last update, replay
them, and off you go.
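
A minimal sketch of that catch-up step, assuming a hypothetical log format in
which the master records each file change as a numbered "put" or "delete"
record. The mirror keeps only the highest sequence number it has applied (its
checkpoint) and replays everything newer, in order. The names LogRecord, Mirror
and catch_up are illustrative and do not come from any existing tool.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class LogRecord:
    """Hypothetical log entry: one file operation on the master archive."""
    seq: int           # monotonically increasing sequence number
    op: str            # assumed operation set: "put" or "delete"
    path: str
    data: bytes = b""


@dataclass
class Mirror:
    files: dict = field(default_factory=dict)
    last_seq: int = 0  # checkpoint: highest sequence number applied so far

    def catch_up(self, master_log):
        """Replay every record newer than our checkpoint, oldest first.

        Returns the number of records applied.
        """
        missing = sorted(
            (r for r in master_log if r.seq > self.last_seq),
            key=lambda r: r.seq,
        )
        for rec in missing:
            if rec.op == "put":
                self.files[rec.path] = rec.data
            elif rec.op == "delete":
                self.files.pop(rec.path, None)
            else:
                raise ValueError(f"unknown op {rec.op!r}")
            # Advance the checkpoint only after the record is applied.
            self.last_seq = rec.seq
        return len(missing)
```

The work done per update is proportional to the number of changes since the
last checkpoint, not to the total number of files, which is the advantage over
an rsync run that has to walk and compare the whole tree.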

Another approach, which is a non-starter practically speaking but which I will
mention anyway:
Use ZFS. Make one filesystem for each mirrored project (CPAN, freshmeat, etc.).
Daily, or at some other regular interval, take a ZFS snapshot, and purge old
ones after a reasonable time such as 2 days. Mirror sites request a ZFS
incremental stream, giving the name of the last snapshot they received and that
of the current one. While ZFS is available for Solaris 10, OpenSolaris and, I
believe, FreeBSD (the Mac OS X port halted, IIRC), it isn't widely available
enough for major mirrors to use.
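
A rough sketch of the master-side bookkeeping for that scheme, assuming a
hypothetical one-snapshot-per-day naming convention (fs@YYYYMMDD) and a
hypothetical filesystem name such as tank/cpan. It only builds argument lists
for the standard `zfs snapshot`, `zfs destroy` and `zfs send -i` commands rather
than running them; on the mirror side the stream would be fed into
`zfs receive`. If a mirror's last snapshot has already been purged, no
incremental stream is possible and it needs a full `zfs send` instead.

```python
from datetime import datetime, timedelta

SNAP_FMT = "%Y%m%d"  # hypothetical naming scheme: one snapshot per day


def snapshot_name(fs, when):
    """e.g. tank/cpan@20090610 (fs is a hypothetical dataset name)."""
    return f"{fs}@{when.strftime(SNAP_FMT)}"


def snapshot_cmd(fs, when):
    """Command the master would run at each regular interval."""
    return ["zfs", "snapshot", snapshot_name(fs, when)]


def expired_snapshots(snapshots, now, keep=timedelta(days=2)):
    """Snapshots older than `keep`, i.e. candidates for `zfs destroy`."""
    expired = []
    for snap in snapshots:
        _, tag = snap.split("@", 1)
        if now - datetime.strptime(tag, SNAP_FMT) > keep:
            expired.append(snap)
    return expired


def incremental_send_cmd(available, last_received, current):
    """Command producing the incremental stream a mirror asks for.

    Raises LookupError if the mirror's last snapshot was already purged,
    in which case it needs a full stream instead of an incremental one.
    """
    if last_received not in available or current not in available:
        raise LookupError(f"{last_received} or {current} not on master")
    return ["zfs", "send", "-i", last_received, current]
```

Under these assumptions a mirror that falls more than the retention window
behind has to fall back to a full copy, so the purge interval trades master disk
space against how long a mirror may stay offline.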