also sprach martin f krafft <madd...@madduck.net> [2010.08.28.1854 +0200]:
> Using lsof, I found that the BackupPC_dump process actually has the
> corresponding pool file open for reading, so it has identified it.
>
> This makes me wonder even more why the client still transfers the
> whole file. Shouldn't BackupPC_dump terminate the transfer and
> proceed to the next file instead?

I can confirm a few things after using strace and lsof on both sides
of the transfer. This concerns the case where the client sends a file
that is already in the pool:

  a. The BackupPC_dump process does not write to disk if the file is
     already in the pool.

  b. The BackupPC_dump process quickly identifies the corresponding
     file in ./cpool/ after the client starts sending the file.

  c. The client still sends the entire file, and the BackupPC_dump
     process reads all of it; I have no idea where it puts the data.

Something is going wrong. I think that one of two things should
happen instead:

  1. If the dump process has access to (a) the checksums of the first
     and last/8th 128k blocks of the file and (b) the size of the
     client's file, and it considers those data reliable enough to
     identify an existing file, it should terminate the transfer and
     move on.

  2. Assuming that the two 128k block checksums and the file size are
     not collision-free (they probably aren't), backuppc should
     really uncompress the pool file and employ rsync's rolling
     checksum to update the file (in memory). If there are any
     changes, it should write out the NewFile to disk; in the
     absence of changes, it should create the hardlink.

After writing this, it seems to me that (2.) is what's currently
happening. Can anyone confirm this?

Are size + 2×128k checksums not enough to identify a pool file?

Can rsyncp somehow ask the remote rsync process for the checksum of
the complete file? It could do that after identifying a matching pool
file, as a preemptive check of whether it would be safe to skip the
rest of the transfer.

Cheers,

-- 
martin | http://madduck.net/ | http://two.sentenc.es/

"a kiss may ruin a human life."
                                                        -- oscar wilde

spamtraps: madduck.bo...@madduck.net
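
To illustrate the question about size + 2×128k checksums, here is a minimal Python sketch. It is not BackupPC's actual code: `partial_digest`, `full_digest` and `can_skip_transfer` are hypothetical names, and the key layout (size, first block, last block, hashed with MD5) is an assumption based only on the description above. It shows that two files differing only in the middle get the same partial key, which is why a whole-file checksum (or a full comparison) would still be needed before skipping the transfer.

```python
import hashlib

BLOCK = 128 * 1024  # 128 KiB, the block size mentioned above


def partial_digest(data: bytes) -> str:
    """Hypothetical pool key: file size plus first and last 128 KiB blocks."""
    h = hashlib.md5()
    h.update(str(len(data)).encode())
    h.update(data[:BLOCK])
    if len(data) > BLOCK:
        h.update(data[-BLOCK:])
    return h.hexdigest()


def full_digest(data: bytes) -> str:
    """Checksum over the complete file contents."""
    return hashlib.md5(data).hexdigest()


def can_skip_transfer(client_partial: str, client_full: str,
                      pool_data: bytes) -> bool:
    """Skip only if both the partial key and the whole-file checksum match."""
    return (partial_digest(pool_data) == client_partial
            and full_digest(pool_data) == client_full)


# A pool file and a client file that differs by one byte in the middle.
pool_file = bytes(4 * BLOCK)
changed = bytearray(pool_file)
changed[2 * BLOCK] = 1  # outside both the first and the last block
changed = bytes(changed)

same_partial_key = partial_digest(pool_file) == partial_digest(changed)
same_full_checksum = full_digest(pool_file) == full_digest(changed)
```

In this sketch the partial keys collide while the full checksums differ, so the partial key alone can only nominate a candidate pool file; it cannot prove the contents are identical.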
_______________________________________________
BackupPC-users mailing list
BackupPC-users@lists.sourceforge.net
List:    https://lists.sourceforge.net/lists/listinfo/backuppc-users
Wiki:    http://backuppc.wiki.sourceforge.net
Project: http://backuppc.sourceforge.net/