-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hello,
I am the "chf" who originally requested the program on IRC. Having noticed the serious doubts expressed here about whether this package is really needed in Debian, I want to explain why I need it nonetheless and, more importantly here, why I think it would enrich Debian as well.

I am not using the hardlink utility in spite of "rsync --link-dest" already existing; I need it *because* I use this rsync feature for backups. More precisely, I am using "rsnapshot", which essentially keeps several generations of a backup tree and conserves huge amounts of space by connecting unchanged files to the previous generation with "rsync --link-dest". However, I ran into a bug in rsnapshot that causes a "gap" whenever a host was down at the time a backup was scheduled. Afterwards the whole trees were unconnected again. The issue does not seem to be easily reproducible, so it remains unresolved despite my report on the rsnapshot mailing list.

While searching for a remedy I came across several people who had similar problems repairing "broken apart" rsnapshot mirrors, and they eventually directed me towards "hardlink.py", which simply looks for equal files in the given set of directories and links any equal files together. This saves even more space than "rsnapshot" on its own, because *all* equal files are linked, not only those with the same path and filename from one backup generation to its immediate successor.

At first glance, it seems that this is all the functionality needed, so you could use the perforate package (where the feature is hidden well enough that I was not able to find it) or enhance "fdupe" as requested in #284274. However, because I have many subtrees with a large total number of files, running one of the existing utilities over the whole file set would take several weeks, because the machine begins swapping. So I need to split this into several runs, each over a subset of the trees.
These runs have to be handled independently, so that none of them "knows" anything about the other ones. You can only achieve this by "maximizing" the link count, that is, by replacing the file that is linked to fewer other files with a hard link to the one that already has the higher link count. I found a reference to such an enhancement to "hardlink.py" on a mailing list and requested the patch. Several weeks later, I received it from its author via email. This tool was the first one that really did what I needed.

After this odyssey, I decided that it would be a wasted effort to use this script only on my own system, because I had read about others with similar problems. I have been using Debian for a few years, and it has become my favourite distribution, so I considered creating a new package or finding somebody more experienced who would do this. So I started asking around at "#debian-devel.de" on OFTC, and Julian Andres Klode offered to rewrite this program and make it available in Debian.

I'm not sure whether I have explained the "maximize link count" feature understandably, so here is one more example: Imagine you create several identical backup trees, with only small changes between them, using "rsync --link-dest". Afterwards you move a large (measured in file size) subtree to another location in the source. The next rsync run will copy this subtree again, but will not link it to the unchanged files of the previous run, because their relative location has changed and rsync cannot identify them as being the same. Now, due to memory constraints, you run the "hardlink" utility over the new copy and only *one* of the older trees (its immediate predecessor). With a "normal" hardlink utility, there is no way to make sure that every equal file of the new set is indeed replaced by a link to the older one. If you are unlucky, it will just "break off" the predecessor tree from all the older trees and connect it to the new one, resulting in no space gain from this operation.
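
To make the "maximize link count" rule a bit more concrete, here is a minimal Python sketch of the idea. It is only an illustration under my own assumptions, not the code of "hardlink.py", of the patch, or of the planned rewrite: the function name link_maximizing and the ".hardlink-tmp" suffix are invented for this example, and the sketch only compares file contents (no owner, mode or timestamp checks).

```python
import filecmp
import os
import stat


def link_maximizing(path_a, path_b):
    """Link two equal regular files, keeping the inode with more links.

    Hypothetical helper for illustration only. Returns the path whose
    inode was kept, or None if the files were left untouched.
    """
    st_a, st_b = os.lstat(path_a), os.lstat(path_b)

    # Only regular files are candidates (no symlinks, directories, ...).
    if not (stat.S_ISREG(st_a.st_mode) and stat.S_ISREG(st_b.st_mode)):
        return None
    # Hard links cannot cross file system boundaries.
    if st_a.st_dev != st_b.st_dev:
        return None
    # Same inode already: nothing to gain.
    if st_a.st_ino == st_b.st_ino:
        return None
    # Byte-by-byte comparison; shallow=False forces reading the contents.
    if not filecmp.cmp(path_a, path_b, shallow=False):
        return None

    # The "maximize" rule: keep the inode that already has the higher
    # link count, and replace the other path with a link to it.
    if st_a.st_nlink >= st_b.st_nlink:
        keep, replace = path_a, path_b
    else:
        keep, replace = path_b, path_a

    # Create the new link under a temporary name first, then rename it
    # over the old path, so the old file is never missing in between.
    tmp = replace + ".hardlink-tmp"  # assumed not to exist already
    os.link(keep, tmp)
    os.replace(tmp, replace)
    return keep
```

In the scenario above, the file in the predecessor tree already shares its inode with all the older generations, so its link count is higher than that of the freshly copied file. The sketch therefore always replaces the new copy and never detaches the predecessor from the older trees, no matter in which order the two paths are passed.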
Additionally, the hardlink utility gives more control over which files are considered "equal" than any other similar program I've seen so far. In addition to a file's contents, you can have it also match owner/group, name, timestamp, and mode. The only feature I have not yet found a use for is minimizing the link count.

Regards,
Christoph, who hopes to have made things a bit clearer rather than having annoyed you with this extremely long text.

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (GNU/Linux)

iQIcBAEBAgAGBQJJVsFuAAoJEP0O2oBXmANfFR0QAJFcLNrNfPXofyzet+QooFPa
W5WuDZOoHIwKa9v5V6JkAaJbAJdrzJUjvvQPRuGlD1cxDN0cX35LptGQXOmT75p1
Q6EHk9soPfW4oCTllmjBBia8+khjm1NMDoW9IEq68TWKLdj0Byq/Wh1GY/8DmRBR
kWqSYievqiNZRtf57SO57izgsecNbcAyNVcsO1hsjH8qCFQJgOgG3VpgQPNw40N7
pRXkjG/SsdocKOoGfhhHAlkTx7osVCXHJMBqp6bcaCE4Juav/IK+vSm2gqRhuLxK
uRRDFYbf447L4bQcgL4NQCWzQ+/j9+XCFPanjGdjnqMbbMFBDhZdkiX3PzTEZMK2
W5hX2LHtSBA4vLIC58hOQxAjWWZvjZg3/TQufNUTPriySJGrsbglLegY8xEKH0cn
RYlA2d9bs2+tdBbnSajK+O+TzTXk7WsCT7xApZAm4UrfbO5Otg4ykOry5L8uXCim
FLsjBwdowoJRIyRDrgca5PBazZs1TXR40JQUnH9qO7WkNBDJFVRnAq/yoL7KZdbI
wABDcxuVF8mpUG/jBrc8A6M8hRdFaTEyWtMMWDzRRbxDhynnKxujM2SQlt8j0ig6
Hrt/tPg3AAM3iYQ2EcONeDzIekzZW45wNlhR0HJ2qBzYVAwdrl/9ZUxF95cvAoov
ZeMf26GYtrZ+oftDN2/D
=sUFO
-----END PGP SIGNATURE-----