Bug#509685: ITP: hardlink -- Hardlink multiple copies of the same file
We already have these packages: fdupes perforate AFAIK, they do not replace files, they just find them. Wrong, fdupe from perforate does link them together if one wants to. Imagine you have two backups, each on a different filesystem. Now you want to have them both on one filesystem. In this situation, you can use hardlink to link all common files in the backups together. Or you just use rsync with --link-dest and don't need a second tool for it at all. -- bye, Joerg That's just f***ing great, now the bar for being a cool guy in free software just got raised. It used to be you just had to write a million lines of useful code. Now you've got to get a subpoena from SCO to be cool. -- To UNSUBSCRIBE, email to debian-bugs-dist-requ...@lists.debian.org with a subject of unsubscribe. Trouble? Contact listmas...@lists.debian.org
Bug#509685: ITP: hardlink -- Hardlink multiple copies of the same file
Hello,

I am the chf who requested the program on IRC in the first place. Since serious doubts have been expressed here about whether the package is really needed in Debian, I want to point out why I need it nonetheless and, more importantly, why I think it would enrich Debian as well.

I am not using the hardlink utility in spite of rsync --link-dest already existing; I need it *because* I use this rsync feature for backups. More precisely, I am using rsnapshot, which keeps several generations of a backup tree and saves huge amounts of space by connecting unchanged files to the previous generation with rsync --link-dest. However, I ran into a bug in rsnapshot that caused a gap whenever a backup was scheduled while one host was down. Afterwards, the trees were completely unconnected again. The issue does not seem to be easily reproducible, so it is still unresolved despite my report on the rsnapshot mailing list.

While searching for a remedy, I came across several people who had similar problems repairing broken-apart rsnapshot mirrors, and they eventually directed me towards hardlink.py, which simply looks for equal files in the given set of directories and links any equal files together. This saves even more space than rsnapshot on its own, because *all* equal files are linked, not only those with the same path and filename from one backup generation to its immediate successor.

At first glance, this seems to be all the functionality needed, so you could use the perforate package (where the feature is hidden well enough that I was not able to find it) or enhance fdupes as requested in #284274. However, I have many subtrees with a large total number of files, and letting one of the existing utilities run over the whole file set would take several weeks because the machine starts swapping. So I need to split this into several runs, each over a subset of the trees.
These runs have to be handled independently, so that none of them knows anything about the others. You can only achieve this by maximizing the link count, that is, by replacing the file that is linked to fewer other files with a hard link to the one that already has a higher link count. I found a reference to such an enhancement to hardlink.py on a mailing list and requested the patch. Several weeks later, I received it from its author via email. This tool was the first one that really did what I needed.

After this odyssey, I decided that it would be a wasted effort to use this script only on my own system, because I had read about others with similar problems. I have been using Debian for a few years, and it has become my favourite distribution, so I considered creating a new package or finding somebody more experienced who would do it. So I started asking around in #debian-devel.de on OFTC, and Julian Andres Klode offered to rewrite the program and make it available in Debian.

I'm not sure I explained the maximize-link-count feature understandably, so here is one more example. Imagine you create several nearly identical backup trees, with only small changes between them, using rsync --link-dest. Afterwards you move a large (in file size) subtree to another location in the source. The next rsync run will copy this subtree again rather than link it to the unchanged files of the previous run, because their relative location has changed and rsync cannot identify them as being the same. Now, due to memory constraints, you run the hardlink utility over the new copy and only *one* of the older trees (its immediate predecessor). With an ordinary hardlink utility, there is no way to make sure that every equal file of the new set is indeed replaced by a link to the older one. If you are unlucky, it will just break the predecessor tree off from all the older trees and connect it to the new one, resulting in no space gain from this operation.
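The maximize-link-count rule described above can be sketched as follows. This is a minimal Python illustration under our own assumptions, not code taken from hardlink or hardlink.py: the function name `link_preferring_higher_count` and the `.hardlink-tmp` suffix are hypothetical. Given two files with equal contents, it keeps the inode that is already shared by more paths (the higher `st_nlink`) and replaces the other path with a hard link to it, so an older generation that is linked to its predecessors is never broken off from them.

```python
import filecmp
import os
import tempfile


def link_preferring_higher_count(a: str, b: str) -> str:
    """Hard-link two identical files together, keeping the inode that
    already has more links and replacing the copy that has fewer.

    Returns the path whose inode was kept.
    """
    sa, sb = os.stat(a), os.stat(b)
    if (sa.st_dev, sa.st_ino) == (sb.st_dev, sb.st_ino):
        return a  # already the same inode, nothing to do
    if sa.st_dev != sb.st_dev:
        raise ValueError("hard links cannot span filesystems")
    if not filecmp.cmp(a, b, shallow=False):
        raise ValueError("files differ")  # byte-by-byte comparison
    # Maximize the link count: the inode shared by more paths survives.
    # On a tie, the first argument is kept.
    master, victim = (a, b) if sa.st_nlink >= sb.st_nlink else (b, a)
    tmp = victim + ".hardlink-tmp"  # hypothetical temporary name
    os.link(master, tmp)
    os.replace(tmp, victim)  # atomically swap the link in for the old copy
    return master


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        old, new = os.path.join(d, "old"), os.path.join(d, "new")
        for p in (old, new):
            with open(p, "w") as f:
                f.write("same content\n")
        # "old" is already shared by two more backup generations.
        os.link(old, os.path.join(d, "old.1"))
        os.link(old, os.path.join(d, "old.2"))
        kept = link_preferring_higher_count(new, old)
        print(os.path.basename(kept), os.stat(new).st_nlink)  # prints: old 4
```

Because the decision depends only on the link counts already on disk, separate runs over different subsets of the trees need not share any state, which is the independence property the message asks for.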
Additionally, the hardlink utility gives more control over which files are considered equal than any other similar program I've seen so far. In addition to their contents, you can also have it match owner/group, name, timestamp and mode. The only feature I have not yet found a use for is minimizing the link count.

Regards,
Christoph, who hopes to have made things a bit clearer rather than having annoyed you with this extremely long text.
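One way to picture the extra matching criteria is as optional fields in the key used to group candidate files. The sketch below is an assumption-laden illustration, not hardlink's implementation or command-line interface: the function names and the keyword arguments `owner`, `name`, `mtime` and `mode` are hypothetical, and contents are compared through a SHA-256 digest rather than byte by byte. The device number is always part of the key because hard links cannot cross filesystems.

```python
import hashlib
import os
from collections import defaultdict


def file_key(path, *, owner=False, name=False, mtime=False, mode=False):
    """Build a grouping key; files with equal keys are link candidates."""
    st = os.stat(path)
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 16), b""):
            h.update(chunk)
    # Always required: same filesystem, same size, same content digest.
    key = [st.st_dev, st.st_size, h.hexdigest()]
    if owner:
        key += [st.st_uid, st.st_gid]
    if name:
        key.append(os.path.basename(path))
    if mtime:
        key.append(st.st_mtime_ns)
    if mode:
        key.append(st.st_mode & 0o7777)  # permission bits only
    return tuple(key)


def group_candidates(paths, **criteria):
    """Return lists of paths that would be considered equal."""
    groups = defaultdict(list)
    for p in paths:
        groups[file_key(p, **criteria)].append(p)
    return [g for g in groups.values() if len(g) > 1]
```

Each extra criterion can only split groups further, so enabling more of them trades potential space savings for stricter guarantees about what ends up sharing an inode.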
Bug#509685: ITP: hardlink -- Hardlink multiple copies of the same file
John Goerzen, on Fri 26 Dec 2008 08:54:53 -0600, wrote: Samuel Thibault wrote: John Goerzen, on Thu 25 Dec 2008 12:04:43 -0600, wrote: Julian Andres Klode wrote: Hardlink is a tool which detects multiple copies of the same file and replaces them with hardlinks. . The idea has been taken from http://code.google.com/p/hardlinkpy/, but the code has been written from scratch and licensed under the MIT license. Do we really need another tool like this? We already have these packages: perforate Nothing to do with this. Eh? It can do exactly what #284274 requests in fdupes below. How is that not relevant? Oh, sorry, I hadn't seen the « Also there are some scripts that help cleaning up the hard disk » I don't see why that belongs to the perforate package. Samuel
Bug#509685: ITP: hardlink -- Hardlink multiple copies of the same file
Julian Andres Klode wrote: On Thu, Dec 25, 2008 at 12:04:43PM -0600, John Goerzen wrote: Julian Andres Klode wrote: Hardlink is a tool which detects multiple copies of the same file and replaces them with hardlinks. . The idea has been taken from http://code.google.com/p/hardlinkpy/, but the code has been written from scratch and licensed under the MIT license. Do we really need another tool like this? We already have these packages: fdupes perforate AFAIK, they do not replace files, they just find them. That's not correct. From the manpage of finddup, part of perforate: -l, --link link the identical files together Plus a host of tools that do backups, datapacker that packs things onto DVDs, and the like, using hard links. hardlink can be used to link files in multiple backup trees, and also features options to maximize/minimize the link count, and much more. Imagine you have two backups, each on a different filesystem. Now you want to have them both on one filesystem. In this situation, you can use hardlink to link all common files in the backups together.
Bug#509685: ITP: hardlink -- Hardlink multiple copies of the same file
Samuel Thibault wrote: John Goerzen, on Thu 25 Dec 2008 12:04:43 -0600, wrote: Julian Andres Klode wrote: Hardlink is a tool which detects multiple copies of the same file and replaces them with hardlinks. . The idea has been taken from http://code.google.com/p/hardlinkpy/, but the code has been written from scratch and licensed under the MIT license. Do we really need another tool like this? We already have these packages: perforate Nothing to do with this. Eh? It can do exactly what #284274 requests in fdupes below. How is that not relevant? fdupes That one would be an argument that hardlink is a duplicate _if_ #284274 were fixed. Otherwise, hardlink is really useful. Samuel
Bug#509685: ITP: hardlink -- Hardlink multiple copies of the same file
John Goerzen, on Thu 25 Dec 2008 12:04:43 -0600, wrote: Julian Andres Klode wrote: Hardlink is a tool which detects multiple copies of the same file and replaces them with hardlinks. . The idea has been taken from http://code.google.com/p/hardlinkpy/, but the code has been written from scratch and licensed under the MIT license. Do we really need another tool like this? We already have these packages: perforate Nothing to do with this. fdupes That one would be an argument that hardlink is a duplicate _if_ #284274 were fixed. Otherwise, hardlink is really useful. Samuel
Bug#509685: ITP: hardlink -- Hardlink multiple copies of the same file
On Fri, Dec 26, 2008 at 13:23, Julian Andres Klode j...@debian.org wrote: On Thu, Dec 25, 2008 at 12:04:43PM -0600, John Goerzen wrote: Julian Andres Klode wrote: Hardlink is a tool which detects multiple copies of the same file and replaces them with hardlinks. . The idea has been taken from http://code.google.com/p/hardlinkpy/, but the code has been written from scratch and licensed under the MIT license. Do we really need another tool like this? We already have these packages: fdupes perforate AFAIK, they do not replace files, they just find them. I'd happily accept patches for #284274 Cheers, -- Sandro Tosi (aka morph, Morpheus, matrixhasu) My website: http://matrixhasu.altervista.org/ Me at Debian: http://wiki.debian.org/SandroTosi
Bug#509685: ITP: hardlink -- Hardlink multiple copies of the same file
On Thu, Dec 25, 2008 at 12:04:43PM -0600, John Goerzen wrote: Julian Andres Klode wrote: Hardlink is a tool which detects multiple copies of the same file and replaces them with hardlinks. . The idea has been taken from http://code.google.com/p/hardlinkpy/, but the code has been written from scratch and licensed under the MIT license. Do we really need another tool like this? We already have these packages: fdupes perforate AFAIK, they do not replace files, they just find them. Plus a host of tools that do backups, datapacker that packs things onto DVDs, and the like, using hard links. hardlink can be used to link files in multiple backup trees, and also features options to maximize/minimize the link count, and much more. Imagine you have two backups, each on a different filesystem. Now you want to have them both on one filesystem. In this situation, you can use hardlink to link all common files in the backups together. -- Julian Andres Klode - Free Software Developer Debian Developer - Contributing Member of SPI Ubuntu Member - Fellow of FSFE Website: http://jak-linux.org/ XMPP: juli...@jabber.org Debian: http://www.debian.org/ SPI: http://www.spi-inc.org/ Ubuntu: http://www.ubuntu.com/ FSFE: http://www.fsfe.org/
Bug#509685: ITP: hardlink -- Hardlink multiple copies of the same file
Julian Andres Klode wrote: Hardlink is a tool which detects multiple copies of the same file and replaces them with hardlinks. . The idea has been taken from http://code.google.com/p/hardlinkpy/, but the code has been written from scratch and licensed under the MIT license. Do we really need another tool like this? We already have these packages: fdupes perforate Plus a host of tools that do backups, datapacker that packs things onto DVDs, and the like, using hard links.
Bug#509685: ITP: hardlink -- Hardlink multiple copies of the same file
Package: wnpp Severity: wishlist Owner: Julian Andres Klode j...@debian.org * Package name: hardlink Version : 0.1 Upstream Author : Julian Andres Klode j...@jak-linux.org * URL : http://git.debian.org/?p=users/jak/hardlink.git;a=summary * License : MIT * Programming Lang: Python (>= 2.5) Description : Hardlink multiple copies of the same file Hardlink is a tool which detects multiple copies of the same file and replaces them with hardlinks. . The idea has been taken from http://code.google.com/p/hardlinkpy/, but the code has been written from scratch and licensed under the MIT license. -- Further information: I chose to rewrite hardlinkpy from scratch because I did not like its style and because upstream seems to have lost interest in it. This can be seen as my response to chf's question on #debian-devel.de about packaging hardlinkpy (a few days ago). -- System Information: Debian Release: 5.0 APT prefers testing APT policy: (990, 'testing'), (500, 'unstable'), (200, 'experimental') Architecture: amd64 (x86_64)
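The core operation announced in the ITP, detecting multiple copies of the same file and replacing them with hard links, might look roughly like the following. It is a simplified Python sketch, not the packaged hardlink code: `hardlink_tree` and the `.hardlink-tmp` suffix are hypothetical, files are first bucketed by (device, size) so that only same-sized files on the same filesystem get hashed, and SHA-256 digests stand in for a full byte comparison. It keeps the first path found in each group as the master and applies no link-count policy.

```python
import hashlib
import os
import stat
from collections import defaultdict


def _digest(path: str) -> str:
    """SHA-256 of a file's contents, read in 64 KiB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 16), b""):
            h.update(chunk)
    return h.hexdigest()


def hardlink_tree(*roots: str) -> int:
    """Replace duplicate regular files under roots with hard links.

    Returns the number of paths that were replaced by a link.
    """
    # Stage 1: bucket regular files by (device, size); symlinks are skipped.
    by_size = defaultdict(list)
    for root in roots:
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                path = os.path.join(dirpath, name)
                st = os.lstat(path)
                if stat.S_ISREG(st.st_mode):
                    by_size[(st.st_dev, st.st_size)].append(path)

    linked = 0
    for paths in by_size.values():
        if len(paths) < 2:
            continue  # a unique size cannot have a duplicate
        # Stage 2: split each bucket by content digest.
        by_hash = defaultdict(list)
        for path in paths:
            by_hash[_digest(path)].append(path)
        for group in by_hash.values():
            master = group[0]
            mst = os.stat(master)
            for path in group[1:]:
                st = os.stat(path)
                if (st.st_dev, st.st_ino) == (mst.st_dev, mst.st_ino):
                    continue  # already linked to the master
                tmp = path + ".hardlink-tmp"  # hypothetical temp name
                os.link(master, tmp)
                os.replace(tmp, path)
                linked += 1
    return linked
```

For the two-backups scenario from the thread, this would be called as `hardlink_tree("backup1", "backup2")` after both trees have been copied onto the same filesystem; running it a second time links nothing new.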