Bug#509685: ITP: hardlink -- Hardlink multiple copies of the same file

2008-12-27 Thread Joerg Jaspert

 We already have these packages:
   fdupes
   perforate
 AFAIK, they do not replace files, they just find them.

Wrong: finddup from perforate does link them together, if one wants to.

 Imagine you have two backups, each on a different filesystem. Now you
 want to have them both on one filesystem. In this situation, you can use
 hardlink to link all common files in the backups together.

Or you just use rsync with --link-dest and don't need a second tool for
it at all.
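The --link-dest idea is easy to sketch: files that are unchanged relative to a reference tree get hardlinked instead of copied. A minimal illustration in Python (a hypothetical `backup` helper, not how rsync itself is implemented):

```python
import filecmp
import os
import shutil

def backup(src, dest, link_dest=None):
    """Copy the tree at src to dest; if a file is unchanged relative to
    the same relative path under link_dest, hardlink it instead of copying."""
    for root, _dirs, files in os.walk(src):
        rel = os.path.relpath(root, src)
        out = os.path.join(dest, rel)
        os.makedirs(out, exist_ok=True)
        for name in files:
            s = os.path.join(root, name)
            d = os.path.join(out, name)
            prev = os.path.join(link_dest, rel, name) if link_dest else None
            if prev and os.path.exists(prev) and filecmp.cmp(s, prev, shallow=False):
                os.link(prev, d)    # unchanged: share the inode with the old generation
            else:
                shutil.copy2(s, d)  # new or modified: make a real copy
```

Each backup generation then costs only the size of the changed files, since unchanged ones share inodes with the previous generation.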

-- 
bye, Joerg
That's just f***ing great, now the bar for being a cool guy in free
software just got raised. It used to be you just had to write a million
lines of useful code. Now you've got to get a subpoena from SCO to be cool.



-- 
To UNSUBSCRIBE, email to debian-bugs-dist-requ...@lists.debian.org
with a subject of unsubscribe. Trouble? Contact listmas...@lists.debian.org



Bug#509685: ITP: hardlink -- Hardlink multiple copies of the same file

2008-12-27 Thread Christoph Franzen

Hello,

I am the chf who requested the program on IRC in the first place, and
noticing the serious doubts expressed here about whether the package is
really needed in Debian, I want to point out why I need it nonetheless
and, more importantly, why I think it would enrich Debian as well.

I am not using the hardlink utility in spite of rsync --link-dest
already existing; I need it *because* I use this rsync feature for
backups.

More precisely, I am using rsnapshot, which essentially allows holding
several generations of a backup tree, conserving huge amounts of
space by connecting unchanged files to the previous generation with
rsync --link-dest.

However, I ran into a bug in rsnapshot that causes a gap whenever a
host was down while a backup was scheduled. Afterwards, the
whole trees were unconnected again. The issue does not seem to be easily
reproducible, so it remains unresolved despite my report on the
rsnapshot mailing list.

While searching for a remedy, I came across several people who had
similar problems repairing broken-apart rsnapshot mirrors, and
they eventually directed me towards hardlink.py, which simply looks for
equal files in a set of given directories and links any equal files
together.

This saves even more space than rsnapshot on its own, because *all*
equal files are linked, not only those with the same path and filename
from one backup generation to its immediate successor.

At first glance, it seems that this is all the functionality needed,
so you could use the perforate package (where the feature is hidden
well enough that I was unable to find it) or enhance
fdupes as requested in #284274.

But because I have many subtrees with a great total number of files,
letting one of the existing utilities run over the whole fileset would
take several weeks, because the machine begins swapping.

So I need to split this into several runs over subsets of the trees.
These have to be handled independently, such that none of the runs
knows anything about the other ones. You can only achieve this by
maximizing the link count, i.e. replacing the file which is linked
to fewer other files with a hard link to the one which already has a
higher link count.
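In code, the rule is simply: of two identical files, keep the inode whose st_nlink is higher and re-link the other one to it. A hypothetical sketch (the real hardlink utility does more bookkeeping, e.g. verifying content first):

```python
import os

def merge_keeping_max_links(path_a, path_b):
    """Given two paths already known to have identical content, replace
    whichever has the lower link count with a hard link to the other, so
    independent runs over overlapping subsets converge on one shared inode."""
    a, b = os.stat(path_a), os.stat(path_b)
    if a.st_ino == b.st_ino:
        return  # already the same inode, nothing to do
    # Keep the inode that is already linked from more places.
    keep, drop = (path_a, path_b) if a.st_nlink >= b.st_nlink else (path_b, path_a)
    os.unlink(drop)
    os.link(keep, drop)
```

Because the winner is always the inode with more existing links, a later run over a different pair of trees cannot accidentally "steal" a file away from the larger linked group.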

I found a reference to such an enhancement to hardlink.py on a
mailing list and requested the patch. Several weeks later, I received
it from its author via email. This tool was the first one which
really did what I needed.

After this odyssey, I decided it would be a wasted effort to use
this script only on my own system, because I had read about others with
similar problems. I have been using Debian for a few years now, and it
has become my favourite distribution, so I considered creating a new
package or finding somebody more experienced who would do this. I
started asking around on #debian-devel.de on OFTC, and Julian Andres
Klode offered to rewrite this program and make it available in Debian.

I'm not sure whether I have explained the maximize-link-count feature
understandably, so here is one more example:

Imagine you create several nearly identical backup trees, with only
small changes between them, using rsync --link-dest. Afterwards you
move a large (measured in file size) subtree to another location in the
source. The next rsync run will copy this subtree again, but not link
it to the unchanged files of the previous run, because their
relative location has changed and rsync cannot identify them as being
the same. Now, due to memory constraints, you run the hardlink utility
over only the new copy and *one* of the older trees (its immediate
predecessor). With an ordinary hardlink utility, there is no
way to make sure that every equal file of the new set is indeed
replaced by a link to the older one. If you are unlucky, it will just
break the predecessor tree off from all the older trees and connect it
to the new one, resulting in no space gain from the operation.

Additionally, the hardlink utility gives more control over which files
are considered equal than any other similar program I've seen so far.
In addition to their contents, you can have it also match owner/group,
name, timestamp, and mode. The only feature I have not yet found a use
for is minimizing the link count.
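Such an equality test is easy to picture: content must always match, and the extra criteria are optional stat comparisons. A sketch (hypothetical parameter names; the real option names in the tool differ):

```python
import filecmp
import os

def files_equal(p1, p2, match_owner=False, match_time=False,
                match_mode=False, match_name=False):
    """Decide whether two files may be hardlinked together.
    Content must always match; owner, timestamp, mode and name
    are additional opt-in criteria."""
    s1, s2 = os.stat(p1), os.stat(p2)
    if match_owner and (s1.st_uid, s1.st_gid) != (s2.st_uid, s2.st_gid):
        return False
    if match_time and int(s1.st_mtime) != int(s2.st_mtime):
        return False
    if match_mode and s1.st_mode != s2.st_mode:
        return False
    if match_name and os.path.basename(p1) != os.path.basename(p2):
        return False
    # Content comparison last, since it is by far the most expensive check.
    return filecmp.cmp(p1, p2, shallow=False)
```

Note that hardlinking forces the linked names to share owner, timestamp and mode afterwards, which is exactly why one may want to require them to match beforehand.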

Regards, Christoph, who hopes to have made things a bit clearer
rather than having annoyed you with this extremely long text.

Bug#509685: ITP: hardlink -- Hardlink multiple copies of the same file

2008-12-26 Thread Samuel Thibault
On Fri, 26 Dec 2008 08:54:53 -0600, John Goerzen wrote:
 Samuel Thibault wrote:
  On Thu, 25 Dec 2008 12:04:43 -0600, John Goerzen wrote:
  Julian Andres Klode wrote:
   Hardlink is a tool which detects multiple copies of the same file and
   replaces them with hardlinks.
   .
   The idea has been taken from http://code.google.com/p/hardlinkpy/, but
   the code has been written from scratch and licensed under the MIT license.
  Do we really need another tool like this?
 
  We already have these packages:
 
perforate
  
  Nothing to do with this.
 
 Eh?  It can do exactly what #284274 requests in fdupes below.  How is
 that not relevant?

Oh, sorry, I hadn't seen the
« Also there are some scripts that help cleaning up the hard disk »
part. I don't see why that belongs in the perforate package.

Samuel






Bug#509685: ITP: hardlink -- Hardlink multiple copies of the same file

2008-12-26 Thread John Goerzen
Julian Andres Klode wrote:
 On Thu, Dec 25, 2008 at 12:04:43PM -0600, John Goerzen wrote:
 Julian Andres Klode wrote:
  Hardlink is a tool which detects multiple copies of the same file and
  replaces them with hardlinks.
  .
  The idea has been taken from http://code.google.com/p/hardlinkpy/, but the
  code has been written from scratch and licensed under the MIT license.
 Do we really need another tool like this?

 We already have these packages:

   fdupes
   perforate
 AFAIK, they do not replace files, they just find them.

That's not correct.  From the manpage of finddup, part of perforate:

   -l, --link
  link the identical files together

 Plus a host of tools that do backups, datapacker that packs things onto
 DVDs, and the like, using hard links.
 hardlink can be used to link files in multiple backup trees,
 and also features options to maximize/minimize the link count,
 and much more.
 
 Imagine you have two backups, each on a different filesystem. Now you
 want to have them both on one filesystem. In this situation, you can use
 hardlink to link all common files in the backups together.
 







Bug#509685: ITP: hardlink -- Hardlink multiple copies of the same file

2008-12-26 Thread John Goerzen
Samuel Thibault wrote:
 On Thu, 25 Dec 2008 12:04:43 -0600, John Goerzen wrote:
 Julian Andres Klode wrote:
  Hardlink is a tool which detects multiple copies of the same file and
  replaces them with hardlinks.
  .
  The idea has been taken from http://code.google.com/p/hardlinkpy/, but the
  code has been written from scratch and licensed under the MIT license.
 Do we really need another tool like this?

 We already have these packages:

   perforate
 
 Nothing to do with this.

Eh?  It can do exactly what #284274 requests in fdupes below.  How is
that not relevant?

 
   fdupes
 
 That one would be an argument that hardlink is a duplicate _if_ #284274
 were fixed. Otherwise, hardlink is really useful.
 
 Samuel
 







Bug#509685: ITP: hardlink -- Hardlink multiple copies of the same file

2008-12-26 Thread Samuel Thibault
On Thu, 25 Dec 2008 12:04:43 -0600, John Goerzen wrote:
 Julian Andres Klode wrote:
   Hardlink is a tool which detects multiple copies of the same file and
   replaces them with hardlinks.
   .
   The idea has been taken from http://code.google.com/p/hardlinkpy/, but the
   code has been written from scratch and licensed under the MIT license.
 
 Do we really need another tool like this?
 
 We already have these packages:
 
   perforate

Nothing to do with this.

   fdupes

That one would be an argument that hardlink is a duplicate _if_ #284274
were fixed. Otherwise, hardlink is really useful.

Samuel






Bug#509685: ITP: hardlink -- Hardlink multiple copies of the same file

2008-12-26 Thread Sandro Tosi
On Fri, Dec 26, 2008 at 13:23, Julian Andres Klode j...@debian.org wrote:
 On Thu, Dec 25, 2008 at 12:04:43PM -0600, John Goerzen wrote:
 Julian Andres Klode wrote:
 
   Hardlink is a tool which detects multiple copies of the same file and
   replaces them with hardlinks.
   .
   The idea has been taken from http://code.google.com/p/hardlinkpy/, but the
   code has been written from scratch and licensed under the MIT license.

 Do we really need another tool like this?

 We already have these packages:

   fdupes
   perforate
 AFAIK, they do not replace files, they just find them.

I'd happily accept patches for #284274

Cheers,
-- 
Sandro Tosi (aka morph, Morpheus, matrixhasu)
My website: http://matrixhasu.altervista.org/
Me at Debian: http://wiki.debian.org/SandroTosi






Bug#509685: ITP: hardlink -- Hardlink multiple copies of the same file

2008-12-26 Thread Julian Andres Klode
On Thu, Dec 25, 2008 at 12:04:43PM -0600, John Goerzen wrote:
 Julian Andres Klode wrote:
  
   Hardlink is a tool which detects multiple copies of the same file and
   replaces them with hardlinks.
   .
   The idea has been taken from http://code.google.com/p/hardlinkpy/, but the
   code has been written from scratch and licensed under the MIT license.
 
 Do we really need another tool like this?
 
 We already have these packages:
 
   fdupes
   perforate
AFAIK, they do not replace files, they just find them.
 
 Plus a host of tools that do backups, datapacker that packs things onto
 DVDs, and the like, using hard links.
hardlink can be used to link files in multiple backup trees,
and also features options to maximize/minimize the link count,
and much more.

Imagine you have two backups, each on a different filesystem. Now you
want to have them both on one filesystem. In this situation, you can use
hardlink to link all common files in the backups together.
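The core of such a tool is small: group files by size and content hash, then replace duplicates with hard links to the first copy seen. A rough sketch of the idea (illustrative only, not the packaged code, which adds the matching options and link-count handling discussed in this thread):

```python
import hashlib
import os

def hardlink_duplicates(root):
    """Walk root, find regular files with identical content, and
    replace duplicates with hard links to the first copy seen."""
    by_key = {}  # (size, sha256 digest) -> path of first occurrence
    for dirpath, _dirs, names in os.walk(root):
        for name in names:
            path = os.path.join(dirpath, name)
            if os.path.islink(path) or not os.path.isfile(path):
                continue  # only regular files can be hardlinked safely
            h = hashlib.sha256()
            with open(path, 'rb') as fh:
                for chunk in iter(lambda: fh.read(1 << 16), b''):
                    h.update(chunk)
            key = (os.path.getsize(path), h.hexdigest())
            first = by_key.setdefault(key, path)
            if first != path and os.stat(first).st_ino != os.stat(path).st_ino:
                os.unlink(path)
                os.link(first, path)  # duplicate content: share one inode
```

A real implementation would compare file contents byte-for-byte before linking rather than trusting the hash alone, and would handle cross-filesystem boundaries, where os.link fails.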
 

-- 
Julian Andres Klode  - Free Software Developer
   Debian Developer  - Contributing Member of SPI
   Ubuntu Member - Fellow of FSFE

Website: http://jak-linux.org/   XMPP: juli...@jabber.org
Debian:  http://www.debian.org/  SPI:  http://www.spi-inc.org/
Ubuntu:  http://www.ubuntu.com/  FSFE: http://www.fsfe.org/






Bug#509685: ITP: hardlink -- Hardlink multiple copies of the same file

2008-12-26 Thread John Goerzen
Julian Andres Klode wrote:
 
  Hardlink is a tool which detects multiple copies of the same file and
  replaces them with hardlinks.
  .
  The idea has been taken from http://code.google.com/p/hardlinkpy/, but the
  code has been written from scratch and licensed under the MIT license.

Do we really need another tool like this?

We already have these packages:

  fdupes
  perforate

Plus a host of tools that do backups, datapacker that packs things onto
DVDs, and the like, using hard links.







Bug#509685: ITP: hardlink -- Hardlink multiple copies of the same file

2008-12-24 Thread Julian Andres Klode
Package: wnpp
Severity: wishlist
Owner: Julian Andres Klode j...@debian.org

* Package name: hardlink
  Version : 0.1
  Upstream Author : Julian Andres Klode j...@jak-linux.org
* URL : http://git.debian.org/?p=users/jak/hardlink.git;a=summary
* License : MIT
* Programming Lang: Python (>= 2.5)
  Description : Hardlink multiple copies of the same file

 Hardlink is a tool which detects multiple copies of the same file and replaces
 them with hardlinks.
 .
 The idea has been taken from http://code.google.com/p/hardlinkpy/, but the
 code has been written from scratch and licensed under the MIT license.

-- Further information:
I chose to rewrite hardlinkpy from scratch because I did not like its
style and because upstream seems to have lost interest in it.

This can be seen as my response to chf's question on #debian-devel.de
a few days ago about packaging hardlinkpy.

-- System Information:
Debian Release: 5.0
  APT prefers testing
  APT policy: (990, 'testing'), (500, 'unstable'), (200, 'experimental')
Architecture: amd64 (x86_64)

-- 
Julian Andres Klode  - Free Software Developer
   Debian Developer  - Contributing Member of SPI
   Ubuntu Member - Fellow of FSFE

Website: http://jak-linux.org/   XMPP: juli...@jabber.org
Debian:  http://www.debian.org/  SPI:  http://www.spi-inc.org/
Ubuntu:  http://www.ubuntu.com/  FSFE: http://www.fsfe.org/

