Re: Bug#656142: ITP: duff -- Duplicate file finder

2012-01-19 Thread Kamal Mostafa
On Mon, Jan 16, 2012 at 12:58:13PM -0800, Kamal Mostafa wrote: * Package name: duff * URL : http://duff.sourceforge.net/ On Tue, 2012-01-17 at 09:56 +0100, Simon Josefsson wrote: If there aren't warnings about use of SHA1 in the tool, there should be. While I don't recall

Re: Bug#656142: ITP: duff -- Duplicate file finder

2012-01-17 Thread Lars Wirzenius
On Mon, Jan 16, 2012 at 12:58:13PM -0800, Kamal Mostafa wrote: * Package name: duff * URL : http://duff.sourceforge.net/ A quick speed comparison:
real  user  system  max RSS  elapsed  cmd
 (s)   (s)     (s)    (KiB)      (s)

Re: Bug#656142: ITP: duff -- Duplicate file finder

2012-01-17 Thread Samuel Thibault
Lars Wirzenius wrote, on Tue 17 Jan 2012 09:12:58 +:
real  user  system  max RSS  elapsed  cmd
 (s)   (s)     (s)    (KiB)      (s)
 3.2   2.4     5.8    62784      5.8  hardlink --dry-run files > /dev/null

Re: Bug#656142: ITP: duff -- Duplicate file finder

2012-01-17 Thread Lars Wirzenius
On Tue, Jan 17, 2012 at 10:30:20AM +0100, Samuel Thibault wrote: Lars Wirzenius wrote, on Tue 17 Jan 2012 09:12:58 +:
real  user  system  max RSS  elapsed  cmd
 (s)   (s)     (s)    (KiB)      (s)
 3.2   2.4

Re: Bug#656142: ITP: duff -- Duplicate file finder

2012-01-17 Thread Samuel Thibault
Lars Wirzenius wrote, on Tue 17 Jan 2012 10:45:20 +: Personally, I would be wary of using checksums for file comparisons, since comparing files byte-by-byte isn't slow (you only need to do it to files that are identical in size, and you need to read all the files anyway).
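The approach described in this message (compare contents only for files of identical size, byte by byte, without checksums) can be sketched roughly as follows. This is a minimal illustration in Python, not the code of duff, hardlink, or any other tool named in the thread; the function names `group_by_size` and `find_duplicates` are hypothetical.

```python
import filecmp
import os
from collections import defaultdict


def group_by_size(paths):
    """Map each file size to the list of paths that have that size."""
    by_size = defaultdict(list)
    for path in paths:
        by_size[os.path.getsize(path)].append(path)
    return by_size


def find_duplicates(paths):
    """Return lists of paths whose contents are byte-for-byte identical."""
    duplicates = []
    for same_size in group_by_size(paths).values():
        if len(same_size) < 2:
            continue  # a file with a unique size cannot have a duplicate
        clusters = []  # each cluster holds files already known to be identical
        for path in same_size:
            for cluster in clusters:
                # shallow=False forces a real content comparison
                if filecmp.cmp(cluster[0], path, shallow=False):
                    cluster.append(path)
                    break
            else:
                clusters.append([path])
        duplicates.extend(c for c in clusters if len(c) > 1)
    return duplicates
```

Each file is compared against one representative per cluster, so the number of comparisons grows with the number of distinct contents in a size group, which is the cost the replies further down the thread argue about.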

Re: Bug#656142: ITP: duff -- Duplicate file finder

2012-01-17 Thread Samuel Thibault
Lars Wirzenius wrote, on Tue 17 Jan 2012 10:45:20 +: On Tue, Jan 17, 2012 at 10:30:20AM +0100, Samuel Thibault wrote: Lars Wirzenius wrote, on Tue 17 Jan 2012 09:12:58 +:
real  user  system  max RSS  elapsed  cmd
 (s)   (s)     (s)

Re: Bug#656142: ITP: duff -- Duplicate file finder

2012-01-17 Thread Roland Mas
Samuel Thibault, 2012-01-17 12:03:41 +0100 : [...] I'm not sure to understand what you mean exactly. If you have even just a hundred files of the same size, you will need ten thousand file comparisons! I'm sure that can be optimised. Read all 100 files in parallel, comparing blocks of
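The optimisation hinted at here (reading all same-size files in parallel and comparing them block by block) might look something like the sketch below. The message is truncated, so this is an interpretation and not Roland Mas's exact proposal; the function name `split_identical` and the 64 KiB block size are assumptions made for illustration.

```python
from collections import defaultdict

BLOCK_SIZE = 64 * 1024  # assumed block size, not taken from the thread


def split_identical(paths, block_size=BLOCK_SIZE):
    """Partition files into groups of two or more with identical contents.

    Files in a group are read in lockstep, one block at a time. A group is
    split as soon as its members' blocks differ, and a file left alone in
    its group is dropped because it can no longer match anything.
    """
    handles = {path: open(path, "rb") for path in paths}
    try:
        groups = [list(paths)]
        finished = []
        while groups:
            next_groups = []
            for group in groups:
                by_block = defaultdict(list)
                for path in group:
                    by_block[handles[path].read(block_size)].append(path)
                for block, members in by_block.items():
                    if len(members) < 2:
                        continue  # no remaining candidate for this file
                    if block == b"":
                        finished.append(members)  # reached EOF together
                    else:
                        next_groups.append(members)
            groups = next_groups
        return finished
    finally:
        for handle in handles.values():
            handle.close()
```

With this scheme every file is read at most once, so the work is proportional to the amount of data rather than to the number of file pairs. One limitation of the sketch is that it keeps every candidate file open at once, which can run into the per-process file descriptor limit for very large size groups.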

Re: Bug#656142: ITP: duff -- Duplicate file finder

2012-01-17 Thread Samuel Thibault
Samuel Thibault wrote, on Tue 17 Jan 2012 12:15:16 +0100: Lars Wirzenius wrote, on Tue 17 Jan 2012 10:45:20 +: On Tue, Jan 17, 2012 at 10:30:20AM +0100, Samuel Thibault wrote: Lars Wirzenius wrote, on Tue 17 Jan 2012 09:12:58 +: real user system max RSS elapsed

Re: Bug#656142: ITP: duff -- Duplicate file finder

2012-01-17 Thread Samuel Thibault
Roland Mas wrote, on Tue 17 Jan 2012 13:41:23 +0100: Samuel Thibault, 2012-01-17 12:03:41 +0100 : [...] I'm not sure to understand what you mean exactly. If you have even just a hundred files of the same size, you will need ten thousand file comparisons! I'm sure that can be

Re: Bug#656142: ITP: duff -- Duplicate file finder

2012-01-17 Thread Lars Wirzenius
On Tue, Jan 17, 2012 at 02:05:10PM +0100, Samuel Thibault wrote: Roland Mas wrote, on Tue 17 Jan 2012 13:41:23 +0100: Samuel Thibault, 2012-01-17 12:03:41 +0100 : [...] I'm not sure to understand what you mean exactly. If you have even just a hundred files of the same size,

Re: Bug#656142: ITP: duff -- Duplicate file finder

2012-01-17 Thread Samuel Thibault
Samuel Thibault wrote, on Tue 17 Jan 2012 14:02:45 +0100: On my PhD work directory, with various stuff in it (500MiB, 18000 files, big but also small files (svn/git checkouts etc)), everything being in cache already (no disk I/O): hardlink -t --dry-run . > /dev/null 1,06s user 0,46s

Re: Bug#656142: ITP: duff -- Duplicate file finder

2012-01-17 Thread Johan Henriksson
Ah, right. So you'll start writing yet another tool? ;) I've implemented pretty much that (http://liw.fi/dupfiles), but my duplicate file finder is not so much better than existing ones in Debian that I would inflict it on Debian. But the algorithm works nicely, and works even for people

Re: Bug#656142: ITP: duff -- Duplicate file finder

2012-01-17 Thread Andy Smith
Hello, On Tue, Jan 17, 2012 at 09:12:58AM +, Lars Wirzenius wrote: rdfind seems to be quickest one, but duff compares well with hardlink, which (see http://liw.fi/dupfiles/) was the fastest one I knew of in Debian so far. Does anyone know of a duplicate file finder that can keep its

Bug#656142: ITP: duff -- Duplicate file finder

2012-01-16 Thread Kamal Mostafa
Package: wnpp
Severity: wishlist
Owner: Kamal Mostafa ka...@whence.com
* Package name    : duff
  Version         : 0.5
  Upstream Author : Camilla Berglund elmindr...@elmindreda.org
* URL             : http://duff.sourceforge.net/
* License         : Zlib
  Programming Lang: C
  Description

Re: Bug#656142: ITP: duff -- Duplicate file finder

2012-01-16 Thread Samuel Thibault
Kamal Mostafa wrote, on Mon 16 Jan 2012 12:58:13 -0800: Package: wnpp Severity: wishlist Owner: Kamal Mostafa ka...@whence.com * Package name: duff Version : 0.5 Upstream Author : Camilla Berglund elmindr...@elmindreda.org * URL :

Re: Bug#656142: ITP: duff -- Duplicate file finder

2012-01-16 Thread Axel Beckert
Hi, Samuel Thibault wrote: * Package name: duff Version : 0.5 Upstream Author : Camilla Berglund elmindr...@elmindreda.org * URL : http://duff.sourceforge.net/ * License : Zlib Programming Lang: C Description : Duplicate file finder

Re: Bug#656142: ITP: duff -- Duplicate file finder

2012-01-16 Thread Joerg Jaspert
What is it the benefit over fdupes, rdfind, ...? ..., hardlink, ... finddup from perforate Was thinking about packaging it myself already, so I may also sponsor Kamal's package when it's ready. You just listed the third duplicate (and me no. 4), and still go blind right on ohoh, i sponsor

Re: Bug#656142: ITP: duff -- Duplicate file finder

2012-01-16 Thread Kamal Mostafa
On Mon, 2012-01-16 at 23:07 +0100, Joerg Jaspert wrote: What is it the benefit over fdupes, rdfind, ...? ..., hardlink, ... finddup from perforate After a quick evaluation of the various find dupe files tools, I was attracted to try duff because: 1. It looked easier to use than the others.

Re: Bug#656142: ITP: duff -- Duplicate file finder

2012-01-16 Thread martin f krafft
thus spoke Kamal Mostafa ka...@debian.org [2012.01.17.0049 +0100]: In my humble opinion, that would be an unreasonable pre-condition for inclusion in Debian. Our standard for inclusion should not be that a new package must be vastly better than other similar packages. That would deny a new