Hi,
I'd recommend checking
http://manpages.ubuntu.com/manpages/precise/man1/fdupes.1.html instead of
coding your own.
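
If fdupes is not available, the group-by-checksum idea from the question below can also be sketched with GNU coreutils alone. This is only a rough sketch: the function name find_dupes is made up for illustration, and it assumes GNU find, sort and uniq (for -maxdepth, -w and --all-repeated) and file names without newlines or backslashes.

```shell
# Hypothetical helper (name chosen for illustration): print groups of
# regular files in directory $1 (default: current directory) whose
# contents have the same SHA-512 digest.
find_dupes() {
    dir=${1:-.}
    # 1. Hash every regular file directly inside $dir.
    # 2. Sort so that identical digests end up on adjacent lines.
    # 3. uniq -w 128 compares only the 128 hex digits of the digest;
    #    --all-repeated=separate prints every member of each duplicate
    #    group, with a blank line between groups.
    find "$dir" -maxdepth 1 -type f -exec sha512sum {} + |
        sort |
        uniq -w 128 --all-repeated=separate
}
```

Calling find_dupes /some/dir prints one "digest  filename" line per duplicate file. Files with unique content are not listed, and no temporary files are created in the directory.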

Christian Ehrhardt
Software Engineer, Ubuntu Server
Canonical Ltd

On Wed, Jul 6, 2016 at 3:35 PM, John McKown <[email protected]>
wrote:

> I have a directory which has a number of files in it. I want to find out
> which files have identical content. Please, don't ask why (I'm an idiot?).
> Since these are text files, my first thought was to use diff. That is, list
> the files. For each file, do a diff against all the other files and note
> the result. I never came up with a decent algorithm to do this. Then I had
> a "vision". I remember that git stores file contents by basically creating
> a sha1sum, which it uses as a file name. Multiple files with the same
> sha1sum (which is very likely to be unique based on the content) are only
> stored once. Now, since sha1sum is very unlikely to have a collision, how
> likely would sha512sum be to have a collision? So I did the following:
>
> for i in *; do x=$(sha512sum "$i" | cut -d ' ' -f 1); echo "$i" >>"${x}.sha512sum"; done
>
> I then did:
>
> wc -l *.sha512sum | head -n -1 | awk '$1 != 1 {print $2;}' | while read -r i; do echo '==='; cat "$i"; done
>
> which gave me a nice list of files with each group separated by ===.
>
> Is this reasonable? Is there a better way to do this?
>
> --
> "Pessimism is an admirable quality in an engineer. Pessimistic people check
> their work three times, because they're sure that something won't be right.
> Optimistic people check once, trust in Solis-de to keep the ship safe, then
> blow everyone up."
> "I think you're mistaking the word optimistic for inept."
> "They've got a similar ring to my ear."
>
> From "Star Nomad" by Lindsay Buroker:
>
> Maranatha! <><
> John McKown
>
> ----------------------------------------------------------------------
> For LINUX-390 subscribe / signoff / archive access instructions,
> send email to [email protected] with the message: INFO LINUX-390 or
> visit
> http://www.marist.edu/htbin/wlvindex?LINUX-390
> ----------------------------------------------------------------------
> For more information on Linux on System z, visit
> http://wiki.linuxvm.org/
>
