How about the fdupes program? If you don't already have it, you can get the source
(https://github.com/adrianlopezroche/fdupes) and compile it. It's a pretty
straightforward compile.

mylinux:~ # echo abc > file1
mylinux:~ # echo abc > file2
mylinux:~ # fdupes .
./file1
./file2

mylinux:~ #

It can also scan subdirectories recursively (fdupes -r). I use it frequently,
both on x86 and s390.
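For comparison, the fdupes documentation describes its method as a file-size comparison, then hash signatures, then a byte-by-byte check. The bash function below is a rough sketch of that same three-stage idea, not fdupes' own code. The name find_dupes is hypothetical, and the sketch assumes bash 4+ (associative arrays) and GNU coreutils (stat -c %s). It only looks at regular files in the current directory, not subdirectories.

```bash
#!/usr/bin/env bash
# find_dupes (hypothetical name): size -> hash -> byte-compare sketch.
# Assumes bash 4+ and GNU coreutils; scans the current directory only.
find_dupes() {
  local -a files=()
  local -A size_of=() size_count=() first_with=()
  local f s key
  # Pass 1: collect regular files and count how many share each size.
  for f in *; do
    [ -f "$f" ] || continue
    files+=("$f")
    s=$(stat -c %s -- "$f")
    size_of[$f]=$s
    size_count[$s]=$(( ${size_count[$s]:-0} + 1 ))
  done
  # Pass 2: hash only files whose size is shared with another file.
  for f in "${files[@]}"; do
    s=${size_of[$f]}
    (( size_count[$s] > 1 )) || continue
    # Read via stdin so sha512sum never escapes odd file names.
    key="$s:$(sha512sum < "$f" | cut -d ' ' -f 1)"
    if [ -z "${first_with[$key]}" ]; then
      first_with[$key]=$f            # first file seen with this size+hash
    elif cmp -s -- "${first_with[$key]}" "$f"; then
      # Same size, same hash, and cmp confirms identical bytes.
      printf '%s == %s\n' "${first_with[$key]}" "$f"
    fi
  done
}
```

Only files of a shared size get hashed, and a matching hash is still confirmed with cmp -s, so even a hash collision would not produce a false match.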

Met vriendelijke groet/With kind regards/Mit freundlichen Grüßen,
Berry van Sleeuwen


-----Original Message-----
From: Linux on 390 Port [mailto:[email protected]] On Behalf Of John McKown
Sent: Wednesday, July 06, 2016 3:35 PM
To: [email protected]
Subject: "clever"(?) way to find files with duplicate contents.

I have a directory which has a number of files in it. I want to find out which
files have identical content. Please, don't ask why (I'm an idiot?).
Since these are text files, my first thought was to use diff. That is, list the
files, then for each file, do a diff against all the other files and note the
result. I never came up with a decent algorithm to do this. Then I had a
"vision". I remember that git stores file contents by basically creating a
sha1sum, which it uses as a file name. Multiple files with the same sha1sum
(which is very likely to be unique to the content) are stored only once. Now,
since sha1sum is very unlikely to have a collision, how likely would sha512sum
be to have one? So I did the following:

for i in *; do
  x=$(sha512sum "$i" | cut -d ' ' -f 1)
  echo "$i" >> "${x}.sha512sum"
done
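To put a rough number on the collision question, assume the hash behaves like a random function (an assumption that says nothing about deliberately crafted collisions). Bounding the chance of a match over all pairs of n distinct files with a b-bit digest gives the following; the figure of a million files is only an illustration.

```latex
\begin{aligned}
P(\text{any collision}) &\le \binom{n}{2}\,2^{-b} = \frac{n(n-1)}{2^{b+1}} < \frac{n^{2}}{2^{b+1}} \\
n = 10^{6} < 2^{20},\ b = 160\ (\text{SHA-1}) &:\quad P < 2^{40-161} = 2^{-121} \\
n = 10^{6} < 2^{20},\ b = 512\ (\text{SHA-512}) &:\quad P < 2^{40-513} = 2^{-473}
\end{aligned}
```

Either digest is far more likely to be undone by a disk error than by an accidental collision; SHA-512 just widens the margin.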

I then did:

wc -l *.sha512sum | head -n -1 | awk '$1 != 1 {print $2;}' |
  while read i; do echo '==='; cat "$i"; done

which gave me a nice list of files with each group separated by ===.
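One snag with the two steps above: the *.sha512sum lists are written into the directory being scanned, so running the loop a second time would hash those lists as well. Also, when wc is given a single file it prints no "total" line, so head -n -1 drops the only line and misses the group. A minimal sketch that keeps the same idea and the same === output, but writes the lists to a scratch directory from mktemp -d; group_by_hash is a hypothetical name.

```bash
#!/usr/bin/env bash
# group_by_hash (hypothetical name): same hash-named-list trick, but the
# lists live in a scratch directory so they are never hashed themselves.
group_by_hash() {
  local dir f h
  dir=$(mktemp -d) || return 1
  for f in *; do
    [ -f "$f" ] || continue
    h=$(sha512sum < "$f" | cut -d ' ' -f 1)
    printf '%s\n' "$f" >> "$dir/$h"   # one list of names per digest
  done
  # Print every list holding more than one name, each preceded by ===.
  for f in "$dir"/*; do
    [ -f "$f" ] || continue
    if [ "$(wc -l < "$f")" -gt 1 ]; then
      echo '==='
      cat -- "$f"
    fi
  done
  rm -rf -- "$dir"
}
```

Reading each list with wc -l < file also avoids parsing wc's multi-file output, so no "total" line needs to be skipped.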

Is this reasonable? Is there a better way to do this?
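Another option, assuming GNU coreutils and findutils: let sort bring equal digests together, and let uniq compare only the first 128 characters of each line (the length of a hex SHA-512 digest). No intermediate files are needed. dupes_by_uniq is a hypothetical name, and the cut column assumes file names without backslashes or newlines, because sha512sum escapes such names and that shifts the columns.

```bash
#!/usr/bin/env bash
# dupes_by_uniq (hypothetical name). Assumes GNU sort/uniq/cut and find.
dupes_by_uniq() {
  # sha512sum prints "<128 hex digits>  <name>". Sorting (in the C locale)
  # puts equal digests next to each other; uniq -w 128 compares only the
  # digest, and --all-repeated=separate prints every member of each
  # duplicate group with a blank line between groups. cut -c 131- then
  # drops the digest plus the two separator characters.
  find . -maxdepth 1 -type f -exec sha512sum {} + |
    LC_ALL=C sort |
    uniq -w 128 --all-repeated=separate |
    cut -c 131-
}
```

Dropping -maxdepth 1 makes find descend into subdirectories as well; -type f skips directories, which sha512sum * would complain about.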

--
"Pessimism is an admirable quality in an engineer. Pessimistic people check
their work three times, because they're sure that something won't be right.
Optimistic people check once, trust in Solis-de to keep the ship safe, then 
blow everyone up."
"I think you're mistaking the word optimistic for inept."
"They've got a similar ring to my ear."

From "Star Nomad" by Lindsay Buroker:

Maranatha! <><
John McKown

----------------------------------------------------------------------
For LINUX-390 subscribe / signoff / archive access instructions, send email to 
[email protected] with the message: INFO LINUX-390 or visit
http://www.marist.edu/htbin/wlvindex?LINUX-390
----------------------------------------------------------------------
For more information on Linux on System z, visit http://wiki.linuxvm.org/
