How about the fdupes program? If you don't already have it, you can take the source (https://github.com/adrianlopezroche/fdupes) and compile it. It's a pretty straightforward compile.
mylinux:~ # echo abc > file1
mylinux:~ # echo abc > file2
mylinux:~ # fdupes .
./file1
./file2
mylinux:~ #

It can also scan across subdirectories. I use it frequently, both on x86 and s390.

Met vriendelijke groet/With kind regards/Mit freundlichen Grüßen,
Berry van Sleeuwen

-----Original Message-----
From: Linux on 390 Port [mailto:[email protected]] On Behalf Of John McKown
Sent: Wednesday, July 06, 2016 3:35 PM
To: [email protected]
Subject: "clever"(?) way to find files with duplicate contents.

I have a directory which has a number of files in it. I want to find out which files have identical content. Please, don't ask why (I'm an idiot?).

Since these are text files, my first thought was to use diff. That is, list the files. For each file, do a diff against all the other files and note the result. I never came up with a decent algorithm to do this.

Then I had a "vision". I remember that git stores file contents by basically creating a sha1sum, which it uses as a file name. Multiple files with the same sha1sum (which is very likely to be unique based on the content) are only stored once. Now, since sha1sum is very unlikely to have a collision, how likely would sha512sum be to have a collision? So I did the following:

for i in *;do x=$(sha512sum "$i" | cut -d ' ' -f 1);echo "$i" >>"${x}.sha512sum";done

I then did:

wc -l *.sha512sum | head -n -1 | awk '$1 != 1 {print $2;}'|while read i;do echo '===';cat $i;done

which gave me a nice list of files with each group separated by ===.

Is this reasonable? Is there a better way to do this?

--
"Pessimism is an admirable quality in an engineer. Pessimistic people check their work three times, because they're sure that something won't be right. Optimistic people check once, trust in Solis-de to keep the ship safe, then blow everyone up."
"I think you're mistaking the word optimistic for inept."
"They've got a similar ring to my ear."

From "Star Nomad" by Lindsay Buroker:

Maranatha!
<>< John McKown
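
For comparison, the hashing idea from the original message can probably be done in one pipeline, without leaving a .sha512sum file behind for every hash. The following is only a rough sketch, not what either poster ran: the function name find_dupes is made up for illustration, and it assumes GNU coreutils sha512sum, a find that supports -maxdepth, and file names without newlines or backslashes (sha512sum escapes those in its output).

```shell
#!/bin/sh
# Hypothetical helper (not from the thread): print groups of files in one
# directory that have identical content, each group preceded by "===".
# Assumes GNU sha512sum and a find with -maxdepth.
find_dupes() {
    dir=${1:-.}
    # Hash every regular file directly inside $dir.
    # Each output line looks like "<hash>  <path>".
    find "$dir" -maxdepth 1 -type f -exec sha512sum {} + |
        sort |
        awk '
            {
                # $1 is the hash; the path starts after the two separator
                # characters that follow it.
                name = substr($0, length($1) + 3)
                if ($1 == prev) {
                    # Same hash as the previous line: duplicate content.
                    if (!ingroup) { print "==="; print prevname; ingroup = 1 }
                    print name
                } else {
                    ingroup = 0
                }
                prev = $1
                prevname = name
            }'
}
```

Calling find_dupes . in the directory from the fdupes example should list ./file1 and ./file2 under a single === line. Files with unique content are not printed at all.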
