Hi,
I'd recommend checking out fdupes (http://manpages.ubuntu.com/manpages/precise/man1/fdupes.1.html) instead of coding your own.
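
If installing fdupes is not an option, the checksum idea from the original mail can also be done in one pipeline, without the intermediate *.sha512sum files. This is only a minimal sketch, assuming GNU coreutils and findutils (the -w and --all-repeated options of uniq are GNU extensions). find_dupes is just an illustrative name, and file names containing newlines or backslashes are not handled:

```shell
#!/bin/sh
# Sketch: list groups of files with identical content in one directory.
# Assumes GNU sha512sum, find and uniq. find_dupes is a hypothetical
# helper name chosen for this example; it is not part of fdupes.
find_dupes() {
    dir=${1:-.}
    # 1. Hash every regular file directly inside "$dir"; each output line
    #    is "<128 hex digits>  <path>".
    # 2. Sort so that equal digests end up on adjacent lines.
    # 3. Compare only the first 128 characters (the digest) and print every
    #    member of each duplicate group, with a blank line between groups.
    find "$dir" -maxdepth 1 -type f -exec sha512sum {} + |
        sort |
        uniq -w 128 --all-repeated=separate
}
```

Calling `find_dupes /some/dir` prints one block of "digest  path" lines per group of identical files; files without a duplicate are not listed. A matching checksum is very strong evidence rather than proof, so running cmp on two members of a group settles any doubt.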

Christian Ehrhardt
Software Engineer, Ubuntu Server
Canonical Ltd

On Wed, Jul 6, 2016 at 3:35 PM, John McKown <[email protected]> wrote:

> I have a directory which has a number of files in it. I want to find out
> which files have identical content. Please, don't ask why (I'm an idiot?).
> Since these are text files, my first thought was to use diff. That is, list
> the files. For each file, do a diff against all the other files and note
> the result. I never came up with a decent algorithm to do this. Then I had
> a "vision". I remember that git stores file contents by basically creating
> a sha1sum, which it uses as a file name. Multiple files with the same
> sha1sum (which is very likely to be unique based on the content) are only
> stored once. Now, since sha1sum is very unlikely to have a collision, how
> likely would sha512sum be to have a collision? So I did the following:
>
> for i in *;do x=$(sha512sum "$i" | cut -d ' ' -f 1);echo "$i" >>"${x}.sha512sum";done
>
> I then did:
>
> wc -l *.sha512sum | head -n -1 | awk '$1 != 1 {print $2;}'|while read i;do echo '===';cat $i;done
>
> which gave me a nice list of files with each group separated by ===.
>
> Is this reasonable? Is there a better way to do this?
>
> --
> "Pessimism is an admirable quality in an engineer. Pessimistic people check
> their work three times, because they're sure that something won't be right.
> Optimistic people check once, trust in Solis-de to keep the ship safe, then
> blow everyone up."
> "I think you're mistaking the word optimistic for inept."
> "They've got a similar ring to my ear."
>
> From "Star Nomad" by Lindsay Buroker:
>
> Maranatha! <><
> John McKown
>
> ----------------------------------------------------------------------
> For LINUX-390 subscribe / signoff / archive access instructions,
> send email to [email protected] with the message: INFO LINUX-390 or
> visit
> http://www.marist.edu/htbin/wlvindex?LINUX-390
> ----------------------------------------------------------------------
> For more information on Linux on System z, visit
> http://wiki.linuxvm.org/
> ----------------------------------------------------------------------
