> > Message: 4
> > Date: Thu, 4 Dec 2014 14:58:37 -0500
> > From: "Sreejith K." <[email protected]>
> > To: Wikimedia Commons Discussion List <[email protected]>
> > Subject: Re: [Commons-l] Duplicate removal?
> > Message-ID: <CAN8yy7Mtte+FPJ5N=hq=[email protected]>
> > Content-Type: text/plain; charset="utf-8"
> >
> > I am using Wikimedia APIs to create a gallery of duplicates and routinely
> > clean them. You can see the results here.
> >
> > https://commons.wikimedia.org/wiki/User:Sreejithk2000/Duplicates
> >
> > The page also has a link to the script. If anyone is interested in using
> > this script, let me know and I can work with you to customize it.
> >
> > - Sreejith K.
>
> See also https://commons.wikimedia.org/wiki/Special:ListDuplicatedFiles which lists the files that have the most byte-for-byte duplicates (really, most of the time those should use file redirects).
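The byte-for-byte matching that Special:ListDuplicatedFiles relies on can be sketched locally by grouping files on the SHA-1 of their contents, which is the digest MediaWiki stores per upload. A minimal sketch (the function name and interface here are illustrative, not from Sreejith's script):

```python
import hashlib
from collections import defaultdict
from pathlib import Path

def find_duplicates(paths):
    """Group file paths by the SHA-1 of their contents and return
    only the groups with more than one member, i.e. the sets of
    byte-for-byte duplicates."""
    groups = defaultdict(list)
    for p in paths:
        digest = hashlib.sha1(Path(p).read_bytes()).hexdigest()
        groups[digest].append(p)
    return {d: ps for d, ps in groups.items() if len(ps) > 1}
```

Each returned group is a candidate for merging under one title plus file redirects, as suggested above.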
--
Thanks Jonas for experimenting with this sort of thing. I always wished we did something with perceptual hashes internally, in addition to the SHA-1 hashes we compute currently.
--bawolff
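For contrast with SHA-1 (which only matches identical bytes), a perceptual hash stays stable across rescaling and re-encoding. One common scheme is dHash: compare adjacent pixels in a small grayscale thumbnail and pack the comparisons into a 64-bit value, then treat small Hamming distance as "visually similar". A minimal sketch, assuming the image has already been decoded and resampled to an 8×9 grayscale grid (no image library shown here):

```python
def dhash(grid):
    """Difference hash over an 8-row by 9-column grayscale grid
    (values 0-255). Each of the 8 comparisons per row contributes
    one bit: 1 if the left pixel is brighter than its right
    neighbour, giving a 64-bit hash."""
    bits = 0
    for row in grid:
        for left, right in zip(row, row[1:]):
            bits = (bits << 1) | (1 if left > right else 0)
    return bits

def hamming(a, b):
    """Number of differing bits between two hashes; near-duplicate
    images typically land within a small distance of each other."""
    return bin(a ^ b).count("1")
```

Unlike SHA-1, two slightly different encodings of the same picture give hashes a few bits apart rather than completely unrelated digests, which is what makes this useful for near-duplicate detection.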
_______________________________________________ Commons-l mailing list [email protected] https://lists.wikimedia.org/mailman/listinfo/commons-l
