We really need a better way to mark duplicates on Commons (and images that are details from a larger work). A structure to record this is something that probably ought to be on the radar for the new Structured Data project.

As well as exact duplicates, there are often different versions of the same painting with different lighting, or scans of slightly different reproductions of the same work. I don't know whether the algorithm is permissive enough to pick all of these up, but as many as it can find would be good to tag as "other versions" of the same underlying image.
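
To make "permissive enough" a bit more concrete, here is a rough sketch of one common approach, a difference hash compared by Hamming distance. I don't know what Elog.io actually uses, so this is only an illustration: the function names, the tiny sample grids and the max_distance threshold are all made up, and a real image would first have to be shrunk to a small grayscale grid with an imaging library.

```python
def dhash_bits(gray):
    """Difference hash: one bit per horizontally adjacent pixel pair.

    `gray` is a small grid (list of rows) of grayscale values.
    """
    return [int(left < right)
            for row in gray
            for left, right in zip(row, row[1:])]


def hamming(a, b):
    """Number of positions at which two equal-length bit lists differ."""
    return sum(x != y for x, y in zip(a, b))


def same_work(gray_a, gray_b, max_distance=2):
    """Treat two grids as versions of one work if their hashes are close.

    max_distance is a hypothetical tuning knob: 0 accepts only identical
    hashes, larger values are more permissive.
    """
    return hamming(dhash_bits(gray_a), dhash_bits(gray_b)) <= max_distance


# Hypothetical 3x3 "scans": the second is the first with brighter lighting,
# the third is a mirrored (and so genuinely different) picture.
scan = [[10, 40, 90], [200, 120, 30], [5, 5, 60]]
brighter = [[value + 20 for value in row] for row in scan]
mirrored = [list(reversed(row)) for row in scan]

print(same_work(scan, brighter))   # uniform brightening keeps every comparison
print(same_work(scan, mirrored))   # mirrored gradients flip most bits
```

Because the hash only records whether each pixel is darker than its right-hand neighbour, a uniform change in lighting leaves it untouched, while a different reproduction shifts a few bits. How many differing bits to tolerate is exactly the permissiveness question.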

In general, we probably wouldn't *remove* duplicate images, but we would want to identify them as versions of each other.

All best,

   James.


On 04/12/2014 08:25, Federico Leva (Nemo) wrote:
Jonas Öberg, 04/12/2014 08:31:
In our work with Elog.io[1], we've come across a number of duplicate
files in Commons.

Great!

Some of them are explainable, such as PNGs which
also have a thumbnail as JPG[2], but others seem to be more clear-cut
duplicated uploads, like [3] and [4], and yet others are the same work
but different sizes like [5] and [6].

Are most of the cases you find perfect duplicates like these?


Going through this is quite an effort, and likely requires a bit of
manual work. Is there any organised structure or group of people that
deal with duplicate works? We'd love to contribute our findings to
such an effort once we clean up our data a bit.

Sure. You can edit the files and add
https://commons.wikimedia.org/wiki/Template:Duplicate
If you need to report many thousands of files, it may be better to use a
flagged bot account:
https://commons.wikimedia.org/wiki/Commons:Bots/Requests
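
For what it's worth, the text change a bot would make could look roughly like the sketch below. It only manipulates the page text and does not talk to the wiki. The helper name and file titles are hypothetical, and the parameter order (file to keep first, optional reason second) is an assumption that should be checked against the template documentation linked above.

```python
def tag_as_duplicate(wikitext, keep_title, reason=""):
    """Return wikitext with a Duplicate tag prepended, unless already tagged.

    keep_title is the file that should be kept; the parameter layout of the
    template is assumed here, not taken from its documentation.
    """
    if "{{duplicate|" in wikitext.lower():
        return wikitext  # already tagged, leave the page alone
    params = [keep_title] + ([reason] if reason else [])
    return "{{Duplicate|" + "|".join(params) + "}}\n" + wikitext


# Hypothetical page text and file title, for illustration only.
page = "== Summary ==\nA scanned painting."
tagged = tag_as_duplicate(page, "File:Example original.jpg", "exact duplicate")
print(tagged)
```

Running the helper a second time on its own output returns the text unchanged, so a bot could safely be re-run over the same list of files.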

Nemo

_______________________________________________
Commons-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/commons-l


