Would it be possible to split the list into images that are

* byte-for-byte identical
* very different in size (e.g. more than a 2x difference; this is often intentional, especially for large TIFFs)
* others?

I think this would be useful.
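
To make the suggestion concrete, here is a minimal sketch of how a pair of files from the list could be sorted into those three groups. It assumes both files are available locally and that "size" means file size in bytes (pixel dimensions would need an image library instead). The names `file_digest`, `classify_pair` and `size_ratio` are made up for illustration and are not part of Jonas's tooling.

```python
import hashlib
import os


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def classify_pair(path_a, path_b, size_ratio=2.0):
    """Put a candidate duplicate pair into one of three groups.

    Assumption: "size" is the file size in bytes, and size_ratio=2.0
    mirrors the "more than 2x" rule of thumb from the message.
    """
    size_a = os.path.getsize(path_a)
    size_b = os.path.getsize(path_b)

    # Files of different length cannot be byte-identical, so only hash
    # when the sizes already match.
    if size_a == size_b and file_digest(path_a) == file_digest(path_b):
        return "identical"

    small, large = sorted((size_a, size_b))
    if small == 0 or large / small > size_ratio:
        return "very different sizes"

    return "other"
```

Calling `classify_pair("a.tif", "b.tif")` on two local copies would return one of the three labels, which could then be used to split the HTML list into sections.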

It would also be useful to do some further processing to identify images which, though probably related, are *not* in fact duplicates, e.g. because of a notable difference somewhere (arrows or a legend added, or a difference in some local blocks of colour), for example:

https://commons.wikimedia.org/wiki/File:Map_-_NL_-_Putten_-_Wijk_00_Putten_-_Buurt_01_Putten-Zuid-Oost.svg

https://commons.wikimedia.org/wiki/File:Map_-_NL_-_Putten_-_Wijk_00_Putten_-_Buurt_03_Putten-Zuid-West.svg
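
One possible way to catch cases like the two maps above is a block-wise comparison: if most blocks match but a few differ strongly, the images are probably related but not duplicates. The sketch below is only an illustration under strong assumptions: both images have already been rendered and decoded to equal-sized grids of greyscale values (lists of rows of numbers), and the function names, block size and thresholds are hypothetical values chosen for the example, not tuned on Commons data.

```python
def differing_blocks(img_a, img_b, block=8, threshold=10.0):
    """Return (block_row, block_col) for blocks whose mean absolute
    pixel difference exceeds `threshold`.

    Assumption: img_a and img_b are equal-sized 2D lists of greyscale
    values, e.g. 0-255.
    """
    if len(img_a) != len(img_b) or any(
        len(row_a) != len(row_b) for row_a, row_b in zip(img_a, img_b)
    ):
        raise ValueError("images must have the same dimensions")

    height = len(img_a)
    width = len(img_a[0]) if height else 0
    hits = []
    for top in range(0, height, block):
        for left in range(0, width, block):
            total = 0
            count = 0
            for y in range(top, min(top + block, height)):
                for x in range(left, min(left + block, width)):
                    total += abs(img_a[y][x] - img_b[y][x])
                    count += 1
            if total / count > threshold:
                hits.append((top // block, left // block))
    return hits


def looks_locally_modified(img_a, img_b, block=8, threshold=10.0,
                           max_fraction=0.25):
    """True when some blocks differ, but no more than `max_fraction`
    of all blocks, i.e. the images agree except in a few local areas."""
    height = len(img_a)
    width = len(img_a[0]) if height else 0
    n_blocks = ((height + block - 1) // block) * ((width + block - 1) // block)
    hits = differing_blocks(img_a, img_b, block, threshold)
    return 0 < len(hits) <= max_fraction * n_blocks
```

Pairs for which `looks_locally_modified` returns true could be listed separately as "related, but probably not duplicates" rather than mixed in with the true duplicates.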


-- James.




On 04/12/2014 09:44, Jonas Öberg wrote:
Hi Federico, and others,

Are most of the cases you find perfect duplicates like these?

I'm still running the comparison, but I made a first list of ~500
duplicate works available here:

    http://belar.coyote.org/~jonas/wmcdups.html

It would be very useful to get some feedback on that. Looking through
some of those will give an idea of the kind of "duplicates" we find.


Sincerely,
Jonas

_______________________________________________
Commons-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/commons-l


