Hi everyone,

In our work with Elog.io[1], we've come across a number of duplicate files in Commons. Some of them are explainable, such as PNGs which also have a thumbnail as JPG[2], but others seem to be clear-cut duplicate uploads, like [3] and [4], and yet others are the same work at different sizes, like [5] and [6].

Going through these is quite an effort, and likely requires a bit of manual work. Is there an organised structure or group of people that deals with duplicate works? We'd love to contribute our findings to such an effort once we clean up our data a bit.

[1] http://elog.io/
[2] Like https://commons.wikimedia.org/wiki/File:Island_House,_Bellows_Falls,_by_P._W._Taft.png
[3] https://commons.wikimedia.org/wiki/File:Defense.gov_News_Photo_090910-N-8420M-038.jpg
[4] https://commons.wikimedia.org/wiki/File:US_Navy_090910-N-8420M-038_Students_in_Basic_Underwater_Demolition-SEAL_(BUD-S)_class_279_participate_in_a_surf_passage_exercise_during_the_first_phase_of_training_at_Naval_Amphibious_Base_Coronado.jpg
[5] https://commons.wikimedia.org/wiki/File:P0772931871(37827)(NRCS_Photo_Gallery).jpg
[6] https://commons.wikimedia.org/wiki/File:NRCSMT01082(18769)(NRCS_Photo_Gallery).jpg

--
Jonas Öberg, Founder & Shuttleworth Foundation Fellow
Commons Machinery | [email protected]
E-mail is the fastest way to my attention

_______________________________________________
Commons-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/commons-l
