Gnangarra, Nemo: good points. I pinged TinEye on Twitter; let's see what they say. Focusing on the obvious (people images, and finding out how quickly our own collection is included in TinEye) is a nice place to start, and it would make a solid GSoC-sized project.
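To make the size pre-filter concrete, here is a minimal sketch of the idea Gnangarra raises in the quoted message below: only queue images whose longest edge is under 1000px for a reverse-image lookup. The function name, the tuple layout, and the sample file titles are all hypothetical, and nothing here calls TinEye or the Commons API; it only shows the selection step.

```python
from typing import Iterable, List, Tuple

# Hypothetical record layout: (file title, width in px, height in px).
ImageInfo = Tuple[str, int, int]

# Threshold taken from the thread: "less than 1000px on the longest edge".
MAX_LONGEST_EDGE = 1000


def select_for_lookup(images: Iterable[ImageInfo],
                      max_edge: int = MAX_LONGEST_EDGE) -> List[str]:
    """Return the titles of images whose longest edge is below max_edge."""
    return [title for title, width, height in images
            if max(width, height) < max_edge]


# Made-up sample data, for illustration only.
sample = [
    ("File:Example_portrait.jpg", 640, 800),
    ("File:Example_landscape.jpg", 4000, 3000),
    ("File:Example_square.jpg", 1000, 1000),
]
print(select_for_lookup(sample))
```

Whether the cut-off should be strict (under 1000px) or inclusive is a judgment call; the sketch uses the strict reading of the quoted wording.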
SJ

On Fri, Feb 7, 2014 at 5:23 AM, Gnangarra <[email protected]> wrote:
> While any subject can be a copyright violation I find that people images are
> the most frequent offenders, especially those that are less than 1000px on
> the longest edge. so a rough to that range(if possible) would reduce the
> volumes needing to be processed
>
>
> On 7 February 2014 17:17, Federico Leva (Nemo) <[email protected]> wrote:
>>
>> Samuel Klein, 06/02/2014 23:39:
>>
>>> Are we doing any commons analysis like this at the moment?
>>> Is any similarity-analysis done on upload to help uploaders identify
>>> copies of the same image that already exist online? Or to flag
>>> potential copyvios for reviewers?
>>>
>>> I'm sure TinEye would be glad to give us high-volume API access to
>>> enable that sort of cross-referencing.
>>
>>
>> Would they? It's something we really need a lot and that we should do for
>> all uploads everywhere to save our patrollers a lot of precious time, but it
>> always looked impossible.
>> 1) If WMF is interested in helping it would be useful to know. Even
>> getting access to the existing search API key is a quest no hero is known to
>> have successfully completed despite repeated attempts.
>> <https://wikitech.wikimedia.org/wiki/Web_search> If it's possible to avoid
>> institutional bottlenecks completely that would also be useful to know.
>> 2) We don't even know what percentage of Wikimedia Commons images are
>> included in TinEye and at what speed. Does someone manage to extract this
>> information from them?
>>
>> As Fae says, good part of the work is integrating the results in the
>> patrollers' (and uploaders'?) workflow in a sensible way. Embedding it in
>> UploadWizard may be too much, but a "simple" bot which just places a tag on
>> suspicious images can be made into an extension too, if preferred to a mere
>> pywikibot script.
>> If the two premises above are positive, it should be included in
>> <https://www.mediawiki.org/wiki/Mentorship_programs/Possible_projects#Wikimedia_Commons_.2F_multimedia>:
>> GSoC is approaching!
>>
>> Nemo
>>
>>
>> _______________________________________________
>> Commons-l mailing list
>> [email protected]
>> https://lists.wikimedia.org/mailman/listinfo/commons-l

--
Samuel Klein @metasj w:user:sj +1 617 529 4266
