Re: [Commons-l] Data mining for media archives

Samuel Klein Fri, 07 Feb 2014 12:50:10 -0800

Gnangarra, Nemo: good points.  I pinged TinEye on Twitter; let's see
what they say. Focusing on the obvious (people-images, finding out how
fast our own collection is included in TinEye) is a nice place to
start; this makes a solid GSoC-sized project.
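
To make the size heuristic from Gnangarra's message below concrete, here
is a minimal sketch of a pre-filter that keeps only uploads whose longest
edge is under 1000px. The Upload record and the function names are
hypothetical, invented for illustration; a real bot would read the
dimensions from file metadata, and detecting people-images is not
attempted here.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Upload:
    """Hypothetical record for an uploaded file (not a real Commons type)."""
    title: str
    width: int   # pixels
    height: int  # pixels


def longest_edge(upload: Upload) -> int:
    """Return the larger of the two pixel dimensions."""
    return max(upload.width, upload.height)


def small_image_candidates(uploads: Iterable[Upload],
                           max_edge: int = 1000) -> List[Upload]:
    """Keep uploads whose longest edge is strictly below max_edge.

    The 1000px default follows the rough threshold mentioned in the
    thread; it is a heuristic for cutting volume, not a copyvio test.
    """
    return [u for u in uploads if longest_edge(u) < max_edge]
```

Only the uploads that pass this filter would then be sent to a
reverse-image search, which keeps the number of lookups down.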


SJ

On Fri, Feb 7, 2014 at 5:23 AM, Gnangarra <[email protected]> wrote:
> While any subject can be a copyright violation, I find that people images are
> the most frequent offenders, especially those that are less than 1000px on
> the longest edge. So a rough filter to that range (if possible) would reduce the
> volume needing to be processed.
>
>
> On 7 February 2014 17:17, Federico Leva (Nemo) <[email protected]> wrote:
>>
>> Samuel Klein, 06/02/2014 23:39:
>>
>>> Are we doing any commons analysis like this at the moment?
>>> Is any similarity-analysis done on upload to help uploaders identify
>>> copies of the same image that already exist online?  Or to flag
>>> potential copyvios for reviewers?
>>>
>>> I'm sure TinEye would be glad to give us high-volume API access to
>>> enable that sort of cross-referencing.
>>
>>
>> Would they? It's something we really need a lot and that we should do for
>> all uploads everywhere to save our patrollers a lot of precious time, but it
>> always looked impossible.
>> 1) If WMF is interested in helping it would be useful to know. Even
>> getting access to the existing search API key is a quest no hero is known to
>> have successfully completed despite repeated attempts.
>> <https://wikitech.wikimedia.org/wiki/Web_search> If it's possible to avoid
>> institutional bottlenecks completely that would also be useful to know.
>> 2) We don't even know what percentage of Wikimedia Commons images are
>> included in TinEye and at what speed. Has anyone managed to extract this
>> information from them?
>>
>> As Fae says, a good part of the work is integrating the results in the
>> patrollers' (and uploaders'?) workflow in a sensible way. Embedding it in
>> UploadWizard may be too much, but a "simple" bot which just places a tag on
>> suspicious images can be made into an extension too, if preferred to a mere
>> pywikibot script.
>> If the two premises above are positive, it should be included in
>> <https://www.mediawiki.org/wiki/Mentorship_programs/Possible_projects#Wikimedia_Commons_.2F_multimedia>:
>> GSoC is approaching!
>>
>> Nemo
>>
>>
>> _______________________________________________
>> Commons-l mailing list
>> [email protected]
>> https://lists.wikimedia.org/mailman/listinfo/commons-l
>



-- 
Samuel Klein          @metasj           w:user:sj          +1 617 529 4266

