Re: [Commons-l] Automated identification of images on commons

Andre Engels Mon, 26 Sep 2011 10:33:28 -0700

On Mon, Sep 26, 2011 at 6:43 PM, Paul Houle <[email protected]> wrote:


>        I've made some attempt to map images on Wikimedia Commons to
> distinct concepts from DBpedia, see
>
> http://ookaboo.com/
>
>       This could be useful for forming a training set, but I haven't yet
> got around to releasing a public dump of the data. I have about 1 million
> things classified and could certainly extend the strategies used to get
> more.
>
>       Unless there's been a really unprecedented breakthrough, I'd think
> that the application of machine vision to Wikimedia faces the problem of
> getting enough training data. If you had thousands or tens of thousands of
> photos that were labeled 'cat' or 'not cat', or 'member of plant species X'
> or 'not member of plant species X', you could train a classifier to make the
> distinction. However, if you've got two or three bad photos of a
> particular plant (which is what you have most of the time in Commons), you
> don't have enough training data to generalize.
>
>       If you've got a specific mission, say genitals recognition, I think
> you can make progress, but to attack the general problem you need to go big
> with your training sets.
>

Every small category is part of a big category. A system such as this will
not be able to identify plant species, but it might well be able to find
pictures of plants. If it then produced a list of plant pictures that are not
in any plant category, animal pictures that are not in an animal category,
buildings that are not in a regional building category, maps that are not in
a map category, paintings that are not in a painter category, famous
people who are not in a people category, etcetera, it could deliver those to
volunteers for further classification.
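A minimal sketch of the filtering step described above, assuming a classifier already outputs one broad class per image (e.g. "Plants") and that the category tree is available as a child-to-parents map. The category names, file names, `PARENTS`, and the functions `ancestors`, `needs_review`, and `review_queue` are all hypothetical illustrations, not an existing Commons tool. An image is flagged when none of its current categories rolls up to the predicted broad class.

```python
from collections import deque

# Hypothetical category graph: child category -> set of parent categories.
PARENTS = {
    "Quercus robur": {"Quercus"},
    "Quercus": {"Fagaceae"},
    "Fagaceae": {"Plants"},
    "Felis catus": {"Felidae"},
    "Felidae": {"Animals"},
    "Churches in Paris": {"Buildings in Paris"},
    "Buildings in Paris": {"Buildings"},
}


def ancestors(category, parents):
    """Return every category reachable upward from `category`, including itself.

    Breadth-first walk; the `seen` set also stops the walk if the
    category graph contains a cycle.
    """
    seen = {category}
    queue = deque([category])
    while queue:
        current = queue.popleft()
        for parent in parents.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


def needs_review(image_categories, predicted_class, parents):
    """True if no category of the image rolls up to the predicted broad class."""
    for cat in image_categories:
        if predicted_class in ancestors(cat, parents):
            return False
    return True


def review_queue(images, predictions, parents):
    """Collect (filename, predicted_class) pairs to hand to volunteers."""
    queue = []
    for filename, categories in images.items():
        predicted = predictions.get(filename)
        if predicted is not None and needs_review(categories, predicted, parents):
            queue.append((filename, predicted))
    return queue


if __name__ == "__main__":
    # Hypothetical images with their current categories and classifier output.
    images = {
        "Oak.jpg": {"Quercus robur"},
        "Unsorted1.jpg": {"Uploaded in 2011"},
        "Cat.jpg": {"Felis catus"},
    }
    predictions = {"Oak.jpg": "Plants", "Unsorted1.jpg": "Plants", "Cat.jpg": "Animals"}
    print(review_queue(images, predictions, PARENTS))
```

Run on the sample data, only "Unsorted1.jpg" is queued: the classifier thinks it shows a plant, but none of its categories lies under "Plants". The classifier only has to get the broad class right, which is the point of rolling small categories up into big ones.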

-- 
André Engels, [email protected]

_______________________________________________
Commons-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/commons-l
