On Mon, Sep 26, 2011 at 6:43 PM, Paul Houle <[email protected]> wrote:
> I've made some attempt to map images on Wikimedia commons to
> distinct concepts from DBpedia, see
>
> http://ookaboo.com/
>
> This could be useful for forming a training set, but I haven't yet
> got around to releasing a public dump of the data. I have about 1 million
> things classified and could certainly extend the strategies used to get
> more.
>
> Unless there's been a really unprecedented breakthrough, I'd think
> that the application of machine vision to Wikimedia faces the problem of
> getting enough training data. If you had thousands or tens of thousands of
> photos that were labeled 'cat' or 'not cat', or 'member of plant species X'
> or 'not member of plant species X', you can train a classifier to make the
> distinction. However, if you've got two or three bad photos of a
> particular plant (which is what you have most of the times in Commons) you
> don't have enough training data to generalize.
>
> If you've got a specific mission, say genitals recognition, I think
> you can make progress, but to attack the general problem you need to go big
> with your training sets.
>

Every small category is part of a big category. A system such as this will not be able to identify plant species, but it might well be able to find pictures of plants. If it then produces lists of plant pictures that are not in any plant category, animal pictures that are not in an animal category, buildings that are not in a regional building category, maps that are not in a map category, paintings that are not in a painter category, famous people that are not in a people category, etcetera, it could pass those on to volunteers for further classification.

-- 
André Engels, [email protected]
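The triage step proposed above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not a description of any existing Commons tool: the category graph (`PARENTS`), the category and file names, and the `predict` callable standing in for a broad-class image classifier are all hypothetical. Because every small category sits below a big one, an image counts as already classified if any of its categories has the predicted broad category among its ancestors; otherwise it is queued for volunteers.

```python
from collections import deque

# Hypothetical fragment of a category graph: child -> set of parent categories.
PARENTS = {
    "Rosa canina": {"Rosa"},
    "Rosa": {"Rosaceae"},
    "Rosaceae": {"Plants"},
    "Felis catus": {"Cats"},
    "Cats": {"Animals"},
    "Maps of France": {"Maps"},
}


def ancestors(category):
    """Return every category reachable upward from `category`, including itself."""
    seen = {category}
    queue = deque([category])
    while queue:
        current = queue.popleft()
        for parent in PARENTS.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


def is_filed_under(image_categories, root):
    """True if any of the image's categories lies somewhere below `root`."""
    return any(root in ancestors(c) for c in image_categories)


def review_queue(images, predict):
    """Group images whose predicted broad class has no matching category.

    `images` maps a file title to its set of categories; `predict` is a
    hypothetical stand-in for a classifier, returning a broad root category
    such as "Plants", or None when it has no confident guess.
    """
    queue = {}
    for title, categories in images.items():
        root = predict(title)
        if root is not None and not is_filed_under(categories, root):
            queue.setdefault(root, []).append(title)
    return queue


# Usage with made-up data; a dict lookup plays the role of the classifier.
images = {
    "File:Hedge.jpg": {"Rosa canina"},
    "File:Garden_2011.jpg": {"Gardens"},
    "File:Kitten.jpg": {"Felis catus"},
}
guesses = {
    "File:Hedge.jpg": "Plants",
    "File:Garden_2011.jpg": "Plants",
    "File:Kitten.jpg": "Animals",
}
print(review_queue(images, guesses.get))
# {'Plants': ['File:Garden_2011.jpg']}
```

Walking up the parent links, rather than matching category names directly, is what lets a species-level category such as `Rosa canina` satisfy the broad `Plants` check, so only the genuinely uncategorised garden photo reaches the volunteers.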
_______________________________________________
Commons-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/commons-l
