2009/4/8 Frankie Roberto <[email protected]>:
>
> I'd be fascinated to know how a factor analysis works (I tried looking at
> http://en.wikipedia.org/wiki/Factor_analysis, but it's not the most
> accessible Wikipedia page).

No, it's awful. I'm using the term a bit generically, but it's quite simple.

E.g., imagine there are N people who have voted on pictures. Now take an N-dimensional graph and plot where they rate them all (or how they compare them all). Each picture is a point in this N-dimensional space.

Now we have an utterly incomprehensible graph, which is also hard to visualise for those of us who find thinking in more dimensions than we have toes difficult.

So, what would be great is to somehow reduce that number of dimensions a bit, or even a lot. That amounts to finding a few factors that explain most of the data.

How you do this, like much of stats, depends. There are lots and lots of algorithms for it. Some are easy - roughly corresponding to projecting the N-dimensional space down onto some subspace that's more manageable, so all you have to do is find the subspace. But there's no reason to assume that everything is linear, so you might do something more sophisticated.

>
> Another alternative might be to force people to make a binary choice between
> "scenic" and "not scenic", or perhaps a 4 way choice with 2 "very" options.
> Then you avoid all the indecisive 4-6 responses.
>

If what you want is a *lot* of data comparisons fast then use something like MaxDiff:

http://en.wikipedia.org/wiki/MaxDiff

Show four photos and ask for best and worst. That's still amazingly easy (almost as easy as the kittenwar game) and you get a lot more ratings done.

But, beware! In this case there's a whole nother issue. So far we have been considering:

- finding scenic places on the basis of some mass voting (a million people can't be wrong)
- finding places I'd like (needs a factor analysis or something similar)

But the scenes have location data too. You might want to say here -> is a really good place to go because there is a cluster of scenically rated photos from there. That requires a whole lot more sophisticated analysis again.

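For what it's worth, the "easy" linear case mentioned earlier (project the N-dimensional space onto a smaller subspace) can be sketched in a few lines of plain Python. This is only a rough illustration, finding a single factor by power iteration on the voters' covariance matrix, which is closer to a principal component than to a proper factor analysis. The ratings table, the function name first_factor and the iteration count are all made up for the example and are not taken from any real system.

```python
import math

def first_factor(ratings, iterations=200):
    """Find one direction in voter space that explains the most variance.

    ratings[p][v] is the score voter v gave picture p (hypothetical data).
    Returns (direction, scores, explained), where scores[p] is picture p's
    position along the factor and explained is the share of total variance.
    """
    n_pics = len(ratings)
    n_voters = len(ratings[0])

    # Centre each voter's column so a generous voter does not dominate.
    means = [sum(row[v] for row in ratings) / n_pics for v in range(n_voters)]
    centred = [[row[v] - means[v] for v in range(n_voters)] for row in ratings]

    # Covariance between every pair of voters (n_voters x n_voters).
    cov = [[sum(r[i] * r[j] for r in centred) / n_pics
            for j in range(n_voters)] for i in range(n_voters)]
    total_var = sum(cov[i][i] for i in range(n_voters))

    # Power iteration: repeatedly apply cov and renormalise. The starting
    # vector is arbitrary; a start exactly orthogonal to the top direction
    # would mislead this simple version.
    direction = [1.0 / math.sqrt(n_voters)] * n_voters
    for _ in range(iterations):
        nxt = [sum(cov[i][j] * direction[j] for j in range(n_voters))
               for i in range(n_voters)]
        norm = math.sqrt(sum(x * x for x in nxt))
        if norm == 0:
            break  # no variance at all, nothing to find
        direction = [x / norm for x in nxt]

    # Project each picture onto the factor: N numbers become one.
    scores = [sum(r[v] * direction[v] for v in range(n_voters)) for r in centred]
    top_var = sum(s * s for s in scores) / n_pics
    explained = top_var / total_var if total_var > 0 else 0.0
    return direction, scores, explained

# Made-up example: 5 pictures rated 1-10 by 4 voters who mostly agree.
ratings = [
    [9, 8, 9, 7],
    [2, 3, 1, 2],
    [6, 5, 7, 6],
    [8, 9, 8, 9],
    [3, 2, 2, 4],
]
direction, scores, explained = first_factor(ratings)
print("picture scores along the factor:", [round(s, 2) for s in scores])
print("share of variance explained:", round(explained, 3))
```

With voters who largely agree, one factor soaks up most of the variance, which is the "few factors explain most of the data" situation. A real analysis would look for several factors and, as noted, need not assume linearity at all.
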
However, I don't know what the use cases of this data might be, so can't comment. I'm not saying Tom et al. are wrong, because they know what their constraints and aims are, which I most emphatically do not. What's more, they have almost certainly taken the advice of statisticians to get this just right, so my rather amateurish criticism is meant to be just that, my halfpennyworth.

When I get stuck, I tend to go off and talk to a Fellow of the Royal Statistical Society. It tends to unstick my mind, though I usually come away realising how much more problematic everything really is 8-).

--
Francis Davey

_______________________________________________
Mailing list [email protected]
Archive, settings, or unsubscribe:
https://secure.mysociety.org/admin/lists/mailman/listinfo/developers-public
