Re: [mySociety:public] New project: ScenicOrNot

Francis Davey Wed, 08 Apr 2009 03:21:46 -0700

2009/4/8 Frankie Roberto <[email protected]>:
>
> I'd be fascinated to know how a factor analysis works (I tried looking at
> http://en.wikipedia.org/wiki/Factor_analysis, but it's not the most
> accessible Wikipedia page).


No, it's awful.

I'm using the term a bit generically but it's quite simple.

E.g., imagine there are N people who have voted on pictures. Now take
an N-dimensional graph, with one axis per voter, and plot how each of
them rated each picture (or how they compared them). Each picture is
then a point in this N-dimensional space.

Now we have an utterly incomprehensible graph which is also hard to
visualise to those of us who find thinking in more dimensions than we
have toes difficult.

So, what would be great is to somehow reduce that number of dimensions
a bit, or even a lot. That amounts to finding a few factors that
explain most of the data.

How you do this, like much of stats, depends. There are lots and lots
of algorithms for it. Some are easy - roughly corresponding to
projecting the N-dimensional space down onto some subspace that's more
manageable, so all you have to do is find the subspace. But there's no
reason to assume that everything is linear, so you might do something
more sophisticated.
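
To make the "easy" linear case concrete, here is a rough sketch in
Python. It is not a full factor analysis: it only finds the single
direction of greatest variance (the first principal component) by
power iteration and projects each picture onto it. The function names
and the toy ratings are made up for illustration, and it assumes every
voter has rated every picture.

```python
import math

def first_component(points, iterations=200):
    """Return (means, direction): per-axis means and the unit vector
    along which the centred points vary the most."""
    n_pts = len(points)
    n_dims = len(points[0])
    means = [sum(p[d] for p in points) / n_pts for d in range(n_dims)]
    centred = [[p[d] - means[d] for d in range(n_dims)] for p in points]
    # Sample covariance matrix between the axes (one axis per voter).
    cov = [[sum(row[i] * row[j] for row in centred) / (n_pts - 1)
            for j in range(n_dims)] for i in range(n_dims)]
    # Power iteration: repeatedly multiplying by cov pulls the vector
    # towards the dominant eigenvector. The start vector is arbitrary;
    # an unlucky choice exactly orthogonal to the answer would fail.
    vec = [float(d + 1) for d in range(n_dims)]
    for _ in range(iterations):
        nxt = [sum(cov[i][j] * vec[j] for j in range(n_dims))
               for i in range(n_dims)]
        norm = math.sqrt(sum(x * x for x in nxt))
        if norm == 0:
            break  # no variance at all, nothing to find
        vec = [x / norm for x in nxt]
    return means, vec

def project(points, means, direction):
    """Reduce each N-dimensional point to one score along direction."""
    return [sum((p[d] - means[d]) * direction[d]
                for d in range(len(direction))) for p in points]

# Hypothetical data: 4 pictures (rows) rated 1-10 by 3 voters (columns).
ratings = [[1, 2, 1], [9, 8, 9], [2, 1, 2], [8, 9, 8]]
means, direction = first_component(ratings)
scores = project(ratings, means, direction)
```

With voters who broadly agree, as in the toy data, the one score per
picture separates the highly rated pictures from the poorly rated
ones. Finding further factors would mean repeating this on what is
left after removing the first direction.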

>
> Another alternative might be to force people to make a binary choice between
> "scenic" and "not scenic", or perhaps a 4 way choice with 2 "very" options.
> Then you avoid all the indecisive 4-6 responses.
>

If what you want is a *lot* of comparison data fast then use
something like MaxDiff:

http://en.wikipedia.org/wiki/MaxDiff

Show four photos and ask for best and worst. That's still amazingly
easy (almost as easy as the kittenwar game) and you get a lot more
ratings done.
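
As a rough illustration of what you can do with best/worst answers,
here is a sketch of the simplest counting approach: score each photo
as (times picked best - times picked worst) / times shown. The
function name and the sample tasks are invented for the example, and
proper MaxDiff analyses usually fit a statistical model rather than
just counting.

```python
from collections import Counter

def maxdiff_scores(tasks):
    """tasks: iterable of (photos_shown, best, worst) tuples.
    Returns a dict mapping photo -> score between -1 and 1."""
    shown = Counter()
    best = Counter()
    worst = Counter()
    for photos, picked_best, picked_worst in tasks:
        shown.update(photos)       # every photo in the set was seen once
        best[picked_best] += 1
        worst[picked_worst] += 1
    # Counter returns 0 for photos never picked as best or worst.
    return {p: (best[p] - worst[p]) / shown[p] for p in shown}

# Hypothetical answers: each task shows four photos.
tasks = [
    (("a", "b", "c", "d"), "a", "d"),
    (("a", "b", "c", "e"), "a", "e"),
    (("b", "c", "d", "e"), "b", "d"),
]
scores = maxdiff_scores(tasks)
```

Each answer tells you something about all four photos at once, which
is where the extra data per click comes from.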

But, beware! In this case there's a whole other issue. So far we have
been considering:

- finding scenic places on the basis of some mass voting (a million
people can't be wrong)
- finding places I'd like (needs a factor analysis or something similar)

But the scenes have location data too. You might want to say "here ->
is a really good place to go, because there is a cluster of photos
rated as scenic from there". That requires a whole lot more
sophisticated analysis again.
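
Just to show the shape of the problem, here is a deliberately crude
sketch: chop the map into grid cells and report the cells that contain
several highly rated photos. This is much simpler than a real spatial
analysis would be; the cell size, rating threshold, minimum count and
sample coordinates are all arbitrary assumptions of mine.

```python
import math
from collections import defaultdict

def scenic_cells(photos, cell_size=0.1, min_rating=7.0, min_count=3):
    """photos: iterable of (lat, lon, rating).
    Returns {grid cell: mean rating} for cells holding at least
    min_count photos rated min_rating or above."""
    cells = defaultdict(list)
    for lat, lon, rating in photos:
        if rating >= min_rating:
            # Cell index = which cell_size-degree square the photo is in.
            key = (math.floor(lat / cell_size), math.floor(lon / cell_size))
            cells[key].append(rating)
    return {key: sum(vals) / len(vals)
            for key, vals in cells.items() if len(vals) >= min_count}

# Hypothetical photos: three scenic ones close together, one scenic
# one on its own, and one low-rated one.
photos = [
    (54.45, -3.05, 8.0),
    (54.46, -3.04, 9.0),
    (54.47, -3.03, 7.5),
    (51.51, -0.12, 8.0),
    (54.45, -3.05, 2.0),
]
hotspots = scenic_cells(photos)
```

A fixed grid splits clusters that straddle a cell boundary and ignores
how many photos were taken in each area, which is part of why the real
thing needs more care.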

However, I don't know what the use cases of this data might be, so I
can't comment. I'm not saying Tom et al. are wrong, because they know
what their constraints and aims are, which I most emphatically do not.
What's more, they have almost certainly taken the advice of
statisticians to get this just right, so my rather amateurish
criticism is meant to be just that, my halfpennyworth.

When I get stuck, I tend to go off and talk to a Fellow of the Royal
Statistical Society. It tends to unstick my mind, though I usually
come away realising how much more problematic everything really is
8-).

--
Francis Davey

_______________________________________________
Mailing list [email protected]
Archive, settings, or unsubscribe:
https://secure.mysociety.org/admin/lists/mailman/listinfo/developers-public
