Re: Re: [Geowanking] Discovering and naming clusters of geopoints?

Rich Gibson Sat, 09 Sep 2006 11:01:46 -0700

Hi Andrea,

I have the precise same problem of having photographs and trying to
extract meaning from the clusters.  I've been working on code to
scratch this itch, and I'd be happy to send it to anyone, or to work
with someone else to generalize the solution.  The code is in Perl.

I also have track logs for all of the points where the photos were
taken. So I know when I was near to each photo (both when it was
initially taken, plus subsequent visits to the area). I also have a
collection of waypoints for the general area. And finally, I have a
collection of travel ephemera like ticket stubs and receipts, that all
have time stamps and which I've been geocoding based on the track
logs.
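
To make the timestamp-based geocoding concrete, here is a minimal sketch in Python (not my actual Perl code). It assumes the track log is already parsed into (timestamp, lat, lon) tuples sorted by time, and it linearly interpolates between the two track points on either side of the receipt's timestamp. The function name and tuple layout are made up for illustration.

```python
from bisect import bisect_left

def position_at(track, t):
    """Estimate (lat, lon) at time t from a track log.

    `track` is a list of (timestamp, lat, lon) tuples sorted by timestamp;
    timestamps are plain numbers such as epoch seconds (an assumption).
    Times outside the log are clamped to the first or last track point.
    """
    times = [p[0] for p in track]
    i = bisect_left(times, t)
    if i == 0:
        return track[0][1:]
    if i == len(track):
        return track[-1][1:]
    t0, lat0, lon0 = track[i - 1]
    t1, lat1, lon1 = track[i]
    # Fraction of the way from the earlier point to the later one.
    f = (t - t0) / (t1 - t0)
    return (lat0 + f * (lat1 - lat0), lon0 + f * (lon1 - lon0))
```

Linear interpolation is only sensible when the two surrounding track points are close together in time; a real version would probably refuse to guess across a long gap in the log.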

I wrote some Perl code to show the waypoints that are closest to each
photo, and then to show the photos that are close to each waypoint,
and then to show when I was 'near' each waypoint and each
photograph.

I realized during this process that in many (but not all!) cases
the nearest waypoint(s) to a picture made a pretty good tag for that
picture.
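
The nearest-waypoint tagging can be sketched roughly like this in Python (again, an illustration rather than the Perl code). Waypoints are assumed to be (name, lat, lon) tuples, distance is great-circle distance on a spherical Earth, and the max_km cutoff is my own addition to cover the "but not all" cases where the closest waypoint is too far away to be a meaningful tag.

```python
from math import radians, sin, cos, asin, sqrt

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres, assuming a 6371 km spherical Earth."""
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = (sin(dlat / 2) ** 2
         + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2)
    return 2 * 6371.0 * asin(sqrt(a))

def nearest_waypoint(photo, waypoints, max_km=0.5):
    """Return the name of the closest waypoint to `photo` (a (lat, lon) pair),
    or None if there are no waypoints or the closest is beyond max_km.
    The 0.5 km default is an arbitrary illustrative value."""
    if not waypoints:
        return None
    lat, lon = photo
    name, dist = min(
        ((w_name, haversine_km(lat, lon, w_lat, w_lon))
         for w_name, w_lat, w_lon in waypoints),
        key=lambda pair: pair[1],
    )
    return name if dist <= max_km else None
```

Swapping the arguments (loop over waypoints, search the photos) gives the "photos close to each waypoint" view.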

The pictures where 'Vodensky' was the nearest waypoint were, sure
enough, best tagged as 'Vodensky' for the Vodensky Military Museum.
And the pictures closest to the waypoint 'ASIEN-GIRLS' were in fact of
the strip club that featured 'Asien Girls.'

This technique serves to create clusters of a sort. But in this case
the waypoints manually define the cluster centers...which is very
effective, but is sort of cheating :-)

Schuyler wrote some cool clustering code for Google Maps Hacks. Here
is an example of the code in action:
http://mappinghacks.com/projects/gmaps/cluster.html

You click on markers with the black dots to zoom in on a cluster. You
click on a marker without a black dot to see information on an
individual point.

This shows most of my personal waypoints in a clever sort of
clustering. He started with a K-means clustering
(http://en.wikipedia.org/wiki/K_means) and then futzed with it a bit
in order to make it look better.
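
For reference, plain K-means (without any of Schuyler's adjustments, which I am not reproducing here) looks roughly like the Python sketch below. It treats lat/lon as flat x/y coordinates, which is an assumption that only holds up for small areas away from the poles and the date line, and it takes the starting centroids as an argument rather than picking them at random.

```python
def kmeans(points, centroids, iterations=20):
    """Plain Lloyd's k-means on (lat, lon) pairs.

    Distance is squared difference in degrees (a flat-earth simplification).
    Returns (centroids, clusters), where clusters[i] holds the points that
    were assigned to centroid i in the last assignment pass.
    """
    centroids = list(centroids)
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        # Assignment step: put each point with its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(
                range(len(centroids)),
                key=lambda i: (p[0] - centroids[i][0]) ** 2
                + (p[1] - centroids[i][1]) ** 2,
            )
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its points.
        # An empty cluster keeps its old centroid.
        new_centroids = [
            (sum(p[0] for p in c) / len(c), sum(p[1] for p in c) / len(c))
            if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    return centroids, clusters
```

The result depends heavily on the number of centroids and where they start, which is presumably part of why the raw algorithm needs adjusting before it looks good on a map.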

This is not the only, or maybe even the best, clustering algorithm, but
analyzing the strengths of various clustering algorithms is beyond my
current abilities and interests.

Here is the same code used to show a cluster of my recent pictures:
http://mappinghacks.com/projects/gmaps/cluster_pix.html

In theory you can click on the markers without black dots and an info
box pops up with a thumbnail, and you can click on the thumbnail to
see all the pictures. This works except a) the thumbnails are not
preloaded, so it can take a bit to load and more importantly b) there are
many 'terminal' clusters, by which I mean clusters which can't be
zoomed in more because the map is at maximum zoom. I should do
something about that, like show a list of points when they can't be
zoomed further, or some such. Some day :-)

The gazetteer lookup is either fairly easy, or a PITA. The challenge
is in having a meaningful gazetteer. The USGS Geographic Names
Information System (GNIS) data is great:
http://geonames.usgs.gov/ (there are also links there to the GEOnet
Names Server for international names).

The problem with the GNIS is that they have nearly two million names
for the continental US. An embarrassment of riches. You could get
pretty good data by using the populated place 'class,' but even there
you end up with lots of supposed populated place names which don't
always match the names that people actually use (this 'problem' might
just be my personal lack of sensitivity/awareness to the historical
aspects of my location...). Using the 'park' class works well.
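
A class-filtered lookup might look something like this Python sketch. The record layout (dicts with "name", "feature_class", "lat" and "lon" keys) is a simplification I made up for the example, not the actual GNIS file format, and the distance is a quick equirectangular approximation that is fine at neighborhood scale.

```python
from math import cos, hypot, radians

def name_for(lat, lon, gazetteer, classes=("Park", "Populated Place")):
    """Return the name of the nearest gazetteer entry whose feature class
    is in `classes`, or None if no entry matches.

    `gazetteer` is a list of dicts with hypothetical keys
    "name", "feature_class", "lat", "lon".
    """
    candidates = [g for g in gazetteer if g["feature_class"] in classes]
    if not candidates:
        return None

    def approx_km(g):
        # Equirectangular approximation: scale longitude by cos(latitude).
        dx = radians(g["lon"] - lon) * cos(radians(lat))
        dy = radians(g["lat"] - lat)
        return 6371.0 * hypot(dx, dy)

    return min(candidates, key=approx_km)["name"]
```

Restricting `classes` is what keeps the two million names manageable: the closest entry of any class is often something nobody would use as a search term.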

Getting regions and neighborhoods is a different challenge! What is
a region or a neighborhood? To some extent a 'region' could have an
objective location, but mostly regions and neighborhoods are social
constructs. As such they have fuzzy boundaries and they vary
(sometimes greatly!) with time.

This Wikipedia definition of SoMa in San Francisco is an example of a
flawed attempt at definition:

http://en.wikipedia.org/wiki/South_of_Market%2C_San_Francisco%2C_California

"The eastern edge along the Embarcadero and south-eastern corner of
this area (where Mission Creek meets the bay) is known as South Beach,
a separate neighborhood, and the border below Townsend Street begins
Mission Bay. The north-eastern corner (where Market Street meets the
bay) is often considered part of the Financial District."

What 'border' below Townsend? and 'often considered?' This is an
_attempt_ to quantify a social construction...

The Neighborhood Project was/is an attempt at exploring the boundaries
of neighborhood based on what people believe. http://hood.theory.org/

They used the 'Blobby' algorithm...and they have materials that talk
about that on their site. It is possible that you could use
metaballs/blobby objects rather than true clustering for your photos.

I love this description (from:
http://www.siggraph.org/education/materials/HyperGraph/modeling/metaballs/metaballs_mward.html):

"We can think of a metaball as a particle surrounded by a density
field, where the density attributed to the particle (its influence)
decreases with distance from the particle location. A surface is
implied by taking an isosurface through this density field - the
higher the isosurface value, the nearer it will be to the particle.
The powerful aspect of metaballs is the way they can be combined."

If you really want the 'clusters' but you are willing to ignore the
outliers, you could 'metaball' your photos, and then assume that the
clusters are where you have contiguous areas. Or you could use the
centers of those 'clusters' as the initial centroids for a K-means
algorithm (that _seems_ like actually a pretty good idea).
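
Here is a rough Python sketch of that idea, so it is clear what I mean. Every photo adds a bump of density that falls off with distance, the density is sampled on a grid, grid cells above a threshold are kept, and each contiguous group of kept cells counts as one 'cluster'. The Gaussian falloff, the grid approach, and all the default numbers are my assumptions for illustration; distances are in degrees, so this is only reasonable over a small area.

```python
from math import exp

def metaball_centers(points, cell=0.01, radius=0.02, threshold=1.5):
    """Return one (lat, lon) center per contiguous high-density area.

    `points` is a list of (lat, lon) pairs. `cell` is the grid spacing,
    `radius` controls how fast a point's influence falls off, and
    `threshold` is the isosurface value (all in illustrative units).
    """
    if not points:
        return []
    pad = 2 * radius
    lat0 = min(p[0] for p in points) - pad
    lon0 = min(p[1] for p in points) - pad
    rows = int((max(p[0] for p in points) + pad - lat0) / cell) + 1
    cols = int((max(p[1] for p in points) + pad - lon0) / cell) + 1

    def density(r, c):
        # Summed influence of every point at the center of grid cell (r, c).
        lat = lat0 + (r + 0.5) * cell
        lon = lon0 + (c + 0.5) * cell
        return sum(
            exp(-((lat - p[0]) ** 2 + (lon - p[1]) ** 2) / radius ** 2)
            for p in points
        )

    hot = {(r, c) for r in range(rows) for c in range(cols)
           if density(r, c) >= threshold}

    centers = []
    while hot:
        # Flood fill one 4-connected group of above-threshold cells.
        stack = [hot.pop()]
        blob = []
        while stack:
            r, c = stack.pop()
            blob.append((r, c))
            for n in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
                if n in hot:
                    hot.remove(n)
                    stack.append(n)
        mean_r = sum(r for r, _ in blob) / len(blob)
        mean_c = sum(c for _, c in blob) / len(blob)
        centers.append((lat0 + (mean_r + 0.5) * cell,
                        lon0 + (mean_c + 0.5) * cell))
    return centers
```

With a threshold above 1.0 a lone photo can never form a blob by itself, which is how the outliers get ignored. The returned centers are what I would try as the initial centroids for K-means.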

I've been working on some code to implement a geographical data store
that would sort of intrinsically allow for the creation of user
defined 'areas' or 'regions' or 'neighborhoods.'

Anyway...I am very interested in these thoughts. I'd love to
collaborate with you, or anyone else, on these areas. I'm currently
working in Perl and Ruby on Rails using MySQL and PostGIS.

Cheers,
Rich

On 9/9/06, Andrea Moed <[EMAIL PROTECTED]> wrote:

Thanks for the responses! To clarify, as I probably should have to begin
with:
I have an existing collection of lat/lons, each representing a place where a
photo was taken. I want to computationally find the geographic clusters in
this collection, i.e. the geographic areas with the densest concentrations
of points. (So it sounds like Andrew's "location-closeness clustering" is
what I'm thinking of.) Having found these most-photographed areas, I want to
find the geographic name that best describes each area, such as a region,
city, neighborhood or park name. So, I'm looking for two different things, a
location-closeness clustering algorithm and a gazetteer lookup. Sorry to be
confusing.
best,
--andrea


On 9/8/06, Andrew Turner <[EMAIL PROTECTED]> wrote:
> Are you more interested in the "plot points" software, or the
> "grouping algorithms"? What kind of clustering are you doing?
> - Based on points users all tagged and therefore creating clusters of
> "you like *this*, so you'll probably like *these*"?
> - Location-closeness clustering
> - Keyword terms based on generic names the points may have?
>
> There are various algorithm/packages for any of the problems above.
> And for creating points it would be easy to either use a current
> service (of which there are many) and then apply your algorithms to
> that (as Mike was suggesting), or roll your own using pieces of
> existing mapping/archiving software packages available.
>
> Andrew
>
>
> On 9/8/06, Mike Liebhold <[EMAIL PROTECTED]> wrote:
> >
> >  Hi andrea,
> >
> >  you can record and map points easily with apps like platial.com by
> > naming points with any plain language tag of your choice.
> >
> >  cheers~
> >
> >  -mike liebhold
> >
> >  Andrea Moed wrote:
> >
> > For a webapp I'm hoping to build, I want to look at a collection of geo
> > points, discover clusters of points and assign the clusters useful
> > geographic names (where "useful" = good web search term). Does anyone
> > know of freely available code for doing this?
> >  thanks much,
> >  --andrea
> >
> > ________________________________
> >
> > _______________________________________________
> > Geowanking mailing list
> > [email protected]
> > http://lists.burri.to/mailman/listinfo/geowanking
> >
> >
>
>
> --
> Andrew Turner
> [EMAIL PROTECTED]        42.4266N x 83.4931W
> http://highearthorbit.com               Northville, Michigan, USA
>




--
Rich Gibson
Chief Scientist, Locative Technologies
http://mappinghacks.com
http://geocoder.us
http://testingrange.com
AIM period3equals
