Hi Andrea,

I have precisely the same problem: I have photographs and I'm trying to extract meaning from the clusters. I've been working on code to scratch this itch, and I'd be happy to send it to anyone, or to work with someone else to generalize the solution. The code is in Perl.
I also have track logs for all of the points where the photos were taken, so I know when I was near each photo (both when it was initially taken and on subsequent visits to the area). I also have a collection of waypoints for the general area. And finally, I have a collection of travel ephemera like ticket stubs and receipts, which all have time stamps and which I've been geocoding based on the track logs.

I wrote some Perl code to show the waypoints that are closest to each photo, and then to show the photos that are close to each waypoint, and then to show when I was 'near' each waypoint and each photograph.

I realized during this process that in many (but not all!) cases the nearest waypoint(s) to a picture made a pretty good tag for that picture. The pictures where 'Vodensky' was the nearest waypoint were, sure enough, best tagged as 'Vodensky' for the Vodensky Military Museum. And the pictures closest to the waypoint 'ASIEN-GIRLS' were in fact of the strip club that featured 'Asien Girls.'

This technique serves to create clusters of a sort. But in this case the waypoints manually define the cluster centers...which is very effective, but is sort of cheating :-)

Schuyler wrote some cool clustering code for Google Maps Hacks. Here is an example of the code in action:

http://mappinghacks.com/projects/gmaps/cluster.html

You click on markers with the black dots to zoom in on a cluster. You click on a marker without a black dot to see information on an individual point. This shows most of my personal waypoints in a clever sort of clustering.

He started with a K-means clustering (http://en.wikipedia.org/wiki/K_means) and then futzed with it a bit in order to make it look better. This is not the only, or maybe even the best, clustering algorithm, but analyzing the strengths of various clustering algorithms is beyond my current abilities and interests.
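For anyone who wants to try the nearest-waypoint tagging idea described above, here is a minimal Python sketch. It is not the Perl code mentioned in this message; the function names, the `max_km` cutoff, and the sample coordinates are all made up for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two lat/lon points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def tag_photos(photos, waypoints, max_km=0.5):
    """Map each photo name to its nearest waypoint name, or None if none is close.

    photos and waypoints are dicts of name -> (lat, lon).
    max_km is an assumed cutoff: beyond it the photo is left untagged.
    """
    tags = {}
    for photo, (plat, plon) in photos.items():
        best_name, best_dist = None, float("inf")
        for name, (wlat, wlon) in waypoints.items():
            d = haversine_km(plat, plon, wlat, wlon)
            if d < best_dist:
                best_name, best_dist = name, d
        tags[photo] = best_name if best_dist <= max_km else None
    return tags


# Made-up sample data, not taken from any real track log.
waypoints = {"MUSEUM": (52.5200, 13.4050), "CAFE": (52.5300, 13.4200)}
photos = {
    "img_001.jpg": (52.5202, 13.4049),  # a few metres from MUSEUM
    "img_002.jpg": (52.5299, 13.4203),  # a few metres from CAFE
    "img_003.jpg": (52.6000, 13.6000),  # kilometres from both
}
tags = tag_photos(photos, waypoints)
```

Raising or lowering `max_km` trades untagged photos against wrong tags. This covers only the nearest-waypoint step; it is separate from Schuyler's clustering code linked above.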
Here is the same clustering code used to show a cluster of my recent pictures:

http://mappinghacks.com/projects/gmaps/cluster_pix.html

In theory you can click on the markers without black dots and an info box pops up with a thumbnail, and you can click on the thumbnail to see all the pictures. This works except a) the thumbnails are not preloaded, so it can take a bit to load and, more importantly, b) there are many 'terminal' clusters, by which I mean clusters that can't be zoomed in any further because the map is at maximum zoom. I should do something about that, like show a list of points when they can't be zoomed further, or some such. Some day :-)

The gazetteer lookup is either fairly easy, or a PITA. The challenge is in having a meaningful gazetteer. The US Geographic Names Information System (GNIS) data is great:

http://geonames.usgs.gov/

(there are also links there to the GEOnet names server for international names).

The problem with the GNIS is that they have nearly two million names for the continental US. An embarrassment of riches. You could get pretty good data by using the populated place 'class,' but even there you end up with lots of supposed populated place names which don't always match the names that people actually use (this 'problem' might just be my personal lack of sensitivity/awareness to the historical aspects of my location...). Using the 'park' class works well.

Getting regions and neighborhoods is a different challenge! What is a region or a neighborhood? To some extent a 'region' could have an objective location, but mostly regions and neighborhoods are social constructs. As such they have fuzzy boundaries and they vary (sometimes greatly!) with time.
This Wikipedia definition of SoMa in San Francisco is an example of an erroneous attempt at a definition:

http://en.wikipedia.org/wiki/South_of_Market%2C_San_Francisco%2C_California

"The eastern edge along the Embarcadero and south-eastern corner of this area (where Mission Creek meets the bay) is known as South Beach, a separate neighborhood, and the border below Townsend Street begins Mission Bay. The north-eastern corner (where Market Street meets the bay) is often considered part of the Financial District."

What 'border' below Townsend? And 'often considered'? This is an _attempt_ to quantify a social construction...

The Neighborhood Project was/is an attempt at exploring the boundaries of neighborhoods based on what people believe:

http://hood.theory.org/

They used the 'Bloggy' algorithm...and they have materials that talk about that on their site.

It is possible that you could use metaballs/blobby objects rather than true clustering for your photos. I love this description (from: http://www.siggraph.org/education/materials/HyperGraph/modeling/metaballs/metaballs_mward.html):

"We can think of a metaball as a particle surrounded by a density field, where the density attributed to the particle (its influence) decreases with distance from the particle location. A surface is implied by taking an isosurface through this density field - the higher the isosurface value, the nearer it will be to the particle. The powerful aspect of metaballs is the way they can be combined."

If you really want the 'clusters' but you are willing to ignore the outliers, you could 'metaball' your photos, and then assume that the clusters are where you have contiguous areas. Or you could use the centers of those 'clusters' as the initial centroids for a K-means algorithm (that _seems_ like a pretty good idea, actually).
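A rough Python sketch of that last idea, under several assumptions that are not from any existing code: each photo gets a Gaussian falloff as its density field, the field is sampled on a regular grid, contiguous above-threshold cells form one 'blob', and coordinates are treated as flat x/y (reasonable only over a small area). The names `blob_seeds` and `kmeans` and the `radius`, `threshold` and `step` values are purely illustrative.

```python
import math
from collections import deque


def centroid(pts):
    """Arithmetic mean of a non-empty list of (x, y) points."""
    n = len(pts)
    return (sum(x for x, _ in pts) / n, sum(y for _, y in pts) / n)


def density(x, y, points, radius):
    """Sum of each point's influence, which falls off with distance (Gaussian)."""
    return sum(math.exp(-((x - px) ** 2 + (y - py) ** 2) / (2 * radius ** 2))
               for px, py in points)


def blob_seeds(points, radius, threshold, step):
    """Centroids of the points inside each contiguous above-threshold region."""
    xs, ys = [p[0] for p in points], [p[1] for p in points]
    min_x, min_y = min(xs) - radius, min(ys) - radius
    nx = int((max(xs) + radius - min_x) / step) + 1
    ny = int((max(ys) + radius - min_y) / step) + 1
    # Grid cells whose sampled density reaches the isosurface value.
    hot = {(i, j) for i in range(nx) for j in range(ny)
           if density(min_x + i * step, min_y + j * step, points, radius) >= threshold}
    # Flood fill: 4-connected hot cells share one blob label.
    label, n_blobs = {}, 0
    for cell in sorted(hot):
        if cell in label:
            continue
        label[cell] = n_blobs
        queue = deque([cell])
        while queue:
            i, j = queue.popleft()
            for nb in ((i + 1, j), (i - 1, j), (i, j + 1), (i, j - 1)):
                if nb in hot and nb not in label:
                    label[nb] = n_blobs
                    queue.append(nb)
        n_blobs += 1
    # A point belongs to the blob of its nearest grid cell; points whose
    # nearest cell is below the threshold are treated as outliers and skipped.
    members = {}
    for px, py in points:
        cell = (round((px - min_x) / step), round((py - min_y) / step))
        if cell in label:
            members.setdefault(label[cell], []).append((px, py))
    return [centroid(m) for _, m in sorted(members.items())]


def kmeans(points, centroids, iterations=20):
    """Plain K-means from the given starting centroids.

    Returns the final centroids and the groups from the last assignment step.
    """
    groups = []
    for _ in range(iterations):
        groups = [[] for _ in centroids]
        for p in points:
            k = min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
            groups[k].append(p)
        centroids = [centroid(g) if g else c for g, c in zip(groups, centroids)]
    return centroids, groups


# Made-up points: two tight groups plus one isolated outlier.
points = [(0.0, 0.0), (0.1, 0.1), (0.0, 0.2),
          (5.0, 5.0), (5.1, 4.9), (4.9, 5.1),
          (10.0, 0.0)]
seeds = blob_seeds(points, radius=0.5, threshold=1.5, step=0.25)
centroids, groups = kmeans(points, seeds)
```

With this sample data the blob step finds two seeds and ignores the isolated point, while K-means run over all the points still assigns that outlier to the nearer group and drags its centroid along. Passing only the blob members to `kmeans` would be one way to keep the outliers out.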
I've been working on some code to implement a geographical data store that would sort of intrinsically allow for the creation of user-defined 'areas' or 'regions' or 'neighborhoods.'

Anyway...I am very interested in these thoughts. I'd love to collaborate with you, or anyone else, on these areas. I'm currently working in Perl and Ruby on Rails using MySQL and PostGIS.

Cheers,
Rich

On 9/9/06, Andrea Moed <[EMAIL PROTECTED]> wrote:
Thanks for the responses! To clarify, as I probably should have to begin with: I have an existing collection of lat/lons, each representing a place where a photo was taken. I want to computationally find the geographic clusters in this collection, i.e. the geographic areas with the densest concentrations of points. (So it sounds like Andrew's "location-closeness clustering" is what I'm thinking of.) Having found these most-photographed areas, I want to find the geographic name that best describes each area, such as a region, city, neighborhood or park name.

So, I'm looking for two different things, a location-closeness clustering algorithm and a gazetteer lookup. Sorry to be confusing.

best,
--andrea

On 9/8/06, Andrew Turner <[EMAIL PROTECTED]> wrote:
> Are you more interested in the "plot points" software, or the
> "grouping algorithms"? What kind of clustering are you doing?
> - Based on points users all tagged and therefore creating clusters of
> "you like *this*, so you'll probably like *these*"?
> - Location-closeness clustering
> - Keyword terms based on generic names the points may have?
>
> There are various algorithm/packages for any of the problems above.
> And for creating points it would be easy to either use a current
> service (of which there are many) and then apply your algorithms to
> that (as Mike was suggesting), or roll your own using pieces of
> existing mapping/archiving software packages available.
>
> Andrew
>
>
> On 9/8/06, Mike Liebhold <[EMAIL PROTECTED]> wrote:
> >
> > Hi andrea,
> >
> > you can record and map points easily with apps like platial.com by naming
> > points with any plain language tag of your choice.
> >
> > cheers~
> >
> > -mike liebhold
> >
> > Andrea Moed wrote:
> >
> > For a webapp I'm hoping to build, I want to look at a collection of geo
> > points, discover clusters of points and assign the clusters useful
> > geographic names (where "useful" = good web search term). Does anyone know
> > of freely available code for doing this?
> > thanks much,
> > --andrea
> >
> > ________________________________
> >
> > _______________________________________________
> > Geowanking mailing list
> > [email protected]
> > http://lists.burri.to/mailman/listinfo/geowanking
> >
>
> --
> Andrew Turner
> [EMAIL PROTECTED] 42.4266N x 83.4931W
> http://highearthorbit.com Northville, Michigan, USA
--
Rich Gibson
Chief Scientist, Locative Technologies
http://mappinghacks.com
http://geocoder.us
http://testingrange.com
AIM period3equals
