[Note: cross-posting back to gae for java to pick up an earlier thread, further discussions should remain on the main app engine group]
Brian, After Nick's comments I dove into finding out more about this and came across your Arc2Earth Cloud services and some comments you made about "quadtrees packed with rtrees", which I would love to know more about. So, yes, I have started looking. And in the process have realized that you and others have laid significant groundwork that I have to catch up on. In terms of getting entities out of the datastore efficiently, I have a plan that involves a combination of intersection ops (via JTS) and set membership (via GQL). Tell me if I am off my rocker here, and keep in mind that I my app is not a general purpose GIS, so there is some tradeoff (as has been eluded to elsewhere) in terms of functionality/flexibility. The intent of my app is display parcel centroids as points and include various attributes (addr, etc). The reference app is pretty basic, but I am using it as a JTS sandbox for another app I am building, which has to do with parcel-level earthquake loss estimation for Salt Lake County. The analysis was offline but I am targetting AppEngine for display etc. My entities are: parcels, census blocks, census block groups, tracts, counties, states. My basic use case is: Find your house in Salt Lake County (via zoom or search using geocoding via the Maps API). Click on the marker over your house and discover the losses incurred in an M7 earthquake. Pretty basic stuff. Except there are ~240,000 parcels in the datastore. Conceptual implementation: Pass the map bounding box to the server, ask for the parcel points that are within the map bounds. Easy as pie in PostGIS because of the embedded spatial functions in the SQL and the built in spatial indexing. Not so easy in App Engine with GQL. Possible solutions: - Geohash http://labs.metacarta.com/blog/27.entry/geographic-queries-on-google-app-engine- problems with accuracy. - Geocells http://code.google.com/p/geomodel - Python, limited spatial queries, has same problem as next solution - Precomputing bounding boxes for proximity search http://code.google.com/appengine/articles/geosearch.html - Python, pre-computed grids work for "what else is near where I clicked" but intersecting the map viewport (bounds) with geometry from the datastore still requires more work. While I would not choose this approach, I agree with Brett Slatkin's explanation of pre-computed set membership and list properties -- http://code.google.com/events/io/sessions/BuildingScalableComplexApps.html Proposed solution with JTS/GQL - Design data structures with set membership in mind and use GQL for queries. In my case, parcels belong in census blocks, which belong in block groups, which belong in tracts, which belong in counties, which belong in states. This structure is a natural for set membership. I can do this two ways: either a parcel knows what census block belongs to (and so on up the tree), or a state has a list of counties that belong to it (and so on down the tree). Each nested set reduces the maximum number of entities I need to query, but has a few tradeoffs (more below). - Incorporate OGC's Well Known Text (WKT) as the geom property for each entity and use JTS for spatial queries. Point-in-polygon is a good fit here. So are distance calculations. Implementation details: There are 29 counties in Utah. That's a pretty fast query in GQL, so grab all 29. Each of them come out of the datastore with their own WKT representing the geometry. Loop over the 29 counties, use JTS to create polygons for each of them (this is why AppEngine has distributed CPU, right?) and test my incoming point (where the user clicked on the map) to find which county they clicked in. Now make a second GQL query based on either set membership of tracts in the county (the county has a list of tracts that belong to it) or by asking for all tracts that have the county name as a property. Both of these properties would be pre-computed. I think I favor the set membership approach, because I believe it reduces the table scan. With my list of tracts in a county (eg Salt Lake County has 209), use GQL to ask for the tracts explicitly. AppEngine should not have any problem retrieving 209 results efficiently. With each of the 209 tracts comes their own WKT. Loop over the list and create polygons with JTS. Ask JTS which tract my clicked point is in. Now I know the tract I am in. Repeat this process with a third GQL query for census block groups, combined with a third JTS loop over those for another point-in-polygon query. Repeat again for census blocks, at which point we can grab all the parcels for that census block (by set membership only) and return their point coordinates across the wire for display in the client. There will likely also be a fixup for times when the map's bounding box actually encompasses two (or more) census blocks. Note: I am only sending back parcel point coords to the client, which knows nothing of tracts, blockgroups, or blocks. Those are only used to make GQL queries return less than 1000 results. Practically guaranteed manageable chunks of data in App Engine courtesy of the Census Bureau. Tradeoffs: Obviously, the major trade-off is that with every drag of the map at the appropriate zoom level, we do 4+ GQL set membership queries and make JTS create a few hundred polygons and test them for intersections. With AppEngine it is still fast, but my app quotas may be gobbled up quickly. More testing is required here with the full data set and the app under load. On the other hand, implementing spatial indexes in App Engine is not trivial (as you well know). Any direction you can provide is surely welcome. Have you created QuadTree and RTree as entity types ("kinds")? I imagine if I do this either offline or on the first pass and conditionally thereafter (which may be slower) then I might get away without having to repeatedly hit the datastore or spike the CPU with JTS every time. Thoughts, comments, money for research: all are welcome ;) Stuart On Fri, Jul 10, 2009 at 11:56 AM, bFlood <[email protected]> wrote: > > hi stuart > > great work getting JTS to run on GAE. > > Have you started to look at how you would port those JTS indexes to > the GAE datastore? for me (and others), the hard part with geospatial > queries in GAE isn't the geometry intersection ops (although JTS is a > welcome addition for this), its how you efficiently get entities out > of the datastore. (I think thats what Nick might be hinting at above). > I'm not sure the memory resident JTS indexes are going to work (even > for small datasets in memcache), so those existing JTS structures will > need to be re-written with the "rules of GAE" in mind, which quickly > leads to the problem above > > cheers and good luck with it > brian > > On Jul 10, 12:53 pm, Stuart Moffatt <[email protected]> wrote: > > Nick, > > > > More information on JTS indexes: > > > > 1) STRTree - a packed R-Tree that does not support insert/delete once > built > > 2) QuadTree - slower than STRTree but supports insert/delete > > > > http://jump-pilot.sourceforge.net/pdfs/jts_secrets_foss4g2007.pdf > > > > Stuart > > > > On Fri, Jul 10, 2009 at 10:39 AM, Stuart Moffatt < > [email protected]>wrote: > > > > > > > > > Nick, > > > > > Correct, as in, not yet. The reference app just parses well-known text > on > > > the fly to create geometry (server side only), and tests that geometry > > > (which is usually complex) against incoming coordinates (so far only > points > > > and map bounds). Next stop, figuring out the right kind of index and > its > > > implementation. > > > > > If you care to elaborate on the comment you made with regard to R-trees > or > > > space-filling curve algorithms, I'm open to implementation ideas. > > > > > Stuart > > > > > On Fri, Jul 10, 2009 at 10:32 AM, Nick Johnson (Google) < > > > [email protected]> wrote: > > > > >> Hi Stuart, > > > > >> Am I correct in inferring that you're not storing geospatial indexes > > >> persistently at all, then? This seems impractical for the App Engine > > >> architecture, and for the scale it is intended to support. > > > > >> -Nick Johnson > > > > >> On Fri, Jul 10, 2009 at 5:29 PM, Stuart Moffatt< > [email protected]> > > >> wrote: > > >> > Nick, > > > > >> > First, there was a typo in my announcement that I just realized. The > > >> second > > >> > last paragraph should state that the "current version of the > application > > >> > does NOT demonstrate use of the datastore", that is, I am still in > the > > >> > process of implementing fetches of WKT from my entities, but that's > the > > >> > plan. > > > > >> > As to your questions: > > > > >> > 1) JTS indexes. From their technical spec: "There is a large > literature > > >> of > > >> > efficient algorithms for intersection detection. Unfortunately, many > of > > >> them > > >> > involve substantial code complexity. JTS tries to balance code > > >> simplicity > > >> > with performance gains. It uses some special techniques to produce > > >> > substantial performance gains for common types of input data. These > > >> > techniques include in-memory spatial indexes of various types, and > > >> > sophisticated methods for structuring data such as the technique of > > >> Monotone > > >> > Chains." (See > > >> >http://www.vividsolutions.com/jts/bin/JTS%20Technical%20Specs.pdf). > > > > >> > 2) I am using brute force (ie CPU) to do point-in-polygon and > > >> > polygon-in-polygon in this version of the app, so I am letting JTS > worry > > >> > about managing its own in-memory indexes. Nothing fancy built on top > of > > >> > AppEngine (yet), but I will be looking at App Engine indexes and > perhaps > > >> > blobs. > > > > >> > Stuart > > > > >> > On Fri, Jul 10, 2009 at 7:21 AM, Nick Johnson (Google) > > >> > <[email protected]> wrote: > > > > >> >> Hi Stuart, > > > > >> >> Very nice! > > > > >> >> Out of curiosity, what indexing scheme does JTS use? Are you > building > > >> >> an R-Tree or other tree based structure on top of the datastore, or > > >> >> are you using a hilbert-curve type approach? > > > > >> >> -Nick Johnson > > > > >> >> On Fri, Jul 10, 2009 at 8:03 AM, Stuart Moffatt< > > >> [email protected]> > > >> >> wrote: > > > > >> >> > Announcement: > > > > >> >> > The GIS in the Cloud (App Engine + JTS) reference project is now > > >> >> > available athttp://giscloud.appspot.comwith the source code > > >> >> > available athttp://giscloud.googlecode.com(documentation > > >> >> > forthcoming). > > > > >> >> > While many GIS/spatial App Engine discussions center on the > python > > >> SDK > > >> >> > (with pre-computing grids or geocells and heavy lifting done with > set > > >> >> > membership in the datastore), this reference project uses the > Java > > >> >> > Topology Suite (http://www.vividsolutions.com/jts/jtshome.htm) > > >> within > > >> >> > the App Engine server to demonstrate point-in-polygon and > polygon-in- > > >> >> > polygon spatial queries. > > > > >> >> > The GIS in the Cloud app is pure Java and was built with > > > > >> >> > * Google App Engine SDK for Java > > >> >> > * Google Plugin for Eclipse > > >> >> > * Google Web Toolkit > > >> >> > * Google Web Toolkit API Libraries - Google Maps 1.0 Library > > >> >> > * Java Topology Suite > > > > >> >> > The Java Topology Suite was ported to C++ and became GEOS, which > was > > >> >> > embedded in PostgreSQL to become PostGIS, allowing users access > to > > >> >> > spatial functions within SQL. While App Engine does not give us > > >> >> > spatial functions in GQL, the GIS in the Cloud app will > demonstrate a > > >> >> > GQL/JTS combination to accomplish the same effect. > > > > >> >> > The current version of the application does demonstrate use of > the > > >> >> > datastore (TODO), but it does employ JTS to query > point-in-polygon > > >> and > > >> >> > polygon-in-polygon. Single clicks on the map trigger the > point-in- > > >> >> > polygon test (the point is the map center and the polygon is Salt > > >> Lake > > >> >> > County). With zoom or drag, the polygon-in-polygon test is > triggered. > > >> >> > The first polygon is the map bounds and the second is Salt Lake > > >> >> > County. > > > > >> >> > Future versions will use the datastore and be able to access OGC > Well > > >> >> > Known Text (WKT) from which geometry can be created for features > such > > >> >> > as: tracts, census block groups, census blocks, and parcels. > > > > >> >> -- > > >> >> Nick Johnson, App Engine Developer Programs Engineer > > >> >> Google Ireland Ltd. :: Registered in Dublin, Ireland, Registration > > >> >> Number: 368047 > > > > >> -- > > >> Nick Johnson, App Engine Developer Programs Engineer > > >> Google Ireland Ltd. :: Registered in Dublin, Ireland, Registration > > >> Number: 368047 > > > --~--~---------~--~----~------------~-------~--~----~ You received this message because you are subscribed to the Google Groups "Google App Engine" group. To post to this group, send email to [email protected] To unsubscribe from this group, send email to [email protected] For more options, visit this group at http://groups.google.com/group/google-appengine?hl=en -~----------~----~----~----~------~----~------~--~---
