[Note: cross-posting back to gae for java to pick up an earlier thread,
further discussions should remain on the main app engine group]

Brian,

After Nick's comments I dove into finding out more about this and came
across your Arc2Earth Cloud services and some comments you made about
"quadtrees packed with rtrees", which I would love to know more about. So,
yes, I have started looking. And in the process have realized that you and
others have laid significant groundwork that I have to catch up on.

In terms of getting entities out of the datastore efficiently, I have a plan
that involves a combination of intersection ops (via JTS) and set membership
(via GQL). Tell me if I am off my rocker here, and keep in mind that I my
app is not a general purpose GIS, so there is some tradeoff (as has been
eluded to elsewhere) in terms of functionality/flexibility.

The intent of my app is display parcel centroids as points and include
various attributes (addr, etc). The reference app is pretty basic, but I am
using it as a JTS sandbox for another app I am building, which has to do
with parcel-level earthquake loss estimation for Salt Lake County. The
analysis was offline but I am targetting AppEngine for display etc.

My entities are: parcels, census blocks, census block groups, tracts,
counties, states.

My basic use case is: Find your house in Salt Lake County (via zoom or
search using geocoding via the Maps API). Click on the marker over your
house and discover the losses incurred in an M7 earthquake. Pretty basic
stuff. Except there are ~240,000 parcels in the datastore.

Conceptual implementation: Pass the map bounding box to the server, ask for
the parcel points that are within the map bounds. Easy as pie in PostGIS
because of the embedded spatial functions in the SQL and the built in
spatial indexing. Not so easy in App Engine with GQL.

Possible solutions:

   - Geohash
   
http://labs.metacarta.com/blog/27.entry/geographic-queries-on-google-app-engine-
problems with accuracy.

   - Geocells http://code.google.com/p/geomodel - Python, limited spatial
   queries, has same problem as next solution

   - Precomputing bounding boxes for proximity search
   http://code.google.com/appengine/articles/geosearch.html - Python,
   pre-computed grids work for "what else is near where I clicked" but
   intersecting the map viewport (bounds) with geometry from the datastore
   still requires more work. While I would not choose this approach, I agree
   with Brett Slatkin's explanation of pre-computed set membership and list
   properties --
   http://code.google.com/events/io/sessions/BuildingScalableComplexApps.html

Proposed solution with JTS/GQL

   - Design data structures with set membership in mind and use GQL for
   queries. In my case, parcels belong in census blocks, which belong in block
   groups, which belong in tracts, which belong in counties, which belong in
   states. This structure is a natural for set membership. I can do this two
   ways: either a parcel knows what census block belongs to (and so on up the
   tree), or a state has a list of counties that belong to it (and so on down
   the tree). Each nested set reduces the maximum number of entities I need to
   query, but has a few tradeoffs (more below).

   - Incorporate OGC's Well Known Text (WKT) as the geom property for each
   entity and use JTS for spatial queries. Point-in-polygon is a good fit here.
   So are distance calculations.


Implementation details:

There are 29 counties in Utah. That's a pretty fast query in GQL, so grab
all 29. Each of them come out of the datastore with their own WKT
representing the geometry. Loop over the 29 counties, use JTS to create
polygons for each of them (this is why AppEngine has distributed CPU,
right?) and test my incoming point (where the user clicked on the map) to
find which county they clicked in.

Now make a second GQL query based on either set membership of tracts in the
county (the county has a list of tracts that belong to it) or by asking for
all tracts that have the county name as a property. Both of these properties
would be pre-computed. I think I favor the set membership approach, because
I believe it reduces the table scan. With my list of tracts in a county (eg
Salt Lake County has 209), use GQL to ask for the tracts explicitly.
AppEngine should not have any problem retrieving 209 results efficiently.
With each of the 209 tracts comes their own WKT. Loop over the list and
create polygons with JTS. Ask JTS which tract my clicked point is in. Now I
know the tract I am in.

Repeat this process with a third GQL query for census block groups, combined
with a third JTS loop over those for another point-in-polygon query. Repeat
again for census blocks, at which point we can grab all the parcels for that
census block (by set membership only) and return their point coordinates
across the wire for display in the client. There will likely also be a fixup
for times when the map's bounding box actually encompasses two (or more)
census blocks.

Note: I am only sending back parcel point coords to the client, which knows
nothing of tracts, blockgroups, or blocks. Those are only used to make GQL
queries return less than 1000 results. Practically guaranteed manageable
chunks of data in App Engine courtesy of the Census Bureau.

Tradeoffs:

Obviously, the major trade-off is that with every drag of the map at the
appropriate zoom level, we do 4+ GQL set membership queries and make JTS
create a few hundred polygons and test them for intersections. With
AppEngine it is still fast, but my app quotas may be gobbled up quickly.
More testing is required here with the full data set and the app under load.

On the other hand, implementing spatial indexes in App Engine is not trivial
(as you well know). Any direction you can provide is surely welcome. Have
you created QuadTree and RTree as entity types ("kinds")? I imagine if I do
this either offline or on the first pass and conditionally thereafter (which
may be slower) then I might get away without having to repeatedly hit the
datastore or spike the CPU with JTS every time.

Thoughts, comments, money for research: all are welcome ;)

Stuart


On Fri, Jul 10, 2009 at 11:56 AM, bFlood <[email protected]> wrote:

>
> hi stuart
>
> great work getting JTS to run on GAE.
>
> Have you started to look at how you would port those JTS indexes to
> the GAE datastore? for me (and others), the hard part with geospatial
> queries in GAE isn't the geometry intersection ops (although JTS is a
> welcome addition for this), its how you efficiently get entities out
> of the datastore. (I think thats what Nick might be hinting at above).
> I'm not sure the memory resident JTS indexes are going to work (even
> for small datasets in memcache), so those existing JTS structures will
> need to be re-written with the "rules of GAE" in mind, which quickly
> leads to the problem above
>
> cheers and good luck with it
> brian
>
> On Jul 10, 12:53 pm, Stuart Moffatt <[email protected]> wrote:
> > Nick,
> >
> > More information on JTS indexes:
> >
> > 1) STRTree - a packed R-Tree that does not support insert/delete once
> built
> > 2) QuadTree - slower than STRTree but supports insert/delete
> >
> > http://jump-pilot.sourceforge.net/pdfs/jts_secrets_foss4g2007.pdf
> >
> > Stuart
> >
> > On Fri, Jul 10, 2009 at 10:39 AM, Stuart Moffatt <
> [email protected]>wrote:
> >
> >
> >
> > > Nick,
> >
> > > Correct, as in, not yet. The reference app just parses well-known text
> on
> > > the fly to create geometry (server side only), and tests that geometry
> > > (which is usually complex) against incoming coordinates (so far only
> points
> > > and map bounds). Next stop, figuring out the right kind of index and
> its
> > > implementation.
> >
> > > If you care to elaborate on the comment you made with regard to R-trees
> or
> > > space-filling curve algorithms, I'm open to implementation ideas.
> >
> > > Stuart
> >
> > > On Fri, Jul 10, 2009 at 10:32 AM, Nick Johnson (Google) <
> > > [email protected]> wrote:
> >
> > >> Hi Stuart,
> >
> > >> Am I correct in inferring that you're not storing geospatial indexes
> > >> persistently at all, then? This seems impractical for the App Engine
> > >> architecture, and for the scale it is intended to support.
> >
> > >> -Nick Johnson
> >
> > >> On Fri, Jul 10, 2009 at 5:29 PM, Stuart Moffatt<
> [email protected]>
> > >> wrote:
> > >> > Nick,
> >
> > >> > First, there was a typo in my announcement that I just realized. The
> > >> second
> > >> > last paragraph should state that the "current version of the
> application
> > >> > does NOT demonstrate use of the datastore", that is, I am still in
> the
> > >> > process of implementing fetches of WKT from my entities, but that's
> the
> > >> > plan.
> >
> > >> > As to your questions:
> >
> > >> > 1) JTS indexes. From their technical spec: "There is a large
> literature
> > >> of
> > >> > efficient algorithms for intersection detection. Unfortunately, many
> of
> > >> them
> > >> > involve substantial code complexity. JTS tries to balance code
> > >> simplicity
> > >> > with performance gains. It uses some special techniques to produce
> > >> > substantial performance gains for common types of input data. These
> > >> > techniques include in-memory spatial indexes of various types, and
> > >> > sophisticated methods for structuring data such as the technique of
> > >> Monotone
> > >> > Chains." (See
> > >> >http://www.vividsolutions.com/jts/bin/JTS%20Technical%20Specs.pdf).
> >
> > >> > 2) I am using brute force (ie CPU) to do point-in-polygon and
> > >> > polygon-in-polygon in this version of the app, so I am letting JTS
> worry
> > >> > about managing its own in-memory indexes. Nothing fancy built on top
> of
> > >> > AppEngine (yet), but I will be looking at App Engine indexes and
> perhaps
> > >> > blobs.
> >
> > >> > Stuart
> >
> > >> > On Fri, Jul 10, 2009 at 7:21 AM, Nick Johnson (Google)
> > >> > <[email protected]> wrote:
> >
> > >> >> Hi Stuart,
> >
> > >> >> Very nice!
> >
> > >> >> Out of curiosity, what indexing scheme does JTS use? Are you
> building
> > >> >> an R-Tree or other tree based structure on top of the datastore, or
> > >> >> are you using a hilbert-curve type approach?
> >
> > >> >> -Nick Johnson
> >
> > >> >> On Fri, Jul 10, 2009 at 8:03 AM, Stuart Moffatt<
> > >> [email protected]>
> > >> >> wrote:
> >
> > >> >> > Announcement:
> >
> > >> >> > The GIS in the Cloud (App Engine + JTS) reference project is now
> > >> >> > available athttp://giscloud.appspot.comwith the source code
> > >> >> > available athttp://giscloud.googlecode.com(documentation
> > >> >> > forthcoming).
> >
> > >> >> > While many GIS/spatial App Engine discussions center on the
> python
> > >> SDK
> > >> >> > (with pre-computing grids or geocells and heavy lifting done with
> set
> > >> >> > membership in the datastore), this reference project uses the
> Java
> > >> >> > Topology Suite (http://www.vividsolutions.com/jts/jtshome.htm)
> > >> within
> > >> >> > the App Engine server to demonstrate point-in-polygon and
> polygon-in-
> > >> >> > polygon spatial queries.
> >
> > >> >> > The GIS in the Cloud app is pure Java and was built with
> >
> > >> >> > * Google App Engine SDK for Java
> > >> >> > * Google Plugin for Eclipse
> > >> >> > * Google Web Toolkit
> > >> >> > * Google Web Toolkit API Libraries - Google Maps 1.0 Library
> > >> >> > * Java Topology Suite
> >
> > >> >> > The Java Topology Suite was ported to C++ and became GEOS, which
> was
> > >> >> > embedded in PostgreSQL to become PostGIS, allowing users access
> to
> > >> >> > spatial functions within SQL. While App Engine does not give us
> > >> >> > spatial functions in GQL, the GIS in the Cloud app will
> demonstrate a
> > >> >> > GQL/JTS combination to accomplish the same effect.
> >
> > >> >> > The current version of the application does demonstrate use of
> the
> > >> >> > datastore (TODO), but it does employ JTS to query
> point-in-polygon
> > >> and
> > >> >> > polygon-in-polygon. Single clicks on the map trigger the
> point-in-
> > >> >> > polygon test (the point is the map center and the polygon is Salt
> > >> Lake
> > >> >> > County). With zoom or drag, the polygon-in-polygon test is
> triggered.
> > >> >> > The first polygon is the map bounds and the second is Salt Lake
> > >> >> > County.
> >
> > >> >> > Future versions will use the datastore and be able to access OGC
> Well
> > >> >> > Known Text (WKT) from which geometry can be created for features
> such
> > >> >> > as: tracts, census block groups, census blocks, and parcels.
> >
> > >> >> --
> > >> >> Nick Johnson, App Engine Developer Programs Engineer
> > >> >> Google Ireland Ltd. :: Registered in Dublin, Ireland, Registration
> > >> >> Number: 368047
> >
> > >> --
> > >> Nick Johnson, App Engine Developer Programs Engineer
> > >> Google Ireland Ltd. :: Registered in Dublin, Ireland, Registration
> > >> Number: 368047
> >
>

--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups 
"Google App Engine" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to 
[email protected]
For more options, visit this group at 
http://groups.google.com/group/google-appengine?hl=en
-~----------~----~----~----~------~----~------~--~---

Reply via email to