[google-appengine] Re: Announce: GIS in the Cloud (App Engine + Java Topology Suite)

Stuart Moffatt Fri, 10 Jul 2009 20:36:36 -0700

All,

Beta 0.6 of http://giscloud.appspot.com is live. Details below.


This version hits the datastore for the county entity (rather than in-
lining as in previous versions), yanks out 242KB worth of well-known
text in the form of POLYGON((<list of coordinate pairs>))) describing
the county at a high resolution, parses the WKT into a proper vector
Polygon using JTS, and proceeds with the spatial tests (point-in-
polygon for click, polygon-in-polygon for drag or zoom).

On that first hit after an application update, the request reported
"5022ms 11314cpu_ms 31api_cpu_ms", which is a little too high for my
liking, and which the log report warns me about. But subsequent
requests -- about 50 clicks within about 10 minutes) are averaging
around "100ms 100cpu_ms 31api_cpu_ms".

So, as expected, it appears GQL is sucking out the county with its
fairly big WKT in decent time (~31ms).
And JTS is cranking out the spatial query pretty quickly too.

Compared with previous versions, it seems that an initial spike on the
first hit after an update is commonplace. Beta 0.3 reported a "1384ms
2967cpu_ms" for the same servlet request right after it was updated.
(Not sure why the timing format changed, did I miss some App Engine
tweak between last night and tonight?)

Rough comparison of peak times on first hit:
(a) Beta 0.3: No datastore call (county WKT was inline) + JTS on the
fly = 1.4 seconds then 3 seconds CPU
(b) Beta 0.6: Datastore call for large entity + JTS on the fly = 5
seconds then 11 seconds CPU, and 31ms api CPU

Rough comparison of average times (once initial hit completed):
(a) Beta 0.3: 12ms 9cpu_ms
(a) Beta 0.6: 100ms 100cpu_ms 31api_cpu_ms

It seems I am comparing apples and oranges here because the time
slices prior to today don't include the breakout of ms/cpu_ms/
api_cpu_ms, but its a ballpark, right?

The only real difference in implementation between Beta 0.3 and Beta
0.6 is fetching the entity from the datastore instead of having it
inlined in the servlet. For that, we get 10x the amount of charged
time. Unfortunately once you pull a version off default, there is no
"current load" report in the dashboard that shows nice averages, so I
will have to watch averages over time.

I would appreciate some heavy traffic to see what kind of pounding it
can take given these preliminary numbers.

http://giscloud.appspot.com

Thanks for your help.

Stuart

On Jul 10, 1:05 pm, Stuart Moffatt <[email protected]> wrote:
> [Note: cross-posting back to gae for java to pick up an earlier thread,
> further discussions should remain on the main app engine group]
>
> Brian,
>
> After Nick's comments I dove into finding out more about this and came
> across your Arc2Earth Cloud services and some comments you made about
> "quadtrees packed with rtrees", which I would love to know more about. So,
> yes, I have started looking. And in the process have realized that you and
> others have laid significant groundwork that I have to catch up on.
>
> In terms of getting entities out of the datastore efficiently, I have a plan
> that involves a combination of intersection ops (via JTS) and set membership
> (via GQL). Tell me if I am off my rocker here, and keep in mind that I my
> app is not a general purpose GIS, so there is some tradeoff (as has been
> eluded to elsewhere) in terms of functionality/flexibility.
>
> The intent of my app is display parcel centroids as points and include
> various attributes (addr, etc). The reference app is pretty basic, but I am
> using it as a JTS sandbox for another app I am building, which has to do
> with parcel-level earthquake loss estimation for Salt Lake County. The
> analysis was offline but I am targetting AppEngine for display etc.
>
> My entities are: parcels, census blocks, census block groups, tracts,
> counties, states.
>
> My basic use case is: Find your house in Salt Lake County (via zoom or
> search using geocoding via the Maps API). Click on the marker over your
> house and discover the losses incurred in an M7 earthquake. Pretty basic
> stuff. Except there are ~240,000 parcels in the datastore.
>
> Conceptual implementation: Pass the map bounding box to the server, ask for
> the parcel points that are within the map bounds. Easy as pie in PostGIS
> because of the embedded spatial functions in the SQL and the built in
> spatial indexing. Not so easy in App Engine with GQL.
>
> Possible solutions:
>
>    - Geohash
>    http://labs.metacarta.com/blog/27.entry/geographic-queries-on-google-...
> problems with accuracy.
>
>    - Geocellshttp://code.google.com/p/geomodel- Python, limited spatial
>    queries, has same problem as next solution
>
>    - Precomputing bounding boxes for proximity search
>    http://code.google.com/appengine/articles/geosearch.html- Python,
>    pre-computed grids work for "what else is near where I clicked" but
>    intersecting the map viewport (bounds) with geometry from the datastore
>    still requires more work. While I would not choose this approach, I agree
>    with Brett Slatkin's explanation of pre-computed set membership and list
>    properties --
>    http://code.google.com/events/io/sessions/BuildingScalableComplexApps...
>
> Proposed solution with JTS/GQL
>
>    - Design data structures with set membership in mind and use GQL for
>    queries. In my case, parcels belong in census blocks, which belong in block
>    groups, which belong in tracts, which belong in counties, which belong in
>    states. This structure is a natural for set membership. I can do this two
>    ways: either a parcel knows what census block belongs to (and so on up the
>    tree), or a state has a list of counties that belong to it (and so on down
>    the tree). Each nested set reduces the maximum number of entities I need to
>    query, but has a few tradeoffs (more below).
>
>    - Incorporate OGC's Well Known Text (WKT) as the geom property for each
>    entity and use JTS for spatial queries. Point-in-polygon is a good fit 
> here.
>    So are distance calculations.
>
> Implementation details:
>
> There are 29 counties in Utah. That's a pretty fast query in GQL, so grab
> all 29. Each of them come out of the datastore with their own WKT
> representing the geometry. Loop over the 29 counties, use JTS to create
> polygons for each of them (this is why AppEngine has distributed CPU,
> right?) and test my incoming point (where the user clicked on the map) to
> find which county they clicked in.
>
> Now make a second GQL query based on either set membership of tracts in the
> county (the county has a list of tracts that belong to it) or by asking for
> all tracts that have the county name as a property. Both of these properties
> would be pre-computed. I think I favor the set membership approach, because
> I believe it reduces the table scan. With my list of tracts in a county (eg
> Salt Lake County has 209), use GQL to ask for the tracts explicitly.
> AppEngine should not have any problem retrieving 209 results efficiently.
> With each of the 209 tracts comes their own WKT. Loop over the list and
> create polygons with JTS. Ask JTS which tract my clicked point is in. Now I
> know the tract I am in.
>
> Repeat this process with a third GQL query for census block groups, combined
> with a third JTS loop over those for another point-in-polygon query. Repeat
> again for census blocks, at which point we can grab all the parcels for that
> census block (by set membership only) and return their point coordinates
> across the wire for display in the client. There will likely also be a fixup
> for times when the map's bounding box actually encompasses two (or more)
> census blocks.
>
> Note: I am only sending back parcel point coords to the client, which knows
> nothing of tracts, blockgroups, or blocks. Those are only used to make GQL
> queries return less than 1000 results. Practically guaranteed manageable
> chunks of data in App Engine courtesy of the Census Bureau.
>
> Tradeoffs:
>
> Obviously, the major trade-off is that with every drag of the map at the
> appropriate zoom level, we do 4+ GQL set membership queries and make JTS
> create a few hundred polygons and test them for intersections. With
> AppEngine it is still fast, but my app quotas may be gobbled up quickly.
> More testing is required here with the full data set and the app under load.
>
> On the other hand, implementing spatial indexes in App Engine is not trivial
> (as you well know). Any direction you can provide is surely welcome. Have
> you created QuadTree and RTree as entity types ("kinds")? I imagine if I do
> this either offline or on the first pass and conditionally thereafter (which
> may be slower) then I might get away without having to repeatedly hit the
> datastore or spike the CPU with JTS every time.
>
> Thoughts, comments, money for research: all are welcome ;)
>
> Stuart
>
> On Fri, Jul 10, 2009 at 11:56 AM, bFlood <[email protected]> wrote:
>
> > hi stuart
>
> > great work getting JTS to run on GAE.
>
> > Have you started to look at how you would port those JTS indexes to
> > the GAE datastore? for me (and others), the hard part with geospatial
> > queries in GAE isn't the geometry intersection ops (although JTS is a
> > welcome addition for this), its how you efficiently get entities out
> > of the datastore. (I think thats what Nick might be hinting at above).
> > I'm not sure the memory resident JTS indexes are going to work (even
> > for small datasets in memcache), so those existing JTS structures will
> > need to be re-written with the "rules of GAE" in mind, which quickly
> > leads to the problem above
>
> > cheers and good luck with it
> > brian
>
> > On Jul 10, 12:53 pm, Stuart Moffatt <[email protected]> wrote:
> > > Nick,
>
> > > More information on JTS indexes:
>
> > > 1) STRTree - a packed R-Tree that does not support insert/delete once
> > built
> > > 2) QuadTree - slower than STRTree but supports insert/delete
>
> > >http://jump-pilot.sourceforge.net/pdfs/jts_secrets_foss4g2007.pdf
>
> > > Stuart
>
> > > On Fri, Jul 10, 2009 at 10:39 AM, Stuart Moffatt <
> > [email protected]>wrote:
>
> > > > Nick,
>
> > > > Correct, as in, not yet. The reference app just parses well-known text
> > on
> > > > the fly to create geometry (server side only), and tests that geometry
> > > > (which is usually complex) against incoming coordinates (so far only
> > points
> > > > and map bounds). Next stop, figuring out the right kind of index and
> > its
> > > > implementation.
>
> > > > If you care to elaborate on the comment you made with regard to R-trees
> > or
> > > > space-filling curve algorithms, I'm open to implementation ideas.
>
> > > > Stuart
>
> > > > On Fri, Jul 10, 2009 at 10:32 AM, Nick Johnson (Google) <
> > > > [email protected]> wrote:
>
> > > >> Hi Stuart,
>
> > > >> Am I correct in inferring that you're not storing geospatial indexes
> > > >> persistently at all, then? This seems impractical for the App Engine
> > > >> architecture, and for the scale it is intended to support.
>
> > > >> -Nick Johnson
>
> > > >> On Fri, Jul 10, 2009 at 5:29 PM, Stuart Moffatt<
> > [email protected]>
> > > >> wrote:
> > > >> > Nick,
>
> > > >> > First, there was a typo in my announcement that I just realized. The
> > > >> second
> > > >> > last paragraph should state that the "current version of the
> > application
> > > >> > does NOT demonstrate use of the datastore", that is, I am still in
> > the
> > > >> > process of implementing fetches of WKT from my entities, but that's
> > the
> > > >> > plan.
>
> > > >> > As to your questions:
>
> > > >> > 1) JTS indexes. From their technical spec: "There is a large
> > literature
> > > >> of
> > > >> > efficient algorithms for intersection detection. Unfortunately, many
> > of
> > > >> them
> > > >> > involve substantial code complexity. JTS tries to balance code
> > > >> simplicity
> > > >> > with performance gains. It uses some special techniques to produce
> > > >> > substantial performance gains for common types of input data. These
> > > >> > techniques include in-memory spatial indexes of various types, and
> > > >> > sophisticated methods for structuring data such as the technique of
> > > >> Monotone
> > > >> > Chains." (See
> > > >> >http://www.vividsolutions.com/jts/bin/JTS%20Technical%20Specs.pdf).
>
> > > >> > 2) I am using brute force (ie CPU) to do point-in-polygon and
> > > >> > polygon-in-polygon in this version of the app, so I am letting JTS
> > worry
> > > >> > about managing its own in-memory
>
> ...
>
> read more »
--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups 
"Google App Engine" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to 
[email protected]
For more options, visit this group at 
http://groups.google.com/group/google-appengine?hl=en
-~----------~----~----~----~------~----~------~--~---

[google-appengine] Re: Announce: GIS in the Cloud (App Engine + Java Topology Suite)

Reply via email to