After a little more thought I am ready to try an improved method to
getting a geospatial index on App Engine.

Concept: Use a serialized STRtree

Possible implementations:

1) Roll my own STRTree package to implement Serializable, which is
part of App Engine's whitelist. Convert to a byte array and store as a
Blob, or serialize into a special ParcelIndex entity as a Text
property. Issues: a) building the spatial index offline with JTS and
uploading it into production, b) CPU time and usage for deserializing
a large spatial index on the fly, c) alternative strategy: store/
access index with file i/o instead of an entity in the datastore (not
sure about reads/filesize). To help improve speed, don't actually
insert the geometry for the indexed features. I could use a parcel_id
String which would link back to my Parcel entity, which would have the
Point WKT for parsing if I need the geometry for anything else on the
server, or for display on the client. Queries ask the spatial index
for spatial relationships. Results from the spatial index yank
attributes using normal datastore techniques.

2) Do pretty much the same thing with Protocol Buffers instead of
native Java Serialization in order to take advantage of PB features.
Not entirely sure yet how I would go about PB'ing an STRtree, but
exporting a PB to a byte array to store as a Blob looks promising.
Could also go to file (offline) and push it up to production in WEB-
INF for live reads.

When Task Queues are ready for Java, it would be nice to replace
offline JTS index building with a task. But that method would likely
have to employ the datastore method rather than the file method, as
file writing is forbidden.

Also, my app does not have any user inserts, so STRtree seems better
than QuadTree. However, all of my entities are Point based, so there
seems to be an advantage with QuadTree if I only have points.

Suggestions?

Stuart

On Jul 10, 9:36 pm, Stuart Moffatt <[email protected]> wrote:
> All,
>
> Beta 0.6 ofhttp://giscloud.appspot.comis live. Details below.
>
> This version hits the datastore for the county entity (rather than in-
> lining as in previous versions), yanks out 242KB worth of well-known
> text in the form of POLYGON((<list of coordinate pairs>))) describing
> the county at a high resolution, parses the WKT into a proper vector
> Polygon using JTS, and proceeds with the spatial tests (point-in-
> polygon for click, polygon-in-polygon for drag or zoom).
>
> On that first hit after an application update, the request reported
> "5022ms 11314cpu_ms 31api_cpu_ms", which is a little too high for my
> liking, and which the log report warns me about. But subsequent
> requests -- about 50 clicks within about 10 minutes) are averaging
> around "100ms 100cpu_ms 31api_cpu_ms".
>
> So, as expected, it appears GQL is sucking out the county with its
> fairly big WKT in decent time (~31ms).
> And JTS is cranking out the spatial query pretty quickly too.
>
> Compared with previous versions, it seems that an initial spike on the
> first hit after an update is commonplace. Beta 0.3 reported a "1384ms
> 2967cpu_ms" for the same servlet request right after it was updated.
> (Not sure why the timing format changed, did I miss some App Engine
> tweak between last night and tonight?)
>
> Rough comparison of peak times on first hit:
> (a) Beta 0.3: No datastore call (county WKT was inline) + JTS on the
> fly = 1.4 seconds then 3 seconds CPU
> (b) Beta 0.6: Datastore call for large entity + JTS on the fly = 5
> seconds then 11 seconds CPU, and 31ms api CPU
>
> Rough comparison of average times (once initial hit completed):
> (a) Beta 0.3: 12ms 9cpu_ms
> (a) Beta 0.6: 100ms 100cpu_ms 31api_cpu_ms
>
> It seems I am comparing apples and oranges here because the time
> slices prior to today don't include the breakout of ms/cpu_ms/
> api_cpu_ms, but its a ballpark, right?
>
> The only real difference in implementation between Beta 0.3 and Beta
> 0.6 is fetching the entity from the datastore instead of having it
> inlined in the servlet. For that, we get 10x the amount of charged
> time. Unfortunately once you pull a version off default, there is no
> "current load" report in the dashboard that shows nice averages, so I
> will have to watch averages over time.
>
> I would appreciate some heavy traffic to see what kind of pounding it
> can take given these preliminary numbers.
>
> http://giscloud.appspot.com
>
> Thanks for your help.
>
> Stuart
>
> On Jul 10, 1:05 pm, Stuart Moffatt <[email protected]> wrote:
>
> > [Note: cross-posting back to gae for java to pick up an earlier thread,
> > further discussions should remain on the main app engine group]
>
> > Brian,
>
> > After Nick's comments I dove into finding out more about this and came
> > across your Arc2Earth Cloud services and some comments you made about
> > "quadtrees packed with rtrees", which I would love to know more about. So,
> > yes, I have started looking. And in the process have realized that you and
> > others have laid significant groundwork that I have to catch up on.
>
> > In terms of getting entities out of the datastore efficiently, I have a plan
> > that involves a combination of intersection ops (via JTS) and set membership
> > (via GQL). Tell me if I am off my rocker here, and keep in mind that I my
> > app is not a general purposeGIS, so there is some tradeoff (as has been
> > eluded to elsewhere) in terms of functionality/flexibility.
>
> > The intent of my app is display parcel centroids as points and include
> > various attributes (addr, etc). The reference app is pretty basic, but I am
> > using it as a JTS sandbox for another app I am building, which has to do
> > with parcel-level earthquake loss estimation for Salt Lake County. The
> > analysis was offline but I am targetting AppEngine for display etc.
>
> > My entities are: parcels, census blocks, census block groups, tracts,
> > counties, states.
>
> > My basic use case is: Find your house in Salt Lake County (via zoom or
> > search using geocoding via the Maps API). Click on the marker over your
> > house and discover the losses incurred in an M7 earthquake. Pretty basic
> > stuff. Except there are ~240,000 parcels in the datastore.
>
> > Conceptual implementation: Pass the map bounding box to the server, ask for
> > the parcel points that are within the map bounds. Easy as pie in PostGIS
> > because of the embedded spatial functions in the SQL and the built in
> > spatial indexing. Not so easy in App Engine with GQL.
>
> > Possible solutions:
>
> >    - Geohash
> >    http://labs.metacarta.com/blog/27.entry/geographic-queries-on-google-...
> > problems with accuracy.
>
> >    - Geocellshttp://code.google.com/p/geomodel-Python, limited spatial
> >    queries, has same problem as next solution
>
> >    - Precomputing bounding boxes for proximity search
> >    http://code.google.com/appengine/articles/geosearch.html-Python,
> >    pre-computed grids work for "what else is near where I clicked" but
> >    intersecting the map viewport (bounds) with geometry from the datastore
> >    still requires more work. While I would not choose this approach, I agree
> >    with Brett Slatkin's explanation of pre-computed set membership and list
> >    properties --
> >    http://code.google.com/events/io/sessions/BuildingScalableComplexApps...
>
> > Proposed solution with JTS/GQL
>
> >    - Design data structures with set membership in mind and use GQL for
> >    queries. In my case, parcels belong in census blocks, which belong in 
> > block
> >    groups, which belong in tracts, which belong in counties, which belong in
> >    states. This structure is a natural for set membership. I can do this two
> >    ways: either a parcel knows what census block belongs to (and so on up 
> > the
> >    tree), or a state has a list of counties that belong to it (and so on 
> > down
> >    the tree). Each nested set reduces the maximum number of entities I need 
> > to
> >    query, but has a few tradeoffs (more below).
>
> >    - Incorporate OGC's Well Known Text (WKT) as the geom property for each
> >    entity and use JTS for spatial queries. Point-in-polygon is a good fit 
> > here.
> >    So are distance calculations.
>
> > Implementation details:
>
> > There are 29 counties in Utah. That's a pretty fast query in GQL, so grab
> > all 29. Each of them come out of the datastore with their own WKT
> > representing the geometry. Loop over the 29 counties, use JTS to create
> > polygons for each of them (this is why AppEngine has distributed CPU,
> > right?) and test my incoming point (where the user clicked on the map) to
> > find which county they clicked in.
>
> > Now make a second GQL query based on either set membership of tracts in the
> > county (the county has a list of tracts that belong to it) or by asking for
> > all tracts that have the county name as a property. Both of these properties
> > would be pre-computed. I think I favor the set membership approach, because
> > I believe it reduces the table scan. With my list of tracts in a county (eg
> > Salt Lake County has 209), use GQL to ask for the tracts explicitly.
> > AppEngine should not have any problem retrieving 209 results efficiently.
> > With each of the 209 tracts comes their own WKT. Loop over the list and
> > create polygons with JTS. Ask JTS which tract my clicked point is in. Now I
> > know the tract I am in.
>
> > Repeat this process with a third GQL query for census block groups, combined
> > with a third JTS loop over those for another point-in-polygon query. Repeat
> > again for census blocks, at which point we can grab all the parcels for that
> > census block (by set membership only) and return their point coordinates
> > across the wire for display in the client. There will likely also be a fixup
> > for times when the map's bounding box actually encompasses two (or more)
> > census blocks.
>
> > Note: I am only sending back parcel point coords to the client, which knows
> > nothing of tracts, blockgroups, or blocks. Those are only used to make GQL
> > queries return less than 1000 results. Practically guaranteed manageable
> > chunks of data in App Engine courtesy of the Census Bureau.
>
> > Tradeoffs:
>
> > Obviously, the major trade-off is that with every drag of the map at the
> > appropriate zoom level, we do 4+ GQL set membership queries and make JTS
> > create a few hundred polygons and test them for intersections. With
> > AppEngine it is still fast, but my app quotas may be gobbled up quickly.
> > More testing is required here with the full data set and the app under load.
>
> > On the other hand, implementing spatial indexes in App Engine is not trivial
> > (as you well know). Any direction you can provide is surely welcome. Have
> > you created QuadTree and RTree as entity types ("kinds")? I imagine if I do
> > this either offline or on the first pass and conditionally thereafter (which
> > may be slower) then I might get away without having to repeatedly hit the
> > datastore or spike the CPU with JTS every time.
>
> > Thoughts, comments, money for research: all are welcome ;)
>
> > Stuart
>
> > On Fri, Jul 10, 2009 at 11:56 AM, bFlood <[email protected]> wrote:
>
> > > hi stuart
>
> > > great work getting JTS to run on GAE.
>
> > > Have you started to look at how you would port those JTS indexes to
> > > the GAE datastore? for me (and others), the hard part with geospatial
> > > queries in GAE isn't the geometry intersection ops (although JTS is a
> > > welcome addition for this), its how you efficiently get entities out
> > > of the datastore. (I think thats what Nick might be hinting at above).
> > > I'm not sure the memory resident JTS indexes are going to work (even
> > > for small datasets in memcache), so those existing JTS structures will
> > > need to be re-written with the "rules of GAE" in mind, which quickly
> > > leads to the problem above
>
> > > cheers and good luck with it
> > > brian
>
> > > On Jul 10, 12:53 pm, Stuart Moffatt <[email protected]> wrote:
> > > > Nick,
>
> ...
>
> read more »
--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups 
"Google App Engine" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to 
[email protected]
For more options, visit this group at 
http://groups.google.com/group/google-appengine?hl=en
-~----------~----~----~----~------~----~------~--~---

Reply via email to