On Mar 26, 2013, at 11:02 PM, David Fawcett <david.fawc...@gmail.com> wrote:
> Obviously something went very wrong on the last two.  They use the RTree, but 
> are an order of magnitude slower than the most basic RTree example.  I am 
> definitely curious if my code is bad or if it has to do with the way that 
> geometries and properties are stored within the RTree.

A few points about using clustered indexes (storing the data in the index 
itself) with Rtree. First, performance is very sensitive to the page size 
parameter that is set when the index is created. The default page size is quite 
low because the base example uses Rtree in a non-clustered configuration and 
just stores integer indices. Too large a page size bloats the index (and its 
on-disk footprint). Too small a page size means reallocating and moving tons of 
bytes around on each insert to make the GeoJSON/WKB/WKT of the geometry fit in 
the index.
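A minimal sketch of choosing the page size, assuming the Python `rtree` package 
with disk-backed storage (where `Property.pagesize` applies and must be set 
before the index file is created). The two page sizes, the file names, and the 
`sample_items` data are hypothetical; the script only reports build time and 
`.dat` file size for each setting, so you can compare on your own data instead 
of trusting any particular number.

```python
import os
import tempfile
import time

from rtree import index


def build_clustered_index(path, pagesize, items):
    """Build a disk-backed Rtree that stores each payload inside the index.

    Assumption: pagesize has to be chosen when the index file is created.
    """
    props = index.Property()
    props.pagesize = pagesize  # bytes per on-disk page
    props.overwrite = True     # replace any existing files at `path`
    idx = index.Index(path, properties=props)
    for i, (bbox, payload) in enumerate(items):
        # obj= makes this a clustered insert: the payload is pickled into the index
        idx.insert(i, bbox, obj=payload)
    return idx


def sample_items(n):
    """Hypothetical test data: unit squares with their WKT as the payload."""
    for i in range(n):
        x, y = float(i % 100), float(i // 100)
        wkt = f"POLYGON (({x} {y}, {x + 1} {y}, {x + 1} {y + 1}, {x} {y + 1}, {x} {y}))"
        yield (x, y, x + 1, y + 1), wkt


if __name__ == "__main__":
    tmp = tempfile.mkdtemp()
    for pagesize in (4096, 16384):  # hypothetical sizes to compare
        path = os.path.join(tmp, f"clustered_{pagesize}")
        start = time.perf_counter()
        idx = build_clustered_index(path, pagesize, sample_items(5000))
        elapsed = time.perf_counter() - start
        idx.close()  # flush pages to disk before measuring the file size
        size = os.path.getsize(path + ".dat")
        print(f"pagesize={pagesize}: build {elapsed:.2f}s, .dat file {size} bytes")
```

Since the best value depends on how large your serialized geometries are, it is 
worth rerunning this with payloads that look like your real data.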

Secondly, you can override/control the serialization that happens when items 
are inserted into or read back out of the index. For shapely geometries, the 
default probably goes through GeoJSON (I didn't look), but you could make it go 
through WKB instead, which would likely be faster and more compact. Rtree just 
stores pickles: the faster your pickling is, the faster objects are serialized 
into and deserialized out of the index.
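A sketch of swapping pickle for WKB, assuming the Python `rtree` package (whose 
`Index` serializes stored objects through its `dumps`/`loads` methods, pickle by 
default) and shapely's `wkb` module. `WKBIndex` is a hypothetical name, not part 
of either library; whether WKB actually beats pickle for your geometries is 
worth timing rather than assuming.

```python
from rtree import index
from shapely import wkb
from shapely.geometry import Polygon


class WKBIndex(index.Index):
    """Hypothetical Index subclass that stores shapely geometries as WKB.

    rtree calls dumps() when an object is inserted with obj=... and loads()
    when stored objects are read back; the base class pickles in both cases.
    """

    def dumps(self, obj):
        return wkb.dumps(obj)  # geometry -> WKB bytes

    def loads(self, string):
        return wkb.loads(string)  # WKB bytes -> geometry


if __name__ == "__main__":
    idx = WKBIndex()  # in-memory index; pass a filename for disk storage
    poly = Polygon([(0, 0), (2, 0), (2, 1), (0, 1)])
    idx.insert(0, poly.bounds, obj=poly)
    for geom in idx.intersection((1, 0.5, 1, 0.5), objects="raw"):
        print(geom.wkt)
```

This only changes the bytes stored per item; it combines with the page size 
setting, since smaller payloads fit into pages more easily.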

Finally, clustered index storage is a lazy man's not-so-fast spatial database. 
There's some threshold, in number of searches times number of items, beyond 
which it becomes useful and performs well enough, but where that threshold lies 
is quite sensitive to the configuration (page size, serialization, and so on).

Hope this helps,

Howard


_______________________________________________
Community mailing list
Community@lists.gispython.org
http://lists.gispython.org/mailman/listinfo/community