Re: [Community] Designing new types of Rtree indexes

Sean Gillies Wed, 29 Jul 2009 06:32:18 -0700

Comments ...

On Jul 22, 2009, at 5:18 PM, Howard Butler wrote:


>
> On Jul 22, 2009, at 5:08 AM, Sean Gillies wrote:
>
>> I was expressing concern to Howard Butler about his plans to add the
>> capacity to store Python objects to our Rtree class. After sleeping  
>> on
>> the idea, it seems to be after all not unlike the OOBTree or IOBTree
>> flavors of Zope's BTrees. I'm +1 on the idea. While on the subject of
>> BTrees, I think we could do worse than to implement its interface  
>> when
>> enhancing Rtree:
>>
>> http://docs.zope.org/zope3/Code/BTrees/Interfaces/index.html
>> http://docs.zope.org/zope3/Code/BTrees/OOBTree/OOBTree.1
>>
>
> I guess I will use this as an opportunity to give folks an update of
> what I've been doing.  I've needed a C API of libspatialindex for
> libLAS <http://liblas.org>, and my experience with Shapely and libLAS
> told me that a ctypes-based API plus doctests would be an excellent
> way to test it.  Additionally, performance optimization of a library
> like libspatialindex is rather tedious from C/C++, and having a Python
> implementation that I can use to twiddle the knobs can speed
> investigation of performance issues.
>
> These two things have resulted in a complete refactoring of Rtree.
> The first iteration of Rtree had the following deficiencies:
>
> - It was a convoluted Python extension in C that wrapped some C++
> code.  It was quite hard to maintain in order to do anything other
> than fix small bugs.

Yes, but there's some inevitable requirement for C code here. We're  
just choosing to have a more reusable C API instead of a single- 
purpose Python C extension.

> - It only supported changing the pagesize parameter.  libspatialindex
> has nearly 20 properties that can be set/tweaked.

I don't fully understand how to productively tweak them, yet, but I  
appreciate this.

> - It didn't allow storage of objects in the tree (ie, clustered
> index), only ids.

Using only integer ids has been a design decision, not a fault. Rtree  
was originally intended to support spatial catalog search of  
documents, identified by integers, that could be stored *anywhere*.  
Storage agnosticism. I think this makes Rtree quite reusable. I guess  
I don't understand what you mean by a clustered index here.

> - It only supported the RTree index variant of libspatialindex, not
> the MVRtree (multi-versioned) or TPRTree (time-parameterized) variants
> - It only supported 2D indexes
> - It only allowed you to store boxes, not just points (which would
> increase the storage efficiency by nearly a factor of four in most
> cases)

I agree index of points will be the most common ones.

>
> With these in mind, I started off implementing a C API for
> libspatialindex.  It currently lives in the Rtree repository 
> <http://svn.gispython.org/svn/gispy/Rtree/trunk/libsidx/
>> , but it will probably move to libspatialindex at some point.  The C
> API for libspatialindex hides a number of the details of working with
> libspatialindex, and it currently provides functional parity (plus
> object storage) with what was needed for the previous versions of
> Rtree.  I then reimplemented Rtree as a ctypes wrapper around the C
> API.  The current SVN head of Rtree contains this code.  I have
> disabled the old C/C++ wrappers, though they are still in the tree so
> I can test performance changes.  The new code currently provides
> functional parity and no API changes from the previous version of  
> Rtree.
>
> Usage of this new version looks like 
> <http://svn.gispython.org/svn/gispy/Rtree/trunk/tests/index.txt
>> .  Query results from the index now come in the form of a C API
> object that also contains a string of data for that record along with
> the id.  Because of this, I have wrapped the intersection and nearest
> methods to only return ids from these query results, and a user would
> need to use the intersection_obj or nearest_obj to fetch the actual
> query results.  If we are willing to break the API here, we can change
> that.
>
> The following items have yet to be done in both the C API and Python
> implementations:
> - TPR/MVR trees.  A bit of work needs to be done, but the basics are
> there
> - Point storage.  insert will determine that the min and max values
> are the same, and we will insert a SpatialIndex::Point instead of a
> SpatialIndex::Region

I'd prefer to make this more explicit and dodge numerical issues at  
the same time. Say, Region if the user inserts a 4-tuple, and Point if  
the user inserts a 2-tuple.

> - There's a performance issue causing memory and disk indexes to query
> at the same speed.  I'm sure its something stupid I'm doing...
> - Implementation of a query to find the bounds of the entire index
> (quickly).
> - Bulk insertion
>

Yay for bulk insertion. Could reindexing be part of this same feature?

> I plan to continue working on these items throughout the week.  I
> would be interested in folks testing what's there if they have a
> chance, but I will warn you that it requires using a SVN head version
> of libspatialindex 
> <http://svn.gispython.org/svn/spatialindex/spatialindex/trunk/
>> .  I have fixed a number of bugs along the way while implementing
> this, and one of them was a show-stopper.
>
> As far as Sean's proposal to implement Zope's indexing interface atop
> of Rtree, it sounds neat to me, but I'm not signing up to do it :)
>
> Howard

I'd like to discuss how we're going to test the C API. Test it via  
Python in the old mapscript style, or write tests in C?

Sean

--
Sean Gillies
Software Engineer
Institute for the Study of the Ancient World
New York University

_______________________________________________
Community mailing list
[email protected]
http://lists.gispython.org/mailman/listinfo/community

Re: [Community] Designing new types of Rtree indexes

Reply via email to