Comments ... On Jul 22, 2009, at 5:18 PM, Howard Butler wrote:
> > On Jul 22, 2009, at 5:08 AM, Sean Gillies wrote: > >> I was expressing concern to Howard Butler about his plans to add the >> capacity to store Python objects to our Rtree class. After sleeping >> on >> the idea, it seems to be after all not unlike the OOBTree or IOBTree >> flavors of Zope's BTrees. I'm +1 on the idea. While on the subject of >> BTrees, I think we could do worse than to implement its interface >> when >> enhancing Rtree: >> >> http://docs.zope.org/zope3/Code/BTrees/Interfaces/index.html >> http://docs.zope.org/zope3/Code/BTrees/OOBTree/OOBTree.1 >> > > I guess I will use this as an opportunity to give folks an update of > what I've been doing. I've needed a C API of libspatialindex for > libLAS <http://liblas.org>, and my experience with Shapely and libLAS > told me that a ctypes-based API plus doctests would be an excellent > way to test it. Additionally, performance optimization of a library > like libspatialindex is rather tedious from C/C++, and having a Python > implementation that I can use to twiddle the knobs can speed > investigation of performance issues. > > These two things have resulted in a complete refactoring of Rtree. > The first iteration of Rtree had the following deficiencies: > > - It was a convoluted Python extension in C that wrapped some C++ > code. It was quite hard to maintain in order to do anything other > than fix small bugs. Yes, but there's some inevitable requirement for C code here. We're just choosing to have a more reusable C API instead of a single- purpose Python C extension. > - It only supported changing the pagesize parameter. libspatialindex > has nearly 20 properties that can be set/tweaked. I don't fully understand how to productively tweak them, yet, but I appreciate this. > - It didn't allow storage of objects in the tree (ie, clustered > index), only ids. Using only integer ids has been a design decision, not a fault. Rtree was originally intended to support spatial catalog search of documents, identified by integers, that could be stored *anywhere*. Storage agnosticism. I think this makes Rtree quite reusable. I guess I don't understand what you mean by a clustered index here. > - It only supported the RTree index variant of libspatialindex, not > the MVRtree (multi-versioned) or TPRTree (time-parameterized) variants > - It only supported 2D indexes > - It only allowed you to store boxes, not just points (which would > increase the storage efficiency by nearly a factor of four in most > cases) I agree index of points will be the most common ones. > > With these in mind, I started off implementing a C API for > libspatialindex. It currently lives in the Rtree repository > <http://svn.gispython.org/svn/gispy/Rtree/trunk/libsidx/ >> , but it will probably move to libspatialindex at some point. The C > API for libspatialindex hides a number of the details of working with > libspatialindex, and it currently provides functional parity (plus > object storage) with what was needed for the previous versions of > Rtree. I then reimplemented Rtree as a ctypes wrapper around the C > API. The current SVN head of Rtree contains this code. I have > disabled the old C/C++ wrappers, though they are still in the tree so > I can test performance changes. The new code currently provides > functional parity and no API changes from the previous version of > Rtree. > > Usage of this new version looks like > <http://svn.gispython.org/svn/gispy/Rtree/trunk/tests/index.txt >> . Query results from the index now come in the form of a C API > object that also contains a string of data for that record along with > the id. Because of this, I have wrapped the intersection and nearest > methods to only return ids from these query results, and a user would > need to use the intersection_obj or nearest_obj to fetch the actual > query results. If we are willing to break the API here, we can change > that. > > The following items have yet to be done in both the C API and Python > implementations: > - TPR/MVR trees. A bit of work needs to be done, but the basics are > there > - Point storage. insert will determine that the min and max values > are the same, and we will insert a SpatialIndex::Point instead of a > SpatialIndex::Region I'd prefer to make this more explicit and dodge numerical issues at the same time. Say, Region if the user inserts a 4-tuple, and Point if the user inserts a 2-tuple. > - There's a performance issue causing memory and disk indexes to query > at the same speed. I'm sure its something stupid I'm doing... > - Implementation of a query to find the bounds of the entire index > (quickly). > - Bulk insertion > Yay for bulk insertion. Could reindexing be part of this same feature? > I plan to continue working on these items throughout the week. I > would be interested in folks testing what's there if they have a > chance, but I will warn you that it requires using a SVN head version > of libspatialindex > <http://svn.gispython.org/svn/spatialindex/spatialindex/trunk/ >> . I have fixed a number of bugs along the way while implementing > this, and one of them was a show-stopper. > > As far as Sean's proposal to implement Zope's indexing interface atop > of Rtree, it sounds neat to me, but I'm not signing up to do it :) > > Howard I'd like to discuss how we're going to test the C API. Test it via Python in the old mapscript style, or write tests in C? Sean -- Sean Gillies Software Engineer Institute for the Study of the Ancient World New York University _______________________________________________ Community mailing list [email protected] http://lists.gispython.org/mailman/listinfo/community
