On Jul 29, 2009, at 8:31 AM, Sean Gillies wrote: > > Yes, but there's some inevitable requirement for C code here. We're > just choosing to have a more reusable C API instead of a single- > purpose Python C extension. >
Yep. The audience of ctypes-capable folks is also larger than the audience of C/C++ python extension developers. Hopefully this will result in more contribution, but if it doesn't we haven't lost anything. >> - It only supported changing the pagesize parameter. libspatialindex >> has nearly 20 properties that can be set/tweaked. > > I don't fully understand how to productively tweak them, yet, but I > appreciate this. > I'm awaiting a document from Marios describing the basic ideas, but one can find out the basics by reading the papers that the code is based on. It is a fairly straightforward mapping. The most significant knobs for us to tweak are the index_capacity and leaf_capacity. For my scenario of inserting lots of points, performance is quite sensitive to these. The various *pool_capacity parameters can be increased if you have more capable hardware. These can improve insert and query performance. >> - It didn't allow storage of objects in the tree (ie, clustered >> index), only ids. > > Using only integer ids has been a design decision, not a fault. Rtree > was originally intended to support spatial catalog search of > documents, identified by integers, that could be stored *anywhere*. > Storage agnosticism. I think this makes Rtree quite reusable. I guess > I don't understand what you mean by a clustered index here. > Well, a limitation at least. A clustered index in libspatialindex parlance just means storing the data inside the rtree. In the new code, I have preserved id-only queries, and according to the benchmarks.py script, performance change is negligible (actually it is better for the disk-based indexes with the new code). You would use the intersection_obj or nearest_obj to have it return pickles back to you. This will obviously have a performance cost. The question is whether or not that cost is small enough and where to take the hit. In some cases it will be worth it. It will also be handy for those looking for a no fuss, spatially-aware storage bucket. >> - Point storage. insert will determine that the min and max values >> are the same, and we will insert a SpatialIndex::Point instead of a >> SpatialIndex::Region > > I'd prefer to make this more explicit and dodge numerical issues at > the same time. Say, Region if the user inserts a 4-tuple, and Point if > the user inserts a 2-tuple. What about a 6-tuple, 8-tuple, or a k-tuple (we support kD indexes now)? The code currently checks using std::numeric_limits<double>::epsilon to determine the min and max are equivalent, and if any of the tests fail, inserts a region. >> >> - Bulk insertion >> > > Yay for bulk insertion. Could reindexing be part of this same feature? > What do you mean by reindexing? I don't have bulk insertion done yet, but the C API will work by using callbacks. If there were a straightforward and common tree-walking query that people need, this could be implemented as well. > > I'd like to discuss how we're going to test the C API. Test it via > Python in the old mapscript style, or write tests in C? For libLAS, I do mapscript-style, with Python doctests doing the work. Much easier to read and add to than any of the c unit testing frameworks. Coverage-wise, I think http://svn.gispython.org/svn/gispy/Rtree/trunk/tests/index.txt and http://svn.gispython.org/svn/gispy/Rtree/trunk/tests/properties.txt provide 100% C API coverage right now. I'd be open to moving to entirely C tests, but I admit to it being a low priority given what has already been developed. _______________________________________________ Community mailing list [email protected] http://lists.gispython.org/mailman/listinfo/community
