On Jul 29, 2009, at 8:31 AM, Sean Gillies wrote:
>
> Yes, but there's some inevitable requirement for C code here. We're
> just choosing to have a more reusable C API instead of a single-
> purpose Python C extension.
>

Yep.  The audience of ctypes-capable folks is also larger than the
audience of C/C++ Python extension developers.  Hopefully this will
result in more contributions, but if it doesn't, we haven't lost anything.

>> - It only supported changing the pagesize parameter.  libspatialindex
>> has nearly 20 properties that can be set/tweaked.
>
> I don't fully understand how to productively tweak them, yet, but I
> appreciate this.
>

I'm awaiting a document from Marios describing the basic ideas, but
one can find out the basics by reading the papers that the code is
based on.  It is a fairly straightforward mapping.  The most
significant knobs for us to tweak are index_capacity and
leaf_capacity.  For my scenario of inserting lots of points,
performance is quite sensitive to these.  The various *pool_capacity
parameters can be increased if you have more capable hardware, which
can improve insert and query performance.

>> - It didn't allow storage of objects in the tree (ie, clustered
>> index), only ids.
>
> Using only integer ids has been a design decision, not a fault. Rtree
> was originally intended to support spatial catalog search of
> documents, identified by integers, that could be stored *anywhere*.
> Storage agnosticism. I think this makes Rtree quite reusable. I guess
> I don't understand what you mean by a clustered index here.
>

Well, a limitation at least.  A clustered index in libspatialindex
parlance just means storing the data inside the rtree.  In the new
code, I have preserved id-only queries, and according to the
benchmarks.py script, the performance change is negligible (it is
actually better for the disk-based indexes with the new code).

You would use intersection_obj or nearest_obj to have the index return
the pickled objects to you.  This will obviously have a performance
cost.  The question is whether or not that cost is small enough and
where to take the hit.  In some cases it will be worth it.  It will
also be handy for those looking for a no-fuss, spatially-aware storage
bucket.
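
To make the id-only versus object trade-off concrete, here is a toy
sketch.  It is a linear scan and not the Rtree API: the class name, the
method signatures and the (minx, miny, maxx, maxy) bounds layout are all
assumptions for illustration.  Only the intersection_obj name comes from
the new code.

```python
import pickle


class ToyClusteredStore:
    """Linear-scan stand-in for a spatial index; NOT the Rtree API."""

    def __init__(self):
        # Each entry is (id, bounds, pickled payload or None).
        self._entries = []

    def insert(self, id_, bounds, obj=None):
        # "Clustered" here just means the payload lives next to the bounds.
        payload = pickle.dumps(obj) if obj is not None else None
        self._entries.append((id_, bounds, payload))

    @staticmethod
    def _overlaps(a, b):
        # Bounds are assumed to be (minx, miny, maxx, maxy).
        return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

    def intersection(self, bounds):
        """Id-only query: nothing is unpickled."""
        return [i for i, b, _ in self._entries if self._overlaps(b, bounds)]

    def intersection_obj(self, bounds):
        """Object query: pays the unpickling cost for every hit."""
        return [
            pickle.loads(p)
            for _, b, p in self._entries
            if p is not None and self._overlaps(b, bounds)
        ]


store = ToyClusteredStore()
store.insert(1, (0, 0, 1, 1), {"name": "a"})
store.insert(2, (5, 5, 6, 6), {"name": "b"})
print(store.intersection((0, 0, 2, 2)))      # ids only
print(store.intersection_obj((0, 0, 2, 2)))  # unpickled payloads
```

The id-only path never touches the payload, which is why keeping it
around costs callers nothing if they don't want the objects.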


>> - Point storage.  insert will determine that the min and max values
>> are the same, and we will insert a SpatialIndex::Point instead of a
>> SpatialIndex::Region
>
> I'd prefer to make this more explicit and dodge numerical issues at
> the same time. Say, Region if the user inserts a 4-tuple, and Point if
> the user inserts a 2-tuple.

What about a 6-tuple, 8-tuple, or a k-tuple (we support kD indexes
now)?  The code currently uses std::numeric_limits<double>::epsilon
to determine whether the min and max are equivalent in each
dimension, and if any of the tests fail, it inserts a region.
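
Roughly, the check looks like the Python sketch below.  This is a
paraphrase, not the C++ code itself: the function names, the separate
mins/maxs arguments and the use of <= rather than < are my assumptions.
sys.float_info.epsilon is Python's counterpart of
std::numeric_limits<double>::epsilon.

```python
import sys

# Python's counterpart of std::numeric_limits<double>::epsilon.
EPS = sys.float_info.epsilon


def is_point(mins, maxs, eps=EPS):
    """Return True only if every min/max pair differs by at most eps."""
    if len(mins) != len(maxs):
        raise ValueError("mins and maxs must have the same dimension")
    return all(abs(hi - lo) <= eps for lo, hi in zip(mins, maxs))


def classify(mins, maxs):
    # A single dimension with real extent is enough to make it a region.
    return "Point" if is_point(mins, maxs) else "Region"


print(classify((1.0, 2.0), (1.0, 2.0)))            # 2D, degenerate box
print(classify((1.0, 2.0, 3.0), (1.0, 2.0, 3.5)))  # 3D, one axis has extent
```

The same loop works for any k, which is the reason for not keying the
decision on tuple length.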

>>
>> - Bulk insertion
>>
>
> Yay for bulk insertion. Could reindexing be part of this same feature?
>

What do you mean by reindexing?  I don't have bulk insertion done yet,  
but the C API will work by using callbacks.  If there were a  
straightforward and common tree-walking query that people need, this  
could be implemented as well.

>
> I'd like to discuss how we're going to test the C API. Test it via
> Python in the old mapscript style, or write tests in C?

For libLAS, I do mapscript-style, with Python doctests doing the
work.  They are much easier to read and add to than any of the C unit
testing frameworks.  Coverage-wise, I think
http://svn.gispython.org/svn/gispy/Rtree/trunk/tests/index.txt
and
http://svn.gispython.org/svn/gispy/Rtree/trunk/tests/properties.txt
provide 100% C API coverage right now.  I'd be open to moving to
entirely C tests, but I admit to it being a low priority given what
has already been developed.
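
For anyone who hasn't seen the style, a minimal sketch of a text-file
doctest follows.  The bounds_overlap function is a hypothetical
stand-in for a call that would go through the C API, and the DOC string
stands in for a file such as tests/index.txt; neither is taken from the
actual test suite.

```python
import doctest


def bounds_overlap(a, b):
    """Hypothetical stand-in for a call that would reach the C API."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


# In practice this text would live in its own file next to the code.
DOC = """
Overlapping boxes are reported as hits:

    >>> bounds_overlap((0, 0, 1, 1), (0.5, 0.5, 2, 2))
    True

Disjoint boxes are not:

    >>> bounds_overlap((0, 0, 1, 1), (5, 5, 6, 6))
    False
"""

# Parse the prose-plus-examples text and run every example in it.
parser = doctest.DocTestParser()
test = parser.get_doctest(
    DOC, {"bounds_overlap": bounds_overlap}, "index.txt", None, 0
)
runner = doctest.DocTestRunner(verbose=False)
runner.run(test)
results = runner.summarize(verbose=False)
print(results)
```

The prose between the examples is what makes these files readable as
documentation as well as tests.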


_______________________________________________
Community mailing list
[email protected]
http://lists.gispython.org/mailman/listinfo/community
