Hi David,

Yes, there is a fair amount of object overhead in Shapely. You may want to
consider a more optimized solution written as a C extension with no Shapely
classes at all. See "Subject 2.03: How do I find if a point lies within a
polygon?" in http://www.faqs.org/faqs/graphics/algorithms-faq/. FWIW,
there's an implementation of pnpoly in matplotlib:
http://matplotlib.org/1.2.0/api/nxutils_api.html.






On Fri, Aug 30, 2013 at 2:10 PM, David Stuebe <stu...@gmail.com> wrote:

>
>
> Hi GisPython
>
> I am new to using shapley.
>
> I have a few polygons and I am interested in finding out about which of
> some millions of points intersect those polygons.
>
> I only need to create the polygons once - there are dozens at most, but
> creating millions of point objects is killing my performance big time!
>
> Here is some cprofile output from a test run:
>
>          159616551 function calls (159601695 primitive calls) in 307.189
> seconds
>
>    Ordered by: internal time
>    List reduced from 1844 to 20 due to restriction <20>
>
>    ncalls  tottime  percall  cumtime  percall filename:lineno(function)
>   7518198  135.451    0.000  185.103    0.000 predicates.py:8(__call__)
>   1253033   22.242    0.000   51.047    0.000
> point.py:170(geos_point_from_py)
>   7518198   16.698    0.000   16.698    0.000
> base.py:24(geometry_type_name)
>      8011   16.586    0.002  302.903    0.038
> shapely_intersects.py:26(shape_function)
>   7518198   14.399    0.000  201.687    0.000 base.py:447(intersects)
>   8771627   11.716    0.000   38.436    0.000 {hasattr}
>  15036396   11.607    0.000   45.146    0.000 topology.py:14(_validate)
>   8771241    7.834    0.000   11.235    0.000
> collections.py:119(itervalues)
>   3759104    7.515    0.000   10.693    0.000 base.py:131(empty)
>   1253033    7.092    0.000   12.961    0.000 numeric.py:446(require)
>   1253033    5.791    0.000   63.158    0.000 point.py:105(_set_coords)
>  37590990    5.679    0.000    5.679    0.000 base.py:162(_get_geom)
>   7518198    5.659    0.000   23.459    0.000 base.py:242(geometryType)
>   1253033    4.999    0.000    4.999    0.000 __init__.py:501(cast)
>   8771283    3.401    0.000    3.401    0.000 collections.py:72(__iter__)
>   7518198    3.260    0.000   26.719    0.000 base.py:245(type)
>   3759104    3.178    0.000    3.178    0.000 base.py:124(_is_empty)
>   1253033    2.874    0.000   66.251    0.000 point.py:38(__init__)
>   1253033    2.835    0.000   23.231    0.000 coords.py:17(required)
>   1253036    2.689    0.000    2.689    0.000 {method 'copy' of
> 'numpy.ndarray' objects}
>
> Would it be possible to pool the point objects and just change out the
> lat/lon location of the point object before every call to intersects?
>
> Here is a code except -
>
>         blen = len(block)
>
>         particle_position = block['loc'] # a (n,2) array of lat/lon
>
>         spillets_in_shapes = numpy.zeros((blen, slen),dtype='bool')
>
>         for i, pos in  enumerate(particle_position):
>             p = Point(pos)
>             for j,shape in enumerate(shapes.itervalues()):
>                 if shape.intersects(p):
>                     spillets_in_shapes[i,j] = True
>
>
> This code is called many times for each block of particles that I have to
> process.
>
> It seems most of my time is spent in initializing point objects and in
> something called predicates.py?
>
> Any suggestions for optimization?
>
> David
>
>
> _______________________________________________
> Community mailing list
> Community@lists.gispython.org
> http://lists.gispython.org/mailman/listinfo/community
>
>


-- 
Sean Gillies
_______________________________________________
Community mailing list
Community@lists.gispython.org
http://lists.gispython.org/mailman/listinfo/community

Reply via email to