Here is the output of the same function rewritten using an iterator on the
a MultiPoint object rather than creating each Point object:

Tue Sep  3 14:51:46 2013    restats

         148634517 function calls (148605205 primitive calls) in 273.772
seconds

   Ordered by: internal time
   List reduced from 1869 to 40 due to restriction <40>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
  7518198  137.685    0.000  187.778    0.000 predicates.py:8(__call__)
  7518198   17.106    0.000   17.106    0.000 base.py:24(geometry_type_name)
  1253033   15.604    0.000   18.043    0.000
point.py:170(geos_point_from_py)
  7518198   14.405    0.000  204.416    0.000 base.py:447(intersects)
     8011   12.041    0.002  268.612    0.034
shapely_intersects.py:28(shape_function)
 15042841   11.606    0.000   45.573    0.000 topology.py:14(_validate)
  8771251    7.613    0.000   10.888    0.000 collections.py:119(itervalues)
  1253033    6.750    0.000   11.462    0.000 base.py:588(_get_geom_item)
 37623215    5.981    0.000    5.981    0.000 base.py:162(_get_geom)
  8778104    5.789    0.000   33.026    0.000 {hasattr}
  7518198    5.584    0.000   23.819    0.000 base.py:242(geometryType)
     6445    4.689    0.001   25.740    0.004
multipoint.py:131(geos_multipoint_from_py)
  7518198    3.416    0.000   27.235    0.000 base.py:245(type)
  8771313    3.275    0.000    3.275    0.000 collections.py:72(__iter__)
  1259478    2.822    0.000    2.822    0.000 __init__.py:501(cast)
  7524643    2.313    0.000    2.313    0.000
geos.py:185(errcheck_predicate)
  7524643    2.237    0.000    2.237    0.000 impl.py:43(__getitem__)
  2518960    1.739    0.000    1.756    0.000 base.py:131(empty)
  1253033    1.615    0.000    1.831    0.000 point.py:38(__init__)
  1253033    1.357    0.000    3.187    0.000
multipoint.py:55(shape_factory)
    16038    1.267    0.000    2.065    0.000
oilmdl_reader.py:218(stream_record_blocks)
  1259484    1.013    0.000    1.552    0.000 base.py:164(_set_geom)
  1259478    1.012    0.000   12.509    0.000 base.py:596(__iter__)
1259476/1253032    0.885    0.000    2.103    0.000 base.py:141(__del__)
    43347    0.881    0.000    0.881    0.000 {method 'reduce' of
'numpy.ufunc' objects}
     6445    0.733    0.000    0.733    0.000 {sum}
  1259478    0.636    0.000    2.432    0.000 coords.py:17(required)
    16040    0.609    0.000    0.609    0.000 {method 'read' of 'file'
objects}
     8028    0.428    0.000    1.224    0.000 util.py:113(extents_coroutine)
2577031/2576733    0.415    0.000    0.415    0.000 {len}
     8010    0.309    0.000    0.309    0.000 grids.py:79(indexof)
     6445    0.244    0.000    0.244    0.000 {method 'dot' of
'numpy.ndarray' objects}
        1    0.218    0.218    0.218    0.218 parallel.py:1(<module>)
     6446    0.120    0.000    0.343    0.000
polygon_counter_op.py:17(polygon_counter_op)
     6446    0.110    0.000    0.353    0.000
polygon_mass_op.py:17(polygon_mass_op)
     8011    0.103    0.000    1.230    0.000
bin_grid_mapper.py:20(grid_function)
    32040    0.077    0.000    0.077    0.000
{numpy.core.multiarray.frombuffer}
    28886    0.073    0.000    0.073    0.000 {abs}
     6445    0.062    0.000   25.829    0.004 multipoint.py:28(__init__)
     8011    0.060    0.000    0.795    0.000
shore_oil_stats_op.py:27(shore_oil_stats_op)




On Fri, Aug 30, 2013 at 4:10 PM, David Stuebe <stu...@gmail.com> wrote:

>
>
> Hi GisPython
>
> I am new to using shapley.
>
> I have a few polygons and I am interested in finding out about which of
> some millions of points intersect those polygons.
>
> I only need to create the polygons once - there are dozens at most, but
> creating millions of point objects is killing my performance big time!
>
> Here is some cprofile output from a test run:
>
>          159616551 function calls (159601695 primitive calls) in 307.189
> seconds
>
>    Ordered by: internal time
>    List reduced from 1844 to 20 due to restriction <20>
>
>    ncalls  tottime  percall  cumtime  percall filename:lineno(function)
>   7518198  135.451    0.000  185.103    0.000 predicates.py:8(__call__)
>   1253033   22.242    0.000   51.047    0.000
> point.py:170(geos_point_from_py)
>   7518198   16.698    0.000   16.698    0.000
> base.py:24(geometry_type_name)
>      8011   16.586    0.002  302.903    0.038
> shapely_intersects.py:26(shape_function)
>   7518198   14.399    0.000  201.687    0.000 base.py:447(intersects)
>   8771627   11.716    0.000   38.436    0.000 {hasattr}
>  15036396   11.607    0.000   45.146    0.000 topology.py:14(_validate)
>   8771241    7.834    0.000   11.235    0.000
> collections.py:119(itervalues)
>   3759104    7.515    0.000   10.693    0.000 base.py:131(empty)
>   1253033    7.092    0.000   12.961    0.000 numeric.py:446(require)
>   1253033    5.791    0.000   63.158    0.000 point.py:105(_set_coords)
>  37590990    5.679    0.000    5.679    0.000 base.py:162(_get_geom)
>   7518198    5.659    0.000   23.459    0.000 base.py:242(geometryType)
>   1253033    4.999    0.000    4.999    0.000 __init__.py:501(cast)
>   8771283    3.401    0.000    3.401    0.000 collections.py:72(__iter__)
>   7518198    3.260    0.000   26.719    0.000 base.py:245(type)
>   3759104    3.178    0.000    3.178    0.000 base.py:124(_is_empty)
>   1253033    2.874    0.000   66.251    0.000 point.py:38(__init__)
>   1253033    2.835    0.000   23.231    0.000 coords.py:17(required)
>   1253036    2.689    0.000    2.689    0.000 {method 'copy' of
> 'numpy.ndarray' objects}
>
> Would it be possible to pool the point objects and just change out the
> lat/lon location of the point object before every call to intersects?
>
> Here is a code except -
>
>         blen = len(block)
>
>         particle_position = block['loc'] # a (n,2) array of lat/lon
>
>         spillets_in_shapes = numpy.zeros((blen, slen),dtype='bool')
>
>         for i, pos in  enumerate(particle_position):
>             p = Point(pos)
>             for j,shape in enumerate(shapes.itervalues()):
>                 if shape.intersects(p):
>                     spillets_in_shapes[i,j] = True
>
>
> This code is called many times for each block of particles that I have to
> process.
>
> It seems most of my time is spent in initializing point objects and in
> something called predicates.py?
>
> Any suggestions for optimization?
>
> David
>
>
_______________________________________________
Community mailing list
Community@lists.gispython.org
http://lists.gispython.org/mailman/listinfo/community

Reply via email to