>> 
>> The reason for so many reads (though 2.3 seconds out of "a few hours" is
>> negligible overhead) is that the algorithm operates on a pair of adjacent
>> raster lines at a time. This allows processing of extremely large images
>> with very modest memory requirements. It's been a while since I've looked at
>> the code, but from my recollection, the algorithm should scale approximately
>> linearly in the number of pixels and polygons in the image. Far more
>> important to the run-time is the nature of the image itself. If the input is
>> something like a satellite photo, your output can be orders of magnitude
>> larger than the input image, as you can get a polygon for nearly every
>> pixel. If the output format is a verbose format like KML or JSON, the number
>> of bytes to describe each pixel is large. How big was the output in your
>> colleague's run?
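The two-scanline access pattern described above can be sketched in a few lines. This is a toy illustration, not the actual GDAL implementation: it streams a classified raster two rows at a time (so memory stays at O(2 × width) regardless of image height) and does a trivial piece of per-pair work, here counting horizontal boundary segments between adjacent rows.

```python
import numpy as np

def stream_row_pairs(raster):
    """Yield (prev_row, cur_row) pairs, holding only two rows in memory."""
    prev = raster[0]
    for y in range(1, len(raster)):
        cur = raster[y]
        yield prev, cur
        prev = cur

# Toy 4x4 classified raster with two regions (values 0 and 1).
img = np.array([
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 1, 1, 1],
    [0, 1, 1, 1],
])

# Count pixels whose class changes between adjacent rows -- a stand-in for
# the edge-tracing work a real polygoniser does on each pair of scanlines.
h_edges = sum(int((a != b).sum()) for a, b in stream_row_pairs(img))
```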


Three points. 


- Until last year, "Dan's GDAL scripts" had a polygonisation routine that was 
an order of magnitude faster than gdal_polygonize for our use cases.

- Locally, for geometry burning, raster processing, and polygonising, we use 
'rbuild' (I'm the author) to manage tiling, which gives us e.g. a 100x speedup 
from parallelisation and from smaller tasks that fit better in cache - you can 
find it on http://github.com/gbb. You can then run a polygon merge on the union 
of the tiles to consolidate the polygons.
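The consolidation step after per-tile polygonisation can be sketched with Shapely's `unary_union`, which dissolves the artificial seams between polygons that touch across tile boundaries. The geometries below are toy stand-ins for the per-tile outputs; rbuild's actual merge may well work differently.

```python
# Two adjacent tiles each produced a square polygon for the same class;
# they share an edge at x = 1 only because of the tile cut.
from shapely.geometry import box
from shapely.ops import unary_union

tile_a = box(0, 0, 1, 1)   # polygon from tile A
tile_b = box(1, 0, 2, 1)   # polygon from tile B, touching tile A

# unary_union dissolves the shared boundary into one polygon.
merged = unary_union([tile_a, tile_b])
```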

- There are two 'worst case' situations for polygonisation. You outline one of 
them above (zillions of tiny polygons). In my own experience, this problem was 
handled well by either gdal_polygonize or Dan's scripts - I can't remember 
which. There is another 'worst case' situation that occurs frequently, as 
follows: 

Whenever you deal with national-scale data for any country with a coastline, 
you frequently end up with an absolutely gigantic and horrifically complex 
single polygon depicting the coastline and every river in the country as a 
single continuous edge. This mega-polygon, so often present and so often 
necessary, is very time-consuming for gdal_polygonize to produce, and the 
result is very painful for every GIS geometry package to handle. 

It would be great if the people behind gdal_polygonize could put some thought 
into this extremely common situation for anyone working with country- or 
continent-scale rasters, to make sure it is handled well. It has certainly 
affected us a great deal when working with data at up to 2 m resolution for a 
country larger than the UK...

Graeme.


_______________________________________________
gdal-dev mailing list
[email protected]
http://lists.osgeo.org/mailman/listinfo/gdal-dev