Re: [gdal-dev] Dissolve large amount of geometries

Andreas Oxenstierna Mon, 16 Jul 2018 23:35:37 -0700

If it is 20 mill. coordinates, it could be faster than 80 minutes, Iguess in the region of 10 minutes.


I would:

1 Combine 3+4 to something like (I assume ST_Multi is not needed) - thisavoids data duplication and one unnecessary transaction commit SELECT randfield, (st_dump(ST_Union(f.geom))).geom AS geom FROMfishnet AS f GROUP BY randfield

2 Remove the CREATE INDEX and/or add a spatial index - I do not reallyunderstand the random union logic


3 CLUSTER any index and ANALYZE (after CREATE INDEX)

4 Check PostgreSQL configs (work_mem, shared_buffers etc) - your statsshows a non-linear scaling above 100000 polys so db/hardware limits are hit


5 Disable autovacuum - may interfere during ST_Union execution

6 Execute on real hardware - no (cheap) laptop and Linux is faster - asan example this kind of PostGIS execution can be 2-3 times faster oniMac compared with "standard laptop"


7 Step 2 may be more efficient as well with
ADD COLUMN randField integer DEFAULT ceil(RANDOM()*10)

I took the advice of Andreas and converted my code to using PostGIS.
And the speed difference is enormous.

The commands I've used:
// Import shapefile into PostGIS:
ogr2ogr -f PostgreSQL PG:"host=localhost user=..." fishnet.shp -gtunlimited -lco GEOMETRY_NAME=geom -a_srs "EPSG:28992"
// Add random data:
ogrinfo PG:"host=localhost user=..." -sql "ALTER TABLE fishnet ADDCOLUMN randField integer;UPDATE fishnet SET randField =ceil(RANDOM()*10);CREATE INDEX randfield_idx ON fishnet (randField);
// Dissolve:
ogrinfo PG:"host=localhost user=..." -sql "CREATE TABLE dissolved ASSELECT randfield, ST_Multi(ST_Union(f.geom)) AS geom FROM fishnet AS fGROUP BY randfield"
// Export to shapefile
ogr2ogr -f "ESRI Shapefile" taskmap.shp PG:"host=localhost user=..."-sql "SELECT randfield, (st_dump(geom)).geom AS geom FROM dissolved"-overwrite -gt unlimited
// Clean up:
ogrinfo PG:"host=localhost user=..." -sql "DROP TABLE IF EXISTSfishnet CASCADE;DROP TABLE IF EXISTS dissolved CASCADE"
The timing of the steps, after I converted the above commands to C#:
| #shapes |Import     |Add values |  Dissolve | Export |
|1,677 |00:00:00.29|00:00:00.14|00:00:00.10|00:00:00.05|
|4,810 |00:00:00.28|00:00:00.20|00:00:01.11|00:00:00.15|
|18,415  |00:00:00.78|00:00:00.53|00:00:04.57|00:00:00.31|
|72,288  |00:00:02.07|00:00:02.13|00:00:22.56|00:00:01.02|
|285,927 |00:00:07.58|00:00:06.68|00:01:59.59|00:00:03.82|
|1,139,424|00:00:26.63|00:00:33.51|00:11:34.63|00:00:15.19|
|4,546,854|00:01:46.19|00:03:51.92|01:10:07.41|00:00:57.51|

4.5 million squares of 0.3m have a total area of about 40 ha.
That is good enough for me. The total time will be around 80 min,instead of days using SQLite.
I'm assuming I can't speed up this command anymore, right?
ogrinfo PG:"host=localhost user=..." -sql "CREATE TABLE dissolved ASSELECT randfield, ST_Multi(ST_Union(f.geom)) AS geom FROM fishnet AS fGROUP BY randfield"
Thanks all for your valuable help.

Regards,

Paul
Op ma 16 jul. 2018 om 11:30 schreef Andreas Oxenstierna<[email protected] <mailto:[email protected]>>:
    ST_Union in PostGIS should scale better than SQLite.
    ST_Dump gives you singlepart geometries.

    Best Regards

    Andreas Oxenstierna


--
Hälsningar

Andreas Oxenstierna
T-Kartor Geospatial AB
mobile: +46 733 206831
mailto: [email protected]
http://www.t-kartor.com

_______________________________________________
gdal-dev mailing list
[email protected]
https://lists.osgeo.org/mailman/listinfo/gdal-dev

Re: [gdal-dev] Dissolve large amount of geometries

Reply via email to