If it is 20 mill. coordinates, it could be faster than 80 minutes, I guess in the region of 10 minutes.

I would:
1 Combine 3+4 to something like (I assume ST_Multi is not needed) - this avoids data duplication and one unnecessary transaction commit    SELECT randfield, (st_dump(ST_Union(f.geom))).geom AS geom FROM fishnet AS f GROUP BY randfield

2 Remove the CREATE INDEX and/or add a spatial index - I do not really understand the random union logic

3 CLUSTER any index and ANALYZE (after CREATE INDEX)

4 Check PostgreSQL configs (work_mem, shared_buffers etc) - your stats shows a non-linear scaling above 100000 polys so db/hardware limits are hit

5 Disable autovacuum - may interfere during ST_Union execution

6 Execute on real hardware - no (cheap) laptop and Linux is faster - as an example this kind of PostGIS execution can be 2-3 times faster on iMac compared with "standard laptop"

7 Step 2 may be more efficient as well with
ADD COLUMN randField integer DEFAULT ceil(RANDOM()*10)

I took the advice of Andreas and converted my code to using PostGIS.
And the speed difference is enormous.

The commands I've used:
// Import shapefile into PostGIS:
ogr2ogr -f PostgreSQL PG:"host=localhost user=..." fishnet.shp -gt unlimited -lco GEOMETRY_NAME=geom -a_srs "EPSG:28992"
// Add random data:
ogrinfo PG:"host=localhost user=..." -sql "ALTER TABLE fishnet ADD COLUMN randField integer;UPDATE fishnet SET randField = ceil(RANDOM()*10);CREATE INDEX randfield_idx ON fishnet (randField);
// Dissolve:
ogrinfo PG:"host=localhost user=..." -sql "CREATE TABLE dissolved AS SELECT randfield, ST_Multi(ST_Union(f.geom)) AS geom FROM fishnet AS f GROUP BY randfield"
// Export to shapefile
ogr2ogr -f "ESRI Shapefile" taskmap.shp PG:"host=localhost user=..." -sql "SELECT randfield, (st_dump(geom)).geom AS geom FROM dissolved" -overwrite -gt unlimited
// Clean up:
ogrinfo PG:"host=localhost user=..." -sql "DROP TABLE IF EXISTS fishnet CASCADE;DROP TABLE IF EXISTS dissolved CASCADE"

The timing of the steps, after I converted the above commands to C#:
| #shapes |Import     |Add values |  Dissolve | Export |
|1,677 |00:00:00.29|00:00:00.14|00:00:00.10|00:00:00.05|
|4,810 |00:00:00.28|00:00:00.20|00:00:01.11|00:00:00.15|
|18,415  |00:00:00.78|00:00:00.53|00:00:04.57|00:00:00.31|
|72,288  |00:00:02.07|00:00:02.13|00:00:22.56|00:00:01.02|
|285,927 |00:00:07.58|00:00:06.68|00:01:59.59|00:00:03.82|
|1,139,424|00:00:26.63|00:00:33.51|00:11:34.63|00:00:15.19|
|4,546,854|00:01:46.19|00:03:51.92|01:10:07.41|00:00:57.51|

4.5 million squares of 0.3m have a total area of about 40 ha.
That is good enough for me. The total time will be around 80 min, instead of days using SQLite.

I'm assuming I can't speed up this command anymore, right?
ogrinfo PG:"host=localhost user=..." -sql "CREATE TABLE dissolved AS SELECT randfield, ST_Multi(ST_Union(f.geom)) AS geom FROM fishnet AS f GROUP BY randfield"

Thanks all for your valuable help.

Regards,

Paul

Op ma 16 jul. 2018 om 11:30 schreef Andreas Oxenstierna <[email protected] <mailto:[email protected]>>:

    ST_Union in PostGIS should scale better than SQLite.
    ST_Dump gives you singlepart geometries.

    Best Regards

    Andreas Oxenstierna


--
Hälsningar

Andreas Oxenstierna
T-Kartor Geospatial AB
mobile: +46 733 206831
mailto: [email protected]
http://www.t-kartor.com

_______________________________________________
gdal-dev mailing list
[email protected]
https://lists.osgeo.org/mailman/listinfo/gdal-dev

Reply via email to