On Tue, Sep 13, 2011 at 1:07 AM, Kai Krueger <[email protected]> wrote:
> Hi,
>
> I was thinking about ways to try and speed up osm2pgsql. Currently a good
> fraction of time, both in full imports and during diff-processing, is spent
> in the "going over pending ways / relations" section. Therefore speeding up
> that section should bring the overall time down quite a bit. One thought to
> try and speed up the "going over pending ways / relations" is to try and
> parallelize it.
That's funny: last week I was looking at the next step in the processing and wondering whether I could get a speed increase by un-parallelising it. I don't have the time or the skills to make much headway, so I'll happily confuse this thread by talking about other things.

http://trac.openstreetmap.org/browser/applications/utils/export/osm2pgsql/output-pgsql.c?rev=26651#L1365

At the moment, the temporary tables are created in parallel, so you need free space of up to the sum of the sizes of the geometry tables (although some tables finish faster than others, so, depending on timing, you may need less). My hunch is that doing this serially would improve IO (no thrashing between threads) and reduce the free space required, since you would only need up to max(sizeoftables) instead of potentially sum(sizeoftables).

As for the create tmp -> sort -> overwrite sequence, is there anything to be gained by using the built-in CLUSTER command instead? I'm not sure how well our method actually arranges things on disk, but again I've done nothing to investigate any of these hunches.

Just some thoughts from having stared at the output for too many hours :-)

Cheers,
Andy

_______________________________________________
dev mailing list
[email protected]
http://lists.openstreetmap.org/listinfo/dev
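
To put rough numbers on the sum-versus-max point above, here is a minimal Python sketch. The table names are osm2pgsql's default ones, but the sizes are invented purely for illustration, and the helper functions are hypothetical (nothing like them exists in osm2pgsql). It also assumes that, in the serial case, each temporary table has replaced its original before the next one is created.

```python
# Hypothetical on-disk sizes (in GB) of the temporary copy of each geometry
# table. The sizes are made up; only the ratio between them matters here.
table_sizes_gb = {
    "planet_osm_point": 10,
    "planet_osm_line": 40,
    "planet_osm_polygon": 60,
    "planet_osm_roads": 5,
}


def peak_free_space_parallel(sizes):
    """Worst case when all temporary tables exist at the same time."""
    return sum(sizes.values())


def peak_free_space_serial(sizes):
    """Worst case when only one temporary table exists at any moment."""
    return max(sizes.values())


print("parallel worst case:", peak_free_space_parallel(table_sizes_gb), "GB")
print("serial worst case:", peak_free_space_serial(table_sizes_gb), "GB")
```

With these invented sizes the parallel worst case is 115 GB and the serial one is 60 GB; the real gap depends on how evenly the data is spread across the tables and on how much the parallel runs actually overlap.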

