Hartmut Holzgraefe wrote:
> [...]
> For the sorting code to perform well, work_mem is key, while for
> the index recreation step maintenance_work_mem is what matters.
> Currently the wiki guide on pgsql configuration suggests static
> values for both (and until recently did not mention work_mem at all).
> There's also an issue with osm2pgsql not really returning the
> cache memory to the operating system due to heap fragmentation.
> Better results for large imports (table sizes much larger than
> RAM size) could probably be achieved by:
> * making sure osm2pgsql properly returns the memory used for its
>   cache to the operating system; for this I've got a working patch:
>   https://github.com/hholzgra/osm2pgsql/tree/freeable_cache
OK, I have committed your patch and extended it to have a command line
option to fall back to the old behavior.
On Linux, allocating the node cache as one large chunk works well:
internally, Linux overcommits memory and only allocates physical RAM
for those pages that are actually written to. So you can still specify
a large cache value and only use as much physical memory as you need to
cache all the nodes. As it is hard to guess how much cache one needs,
particularly for diff imports, this is quite helpful for not wasting
memory.
Other operating systems (Solaris? Mac OS X? Windows?) might behave
differently and actually reserve the full amount of memory, in which
case one might want to fall back to the old behavior, at least for diff
imports.
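
Just to illustrate the idea (this is a toy sketch, not the code from
the freeable_cache branch, and the 8 GB / 64 MB figures are made up):
reserve one large anonymous mapping up front, let the kernel back only
the pages that actually get written, and hand the whole range back to
the OS with a single munmap(), which sidesteps the heap fragmentation
you get with malloc()/free():

  #define _GNU_SOURCE
  #include <stdio.h>
  #include <stdlib.h>
  #include <string.h>
  #include <sys/mman.h>

  #define CACHE_BYTES ((size_t)8 << 30)   /* reserve 8 GB up front */

  int main(void)
  {
      unsigned char *cache = mmap(NULL, CACHE_BYTES,
                                  PROT_READ | PROT_WRITE,
                                  MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE,
                                  -1, 0);
      if (cache == MAP_FAILED) {
          perror("mmap");
          return EXIT_FAILURE;
      }

      /* only the pages that are actually written become resident */
      memset(cache, 0xff, (size_t)64 << 20);  /* touch 64 MB of the 8 GB */

      /* handing the memory back to the OS is a single call, no matter
       * how the cache was filled */
      if (munmap(cache, CACHE_BYTES) != 0) {
          perror("munmap");
          return EXIT_FAILURE;
      }
      return EXIT_SUCCESS;
  }

On a system without overcommit the full reservation may be charged
immediately, which is exactly the case where falling back to the old
behavior makes sense.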
> * serializing the index creation and clustering steps
>   running these in parallel makes sense where everything fits
>   into caches in RAM so that things get CPU bound, but on large
>   imports things will be IO bound anyway, and parallel index
>   builds just lead to lower cache hit rates, causing even
>   more IO load
I have committed this patch too, although I have changed it so that the
default remains to do the indexing in parallel, and the command line
switch changes the behavior to serial indexing.
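
To make the two modes concrete: the parallel mode amounts to issuing
each table's CREATE INDEX on its own connection so the server builds
them all at once, while the serial mode is just a single loop on one
connection. A rough stand-in for the parallel case using libpq's
asynchronous API (illustration only, not the actual osm2pgsql worker
handling; table and index names are placeholders):

  #include <stdio.h>
  #include <stdlib.h>
  #include <libpq-fe.h>

  static const char *index_sql[] = {
      "CREATE INDEX planet_osm_point_index ON planet_osm_point "
          "USING GIST (way)",
      "CREATE INDEX planet_osm_line_index ON planet_osm_line "
          "USING GIST (way)",
      "CREATE INDEX planet_osm_polygon_index ON planet_osm_polygon "
          "USING GIST (way)",
      "CREATE INDEX planet_osm_roads_index ON planet_osm_roads "
          "USING GIST (way)"
  };
  #define NTABLES (sizeof(index_sql) / sizeof(index_sql[0]))

  int main(void)
  {
      PGconn *conn[NTABLES];
      size_t i;

      /* one connection per table; the server runs all builds at once */
      for (i = 0; i < NTABLES; i++) {
          conn[i] = PQconnectdb("dbname=gis");
          if (PQstatus(conn[i]) != CONNECTION_OK ||
              !PQsendQuery(conn[i], index_sql[i])) {
              fprintf(stderr, "%s\n", PQerrorMessage(conn[i]));
              return EXIT_FAILURE;
          }
      }

      /* wait for every build to finish */
      for (i = 0; i < NTABLES; i++) {
          PGresult *res;
          while ((res = PQgetResult(conn[i])) != NULL) {
              if (PQresultStatus(res) != PGRES_COMMAND_OK)
                  fprintf(stderr, "%s\n", PQerrorMessage(conn[i]));
              PQclear(res);
          }
          PQfinish(conn[i]);
      }
      return EXIT_SUCCESS;
  }

The serial switch simply issues the same statements one after another
on a single connection.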
In my preliminary benchmarks, doing the indexing in parallel was
slightly faster than doing it one table at a time. However, I only
tried it on small planet extracts (about 100 MB for the osm.pbf file).
I also didn't play around with the postgresql settings of work_mem and
maintenance_work_mem, which is potentially where the benefit of doing
things sequentially comes from, by being able to set those values
higher.
> * start with low default work_mem and maintenance_work_mem settings
>   and raise them at the per-statement level, making the appropriate
>   buffer for a given operation (work_mem for ORDER BY,
>   maintenance_work_mem for CREATE INDEX) as large as possible and
>   then shrinking it back to its default size afterwards
Is it possible to adjust work_mem and maintenance_work_mem at run time?
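
As far as I know, yes: both are plain run-time settings, so a client
can raise them with SET just before a big statement and drop back with
RESET afterwards (or use SET LOCAL inside a transaction). A sketch of
how the serial indexing discussed above could give each CREATE INDEX
the full budget (connection string, table names and the 1GB figure are
just placeholders, not what osm2pgsql emits):

  #include <stdio.h>
  #include <stdlib.h>
  #include <libpq-fe.h>

  static const char *tables[] = {
      "planet_osm_point", "planet_osm_line",
      "planet_osm_polygon", "planet_osm_roads"
  };

  static void run(PGconn *conn, const char *sql)
  {
      PGresult *res = PQexec(conn, sql);
      if (PQresultStatus(res) != PGRES_COMMAND_OK)
          fprintf(stderr, "%s failed: %s", sql, PQerrorMessage(conn));
      PQclear(res);
  }

  int main(void)
  {
      PGconn *conn = PQconnectdb("dbname=gis");
      size_t i;

      if (PQstatus(conn) != CONNECTION_OK) {
          fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));
          return EXIT_FAILURE;
      }

      for (i = 0; i < sizeof(tables) / sizeof(tables[0]); i++) {
          char sql[256];

          /* raise the budget just for the following CREATE INDEX ... */
          run(conn, "SET maintenance_work_mem = '1GB'");

          snprintf(sql, sizeof(sql),
                   "CREATE INDEX %s_index ON %s USING GIST (way)",
                   tables[i], tables[i]);
          run(conn, sql);        /* builds run one at a time here */

          /* ... and shrink back to the configured default afterwards */
          run(conn, "RESET maintenance_work_mem");
      }

      PQfinish(conn);
      return EXIT_SUCCESS;
  }

Doing it this way would let the server default stay small for ordinary
sessions while each big maintenance statement gets as much memory as
the machine can spare.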
> Whether CLUSTER or the current approach is better/faster for our
> imports needs to be benchmarked; my personal bet would be that
> CLUSTER wins, as our data distribution over time is not totally
> random, but this really needs to be tested.
I haven't yet benchmarked the difference between CLUSTER and the
current sorting-based method.
Has anyone else got numbers on this?
Kai
> (one additional advantage of CLUSTER would be that peak disk
> space usage during the operation would only be about two times
> the data size instead of three times with the current approach)
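
For anyone who wants to run that benchmark, the two strategies boil
down to something like the following sketch (placeholder table, index
and column names, not the exact statements osm2pgsql issues):

  #include <stdio.h>
  #include <stdlib.h>
  #include <libpq-fe.h>

  static void run(PGconn *conn, const char *sql)
  {
      PGresult *res = PQexec(conn, sql);
      if (PQresultStatus(res) != PGRES_COMMAND_OK)
          fprintf(stderr, "%s failed: %s", sql, PQerrorMessage(conn));
      PQclear(res);
  }

  int main(int argc, char **argv)
  {
      int use_cluster = (argc > 1);   /* pass any argument to try CLUSTER */
      PGconn *conn = PQconnectdb("dbname=gis");

      (void)argv;
      if (PQstatus(conn) != CONNECTION_OK) {
          fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));
          return EXIT_FAILURE;
      }

      if (!use_cluster) {
          /* roughly the current approach: rewrite the data into a new,
           * sorted table, then build the index on it; old table, new
           * table and sort spill files coexist on disk at the peak */
          run(conn, "CREATE TABLE planet_osm_line_tmp AS "
                    "SELECT * FROM planet_osm_line ORDER BY way");
          run(conn, "DROP TABLE planet_osm_line");
          run(conn, "ALTER TABLE planet_osm_line_tmp "
                    "RENAME TO planet_osm_line");
          run(conn, "CREATE INDEX planet_osm_line_index "
                    "ON planet_osm_line USING GIST (way)");
      } else {
          /* CLUSTER variant: build the index first, then let the server
           * rewrite the table in index order (about two copies on disk
           * at the peak) */
          run(conn, "CREATE INDEX planet_osm_line_index "
                    "ON planet_osm_line USING GIST (way)");
          run(conn, "CLUSTER planet_osm_line USING planet_osm_line_index");
      }

      PQfinish(conn);
      return EXIT_SUCCESS;
  }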