On 09/14/2011 01:22 AM, Kai Krueger wrote: > It probably would be easier to use CLUSTER and then let postgresql sort > out the rest. It would also make it easier to occasionally "re-cluster". > I guess it wouldn't be too difficult to test it out, if only it didn't > take so long to run those tests... > > Was there a reason not to use it in the first place?
i've been wondering about that, too, but after reading the full manual section on CLUSTER to its end i found that the tmp->sort->overwrite way is actually recommended for large tables that are too large to fit into RAM. See http://www.postgresql.org/docs/9.0/static/sql-cluster.html [CLUSTER] can be slow on large tables because the rows are fetched from the table in index order, and if the table is disordered, the entries are on random pages, so there is one disk page retrieved for every row moved. (PostgreSQL has a cache, but the majority of a big table will not fit in the cache.) while [tmp->sort->rename] uses the PostgreSQL sorting code to produce the desired order; this is usually much faster than an index scan for disordered data. For the sorting code to perform well work_mem is key though, while for the index recreation step maintenance_work_mem is needed. Currently the wiki guide on pgsql configuration suggests static values for both (and until recently did not mention work_mem at all). There's also an issue with osm2pgsql not really returning the cache memory to the operating system due to heap fragmentation. Better results for large imports (table sizes much larger than RAM size) could probably be archived by: * making sure osm2pgsql properly returns the memory used for its cache to the operating system, for this i've got a working patch: https://github.com/hholzgra/osm2pgsql/tree/freeable_cache * serializing the index creation and clustering steps running these in parallel makes sense where everything fits into caches in RAM so that things get CPU bound, but on large imports things will be IO bound anyway, and parallel index builds just lead to lower cache hit rates which causes even more IO load * start with low default work_mem and maintenance_work_mem settings and raise them on a per statement level, so making the appropriate buffer for a given operation (work_mem for ORDER BY, maintenance_work_mem for CREATE INDEX) as large as possible and then shrinking it back to its default size afterwards Whether CLUSTER or the current approach is better/faster for our imports needs to be benchmarked, my personal bet would be that CLUSTER wins as our data distribution over time is not totally random, but this really needs to me tested out. (one additional advantage of CLUSTER would be that peak disk space usage during the operation would only be about two times the data size instead of three times with the current approach) -- hartmut _______________________________________________ dev mailing list [email protected] http://lists.openstreetmap.org/listinfo/dev

