On 3/2/2018 2:01 AM, Josip Rodin wrote:
> On Thu, Mar 01, 2018 at 09:41:54AM -0800, Paul Norman wrote:
>>> when I import the nodes for all of Europe, the ways get processed at a
>>> rate of 30/s
>> It's slow during the osm2pgsql import stage. General advice for
>> osm2pgsql applies here. For a large import, you want more RAM. Ideally,
>> you should have enough cache to fit all the node positions in RAM. For
>> Europe, this is probably 20GB to 25GB on a machine with 32GB of RAM.
> Yesterday the Europe import told me that it processed 2045878k nodes.
> At 8 bytes per lat and 8 per long, that sounds more like 30.5 GB? Not sure
> where osm2pgsql reads it from... st_memsize(place.geometry) seems to return
> 32 bytes actually, would that imply 41 GB? That would seem to match the size
> of the flatnode file, too.

Node positions take 8 bytes per node, and cache efficiency is about 85% for the full planet. I haven't done an import for Europe recently, but taking 60% as a guess, that would give 26GB cache needed for all node positions.
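
To make the arithmetic explicit, here is a rough sketch of that estimate. It assumes only the figures quoted in this thread (8 bytes per node, 2045878k nodes for Europe, 60% as a guessed efficiency); the function name is made up for illustration and is not part of osm2pgsql.

```python
# Rough estimate of the node cache needed to hold every node position.
# Assumption: 8 bytes per node position, as stated in this thread.
BYTES_PER_NODE = 8


def cache_needed_gib(node_count: int, efficiency: float) -> float:
    """Return the estimated cache size in GiB.

    efficiency is the cache storage efficiency as a fraction (0 < e <= 1),
    e.g. 0.60 for the guessed Europe figure.
    """
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in the range (0, 1]")
    raw_bytes = node_count * BYTES_PER_NODE
    return raw_bytes / efficiency / 2**30


if __name__ == "__main__":
    europe_nodes = 2_045_878_000  # "2045878k nodes" from the import log
    print(f"{cache_needed_gib(europe_nodes, 0.60):.1f} GiB")
```

With the 60% guess this lands at roughly 25 GiB, in line with the 26GB figure above; plugging in a measured efficiency instead of the guess gives a better number.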

Because flat nodes are persistent, they're designed differently, and take 8 bytes * maximum node ID + a few hundred bytes for headers.
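
As a sketch of why the flat node file is so much larger than the cache estimate for an extract: its size follows the highest node ID, not the number of nodes imported. The maximum node ID and the header size below are placeholder assumptions for illustration, not values taken from an actual import.

```python
# Flat node file size: 8 bytes per possible node ID, plus a small header.
BYTES_PER_SLOT = 8


def flat_node_file_bytes(max_node_id: int, header_bytes: int = 0) -> int:
    """Return the approximate flat node file size in bytes.

    header_bytes stands in for the "few hundred bytes" of headers; it is
    negligible at this scale and defaults to 0.
    """
    return BYTES_PER_SLOT * max_node_id + header_bytes


if __name__ == "__main__":
    assumed_max_node_id = 5_400_000_000  # hypothetical value, not measured
    size = flat_node_file_bytes(assumed_max_node_id)
    print(f"{size / 2**30:.1f} GiB")
```

The size is the same whether the import covers Europe or the whole planet, because the file is indexed by node ID.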

> Anyway, a more pertinent point would be how does the size of osm2pgsql cache
> correlate to that, i.e. how do we estimate that it would organize itself
> in a way that 20 to 25 GB would be enough to get a good hit rate?

The easiest way to get cache efficiency is to look at the log output after an import. You could write external software that calculates the efficiency for a given list of nodes, but it's easier to run osm2pgsql with excess cache (using -O null if you're doing it a lot). Using my data from 2015 and https://github.com/openstreetmap/osm2pgsql/pull/441 I got 84.5% efficiency for the planet, 62% for Europe, and 59-50% for 2GB PBFs and smaller.

Geocoding mailing list