On Thu, Mar 01, 2018 at 09:41:54AM -0800, Paul Norman wrote:
> > when I import the nodes for all of Europe, the ways get processed at a
> > rate of 30/s
> It's slow during the osm2pgsql import stage. General advice for 
> osm2pgsql applies here. For a large import, you want more RAM. Ideally, 
> you should have enough cache to fit all the node positions in RAM. For 
> Europe, this is probably 20GB to 25GB on a machine with 32GB of RAM.

Yesterday the Europe import told me that it processed 2045878k nodes.
At 8 bytes per lat and 8 per lon, that sounds more like 30.5 GB? Not sure
where osm2pgsql reads the coordinates from, though. st_memsize(place.geometry)
actually seems to return 32 bytes; would that imply 41 GB? That would seem
to match the size of the flatnode file, too.
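The per-node arithmetic can be checked with a quick back-of-the-envelope script. It is a sketch: the node count is the one reported by the import, 16 and 32 bytes are the sizes discussed above, and the 8-byte row (two 32-bit fixed-point integers per node) is an assumption about how osm2pgsql packs coordinates, included only for comparison. It computes raw storage only, with no allocator or index overhead.

```python
# Back-of-the-envelope estimate of the raw memory needed to hold every
# node coordinate, for a few candidate per-node sizes.

NODES = 2_045_878 * 1000  # "2045878k nodes" as reported by the Europe import
GIB = 2**30               # sizes below are binary gigabytes (GiB)


def cache_gib(nodes: int, bytes_per_node: int) -> float:
    """Raw size in GiB of storing `bytes_per_node` bytes for each node."""
    return nodes * bytes_per_node / GIB


layouts = {
    # Assumption: two 32-bit fixed-point integers per node.
    "two 32-bit ints (assumed fixed-point)": 8,
    # 8 bytes per lat and 8 per lon, i.e. two 64-bit doubles.
    "two 64-bit doubles (8 per lat, 8 per lon)": 16,
    # What st_memsize(place.geometry) reports for a single point.
    "st_memsize() of a point geometry": 32,
}

if __name__ == "__main__":
    for name, size in layouts.items():
        print(f"{name:45s} {size:3d} B/node -> {cache_gib(NODES, size):5.1f} GiB")
```

This prints about 15.2, 30.5 and 61.0 GiB respectively. At 32 bytes per node the product is therefore closer to 61 GiB than 41 GB. If, as assumed here, the flatnode file reserves a fixed 8 bytes per node ID up to the highest ID, its size would track the maximum node ID rather than the node count, which could explain the apparent match.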

Anyway, the more pertinent question is how the size of the osm2pgsql cache
correlates with that, i.e. how do we estimate whether it organizes itself
in a way that makes 20 to 25 GB enough to get a good hit rate?

And, conversely, if we know that it orders the operations in a way that
produces a good hit rate, what are the parameters behind that? Maybe going
beyond 16 GB won't reduce the import time significantly...?

In retrospect, my 5 GB cache for 41 GB of data does seem way too optimistic.

> Keep in mind that even with regular use, database workloads like Nominatim
> perform best with plenty of RAM.

It would be useful to have some more info on that beforehand, too, such as
which indexes matter most for each use case (geocoding, reverse geocoding,
...), what their pg_total_relation_size() is, how fragmented they get over
time, ...


Geocoding mailing list
