Re: [OSM-dev] keeping thematic planet extract up to date

Igor Podolskiy Tue, 18 Oct 2011 10:08:03 -0700

Hi Martijn,

> this seems to run OK, but invariably, after letting this run for a few
> hours with a 5 minute interval (to catch up, my initial extract is a
> couple of months old) the database table only holds a small number (less
> than 20) nodes. What is going wrong here?


well, sorry to say that, but it has multiple problems :)

1. Your filter (--tf accept-ways x=* --tf accept-nodes x=*) doesn't do
what you want, because it filters out _all_ nodes that aren't tagged
with gnis:id=*, including those that constitute the gnis:id=* ways. So
you end up with a bunch of ways in your stream with empty geometries.
This is probably the main reason you see only a small number of nodes in
the DB, because there's nothing else --wp or --wpc can write to the
DB. See my earlier post [1] about how to do tag-based filtering with
osmosis :)

2. In the long run, you'll get wrong data if you only store the filtered
data. Consider this scenario:

T+0: you get your initial extract, way 12345 has no gnis:id, gets
filtered out and is not stored in the DB

T+1: somebody sets gnis:id=foo on way 12345

T+2: you get a change stream from replication which says: "Update way
12345 with these tags" and you have no way to update. From your point of
view, this "update" is a "create" - but nobody but you knows that. Even
worse, you have no nodes for this way because they got filtered out at
T+0 and are not included in the change stream. No nodes -> no geometry,
even if you manage to sneak that way object into your DB somehow.
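
For point 1, a filter that keeps the nodes belonging to the matching
ways could look roughly like the sketch below. It relies on the Osmosis
--used-node task. The function name, the OSMOSIS variable and the file
names are made up for illustration, and the sketch only covers ways:
standalone nodes tagged with gnis:id=* would need a second pipeline
merged in, which is not shown here.

```shell
#!/bin/bash
# Hypothetical wrapper, not taken from the original thread.
# OSMOSIS can be overridden, e.g. OSMOSIS=echo for a dry run.
OSMOSIS="${OSMOSIS:-osmosis}"

filter_gnis_ways() {
    input="$1"    # full extract to filter
    output="$2"   # thematic extract to write
    # Keep ways tagged gnis:id=*, drop relations, then keep only the
    # nodes that the remaining ways actually reference.
    "$OSMOSIS" \
        --read-pbf "$input" \
        --tag-filter accept-ways 'gnis:id=*' \
        --tag-filter reject-relations \
        --used-node \
        --write-pbf "$output"
}
```

Called as "filter_gnis_ways full.osm.pbf gnis-ways.osm.pbf", this
produces ways that still have their geometry, because the nodes are
selected by usage and not by their own tags.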


I maintain some thematic extracts for my work myself. Here's what I do:

-------
#!/bin/bash
# archive the last known good version
mv germany-railways.osm.pbf germany-railways.osm.pbf.1

# replicate the full extract, calls osmosis --rri
$HOME/scripts/get-changes.sh germany-boxed.osm.pbf state

# "thematic filtering", calls Osmosis to filter out railways

$HOME/scripts/filter-railways.sh germany-boxed.osm.pbf germany-railways.osm.pbf


# derive change for the railways
osmosis --rb germany-railways.osm.pbf --sort \
        --rb germany-railways.osm.pbf.1 --sort \
        --derive-change bufferCapacity=10000 \
        --lpc --wxc railways.osc


# update DB (this is the osm2pgsql equivalent to --wpc)
osm2pgsql -U podolsir -d gis --prefix osm_railways -a -m -s \
          -S $HOME/scripts/railways.style railways.osc

------

Basically, this way you keep your replication targets compatible with
the respective replication sources (more or less, a bbox-based extract
is not fully watertight either, but it works for a reasonably generous
bbox). Based on that, you do your tag-based filtering and derive a
change which has the right "updates" and "creates".

Yes, this _is_ much slower than the "intuitive" way (I started out with
that, too :)), because you need to process _all_ data you have in
--apply-change this way.
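
To make the replication step more concrete, here is a rough sketch of
what a get-changes.sh-style script might do. This is not my actual
script: the function name, the OSMOSIS and BBOX variables and the
bounding box values are placeholders, and it assumes the working
directory was initialised with --rrii beforehand.

```shell
#!/bin/bash
# Hypothetical sketch of the replication step; not the original script.
# OSMOSIS can be overridden, e.g. OSMOSIS=echo for a dry run.
OSMOSIS="${OSMOSIS:-osmosis}"
# Placeholder bounding box (roughly around Germany); adjust to your extract.
BBOX="${BBOX:-left=5.5 right=15.5 bottom=47.0 top=55.5}"

update_extract() {
    extract="$1"     # full, unfiltered geographic extract
    state_dir="$2"   # replication working directory
    # Read the pending changes, collapse them so each object appears once,
    # apply them to the full extract and crop back to the bounding box.
    # $BBOX is left unquoted on purpose so it splits into key=value args.
    "$OSMOSIS" \
        --read-replication-interval workingDirectory="$state_dir" \
        --simplify-change \
        --read-pbf "$extract" \
        --apply-change \
        --bounding-box $BBOX \
        --write-pbf "$extract.new" \
    && mv "$extract.new" "$extract"
}
```

The point is that --apply-change runs against the full extract, so
every "modify" in the change stream finds the object it refers to. The
tag-based filtering only happens afterwards.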

You could try to keep everything in your PostGIS database and then just
SELECT the stuff that has "gnis:id" for actual processing. However, I
don't know what that means in performance terms, as I haven't used that
kind of database yet on any scale worth mentioning. My guess would be
that --apply-change gets faster but you'll need much more disk space.
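
As an untested illustration of the "just SELECT" idea: assuming a
pgsnapshot-style schema in which the ways table has an hstore column
called tags, the selection could be wrapped like this. The function
name and the PSQL variable are made up, and the table and column names
depend on the schema you actually use.

```shell
#!/bin/bash
# Hypothetical helper; assumes ways.tags is an hstore column.
# PSQL can be overridden, e.g. PSQL=echo for a dry run.
PSQL="${PSQL:-psql}"

select_gnis_ways() {
    db="$1"   # database name
    # hstore's "?" operator tests whether a key is present;
    # "->" fetches the value stored under that key.
    "$PSQL" -d "$db" -A -t -c \
        "SELECT id, tags -> 'gnis:id' AS gnis_id FROM ways WHERE tags ? 'gnis:id';"
}
```

With this approach the database stays a complete replication target
and the thematic selection happens at query time.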

In any case: if you replicate, you need a source and a target that are
compatible. Since your replication source is the planet, ideally you
should have a complete planet as the target. Large geographic extracts
work more or less, tag-based extracts almost never work as replication
targets.


Hope that helps
Igor

[1] http://lists.openstreetmap.org/pipermail/dev/2011-April/022394.html

