Hi Martijn,

> this seems to run OK, but invariably, after letting this run for a few
> hours with a 5-minute interval (to catch up, my initial extract is a
> couple of months old) the database table only holds a small number (fewer
> than 20) of nodes. What is going wrong here?

Well, sorry to say, but it has multiple problems :)

1. Your filter (--tf accept-ways x=* --tf accept-nodes x=*) doesn't do what you want, because it drops _all_ nodes that aren't tagged with gnis:id=*, including the nodes that make up the gnis:id=* ways. So you end up with a bunch of ways in your stream with empty geometries. This is probably the main reason you see only a small number of nodes in the DB: the few tagged nodes are all that --wp or --wpc has left to write. See my earlier post [1] about how to do tag-based filtering with osmosis :)
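
To illustrate the point (this is a rough sketch, not necessarily the recipe from [1]): the function below keeps the gnis:id=* ways together with the nodes they reference, using --used-node. The function name and the OSMOSIS override are made up for this example. Standalone nodes tagged gnis:id=* are dropped here; keeping them as well would need a second pipeline merged in.

```shell
#!/bin/bash
# Sketch only: keep ways tagged gnis:id=* plus the nodes those ways use.
# OSMOSIS can be overridden (e.g. OSMOSIS=echo) for a dry run; the function
# name and that variable are illustrative, not part of osmosis.
filter_gnis_ways() {
  local src="$1" dst="$2"
  "${OSMOSIS:-osmosis}" \
    --rb "$src" \
    --tf reject-relations \
    --tf accept-ways 'gnis:id=*' \
    --used-node \
    --wb "$dst"
}
```

Calling it with OSMOSIS=echo prints the osmosis arguments instead of running them, which is handy for checking the pipeline before touching real data.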

2. In the long run, you'll get wrong data if you only store the filtered data. Consider this scenario:

T+0: you get your initial extract, way 12345 has no gnis:id, gets filtered out and is not stored in the DB
T+1: somebody sets gnis:id=foo on way 12345
T+2: you get a change stream from replication which says: "Update way 12345 with these tags", but your DB has no way 12345 to update. From your point of view, this "update" is a "create" - but nobody but you knows that. Even worse, you have no nodes for this way, because they were filtered out at T+0 and are not included in the change stream. No nodes -> no geometry, even if you manage to sneak that way object into your DB somehow.
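
The scenario can be played through with a toy model. This is not real replication: the "DB" is just a text file with one way id per line, and apply_modify is a made-up helper that mimics receiving a "modify" for a given id.

```shell
#!/bin/bash
# Toy model, not real replication: the "DB" is a text file, one way id per line.
# apply_modify mimics receiving "modify way <id>" from a replication diff.
apply_modify() {
  local db="$1" id="$2"
  if grep -qx -- "$id" "$db"; then
    echo "way $id: updated in place"
  else
    # The sender thinks this is an update; for a filtered DB it is a create.
    echo "way $id: modify received, but the object was never stored" >&2
    return 1
  fi
}
```

A DB file that was filtered at T+0 will not contain 12345, so the "modify" at T+2 ends up in the failing branch.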

I maintain some thematic extracts for my work myself. Here's what I do:

-------
#!/bin/bash
# archive the last known good version
mv germany-railways.osm.pbf germany-railways.osm.pbf.1

# replicate the full extract, calls osmosis --rri
$HOME/scripts/get-changes.sh germany-boxed.osm.pbf state

# "thematic filtering", calls Osmosis to keep only the railways
$HOME/scripts/filter-railways.sh germany-boxed.osm.pbf germany-railways.osm.pbf

# derive change for the railways (first input: new state, second input: archived state)
osmosis --rb germany-railways.osm.pbf --sort --rb germany-railways.osm.pbf.1 --sort --derive-change bufferCapacity=10000 --lpc --wxc railways.osc

# update DB (this is the osm2pgsql equivalent to --wpc)
osm2pgsql -U podolsir -d gis --prefix osm_railways -a -m -s -S $HOME/scripts/railways.style railways.osc
------
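
For reference, a replication helper along the lines of get-changes.sh could look roughly like this. The real script is not shown here, so this is only a guess at its shape: the function name, the argument order and the OSMOSIS override are assumptions. For a bbox-based extract you would probably also want to crop again (--bb) after --ac, which is left out of the sketch.

```shell
#!/bin/bash
# Hypothetical stand-in for a get-changes.sh style helper; the real script is
# not shown in this mail. Arguments: <extract.pbf> <state dir> <updated.pbf>
# OSMOSIS can be overridden (e.g. OSMOSIS=echo) for a dry run.
replicate_extract() {
  local extract="$1" state_dir="$2" updated="$3"
  # Read the pending diffs, collapse them to one change per object,
  # then apply them to the existing extract and write the result.
  "${OSMOSIS:-osmosis}" \
    --rri workingDirectory="$state_dir" \
    --simc \
    --rb "$extract" \
    --ac \
    --wb "$updated"
}
```

The change stream is read first and the extract second, because --ac expects the data on its first input and the change on its second, and osmosis connects the most recently produced stream first.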

Basically, this way you keep your replication targets compatible with the respective replication sources (more or less; a bbox-based extract is not fully watertight either, but it works for a reasonably generous bbox). Based on that, you do your tag-based filtering and derive a change which has the right "updates" and "creates".

Yes, this _is_ much slower than the "intuitive" way (I started out with that, too :)), because this way you need to process _all_ the data you have in --apply-change.

You could try to keep everything in your PostGIS database and then just SELECT the stuff that has "gnis:id" for actual processing. However, I don't know what that means in performance terms, as I haven't used that kind of database yet on any scale worth mentioning. My guess would be that --apply-change gets faster but you'll need much more disk space.
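
If the tags end up in an hstore column, as in the osmosis pgsnapshot schema, the SELECT could look roughly like this. I have not run this against a real database; the function name and the PSQL override are made up for the example, and a different schema (e.g. separate tag tables) would need a different query.

```shell
#!/bin/bash
# Sketch, assuming the osmosis pgsnapshot schema (ways.tags is an hstore column).
# PSQL can be overridden (e.g. PSQL=echo) for a dry run; names are illustrative.
select_gnis_ways() {
  local db="$1"
  # hstore's "?" operator tests whether the key exists, whatever its value.
  "${PSQL:-psql}" -d "$db" -At -c "SELECT id FROM ways WHERE tags ? 'gnis:id';"
}
```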

In any case: if you replicate, you need a source and a target that are compatible. Since your replication source is the planet, ideally you should have a complete planet as the target. Large geographic extracts work more or less; tag-based extracts almost never work as replication targets.

Hope that helps
Igor

[1] http://lists.openstreetmap.org/pipermail/dev/2011-April/022394.html

_______________________________________________
dev mailing list
[email protected]
http://lists.openstreetmap.org/listinfo/dev