Dear all,

The attic feature of the Overpass API should now work properly.

An example of how to use this feature can be found at
http://www.openstreetmap.org/user/ikonor/diary/23329
A big thank you to Ikonor at this point.

In detail, the database has been rebuilt. I tried to do as many checks as possible, and these revealed further bugs, which in turn led me to fix them in the code and then rebuild the database again. Thank you to Stephan and Markus, who managed to obtain a powerful temporary server; this ultimately made it possible to complete the last database rebuild in less than a week.

Here are some details about the kind of bug that kept me busy:

To keep the database as small as possible (it is currently more than 400 GB, which is painful; compare this to the 25 GB of a PBF planet file), we store attic data as a delta to the next newer version. Unchanged details, such as unchanged tags, aren't stored at all. For example, the tags of

<node version="3" lat="50" lon="10" timestamp="2011-01-01">
  <tag k="name" v="something"/>
  <tag k="highway" v="bus_stop"/>
  <tag k="bus" v="yes"/>
  <tag k="FIXME" v="check_name"/>
</node>

<node version="4" lat="50" lon="10" timestamp="2012-01-01">
  <tag k="name" v="something else"/>
  <tag k="highway" v="bus_stop"/>
  <tag k="bus" v="yes"/>
  <tag k="shelter" v="yes"/>
</node>

are stored as
current: "name" = "something else"
current: "highway" = "bus_stop"
current: "bus" = "yes"
current: "shelter" = "yes"
attic: before 2012: "name" = "something"
attic: before 2012: "FIXME" = "check_name"
attic: before 2012: "shelter" = void

This allows us to save space for the repeated tags "highway" and "bus".
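
The delta idea above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the actual Overpass API code: the names `VOID`, `attic_delta` and `reconstruct` are hypothetical, and tags are modelled as plain dictionaries.

```python
# Toy model of "store the older version as a delta to the next newer one".
# VOID, attic_delta and reconstruct are hypothetical names for illustration.
VOID = None  # stands for "this key did not exist in the older version"


def attic_delta(older, newer):
    """Return only the entries needed to rebuild `older` from `newer`."""
    delta = {}
    for key, value in older.items():
        # Store a tag only if it is missing or different in the newer version.
        if key not in newer or newer[key] != value:
            delta[key] = value
    for key in newer:
        # A key that only exists in the newer version is recorded as void.
        if key not in older:
            delta[key] = VOID
    return delta


def reconstruct(newer, delta):
    """Rebuild the older tag set from the newer one plus the delta."""
    tags = dict(newer)
    for key, value in delta.items():
        if value is VOID:
            tags.pop(key, None)
        else:
            tags[key] = value
    return tags


v3 = {"name": "something", "highway": "bus_stop",
      "bus": "yes", "FIXME": "check_name"}
v4 = {"name": "something else", "highway": "bus_stop",
      "bus": "yes", "shelter": "yes"}

delta = attic_delta(v3, v4)
print(delta)                   # entries for "name", "FIXME" and "shelter" only
print(reconstruct(v4, delta))  # the version 3 tags again
```

The repeated tags "highway" and "bus" never appear in `delta`, which is where the space saving comes from.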

The main traces of the bug were checksum mismatches in five of the first 6000 augmented diffs checked, spanning roughly 12 September to 16 September 2012. In detail, the diffs generated from the database state as of 1 October 2012 (thus representing deltas to the then-current state of 1 October 2012) were not consistent with the diffs generated for the same minutes from deltas to the database state as of June 2014. Regenerating the augmented diffs from both database states in detail showed that node 1700083447, in its attic state of September 2012, carried an extra tag when computed from the June 2014 database compared with the same attic state computed from the October 2012 database.

Have you guessed what went wrong? I hadn't at that point, so I had to understand the subtle details. There are millions of nodes carrying tags, so what is so special about this one?

It turned out that the node had moved back and forth over a significant distance, see versions 11, 14, and 15. While the movement itself was not that large (about 2 km), it happened to change the node's quadtile index to a new value and then back to the old value. And in version 13, after the move away, the tag is_capital=country was added, and it was left unchanged when the node moved back.

In the delta computation, I did set a marker on the quadtile index of the older position saying that the tag is present, but I forgot to set a marker on the new position saying that the tag is not present in earlier versions on this quadtile index.
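
To make the missing marker more concrete, here is a deliberately simplified toy model in Python. Everything in it is an assumption made for illustration and not the real Overpass API data structure: the tile names "A" and "B", the integer timestamps, the "name" value, and the functions `build_attic` and `tags_at` are all hypothetical. Attic entries are kept per quadtile as (key, valid_before, value), and a lookup starts from the current tags and applies, for each key, the entry with the nearest valid_before after the requested time.

```python
# Toy model of attic tags indexed by quadtile. All names are hypothetical.
VOID = object()  # marker: "key not present before valid_before"


def build_attic(versions, mark_arrival):
    """versions: list of (timestamp, tile, tags), oldest first.
    Returns {tile: [(key, valid_before, value), ...]}."""
    attic = {}
    for (_, old_tile, old_tags), (new_ts, new_tile, new_tags) in zip(
            versions, versions[1:]):
        entries = attic.setdefault(old_tile, [])
        if old_tile == new_tile:
            # Same tile: store only tags that differ from the newer version.
            for key in old_tags.keys() | new_tags.keys():
                if old_tags.get(key) != new_tags.get(key):
                    entries.append((key, new_ts, old_tags.get(key, VOID)))
        else:
            # Tile changed: record the older tags on the older tile ...
            for key, value in old_tags.items():
                entries.append((key, new_ts, value))
            # ... and (the step that was forgotten) mark on the new tile
            # that these keys were not present there before the move.
            if mark_arrival:
                arrival = attic.setdefault(new_tile, [])
                for key in new_tags:
                    arrival.append((key, new_ts, VOID))
    return attic


def tags_at(attic, current, tile, when):
    """Tags of the node as seen on `tile` at time `when`."""
    _, cur_tile, cur_tags = current
    tags = dict(cur_tags) if tile == cur_tile else {}
    nearest = {}
    for key, valid_before, value in attic.get(tile, []):
        if valid_before > when and (
                key not in nearest or valid_before < nearest[key][0]):
            nearest[key] = (valid_before, value)
    for key, (_, value) in nearest.items():
        if value is VOID:
            tags.pop(key, None)
        else:
            tags[key] = value
    return tags


versions = [
    (2010, "A", {"name": "Example"}),                           # original place
    (2011, "B", {"name": "Example"}),                           # moved away
    (2012, "B", {"name": "Example", "is_capital": "country"}),  # tag added
    (2013, "A", {"name": "Example", "is_capital": "country"}),  # moved back
]
current = versions[-1]

buggy = tags_at(build_attic(versions, False), current, "A", 2010)
fixed = tags_at(build_attic(versions, True), current, "A", 2010)
print(buggy)  # the 2010 state wrongly inherits is_capital from the current tags
print(fixed)  # only the "name" tag, as it really was in 2010
```

In this toy model a key that already had a value on the earlier visit to tile "A" is unaffected either way, because its older entry has the nearer valid_before and takes precedence over the void marker.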

Now, why didn't this show up earlier? Haven't there been tests? There have been tests, but obviously not enough, and you only ever know afterwards which tests were missing. The problem doesn't affect a node that never existed at this place before (only 3 in a million nodes ever return to a quadtile index where they have been before). Nor does it appear when the tag already had some value on these older node versions, because there is then a marker for this older version and this specific key (so the total number of affected nodes is less than 100). Both cases had been tested individually, but not both together.

In summary: I've worked to get the number of bugs down, and I'm confident enough to call the database state consistent now, but there might be other arcane bugs that affect only a few objects in specific versions. So please be bold in reporting suspect query results to me, but be equally bold in using the new attic database.

Cheers,

Roland

