Dear all,
the attic feature of Overpass API should now work properly.
An example of how to use this feature is at
http://www.openstreetmap.org/user/ikonor/diary/23329
A big thank you to Ikonor at this point.
In detail, the database has been rebuilt. I ran as many checks as
possible, and these revealed further bugs, which in turn led me to fix
the code and then rebuild the database again. Thank you to Stephan and
Markus, who managed to obtain a powerful temporary server; this
ultimately made it possible to do the last database rebuild in less than
a week.
Here are some details about the kind of bugs that kept me busy.
To keep the database as small as possible (the current size of more than
400 GB is painful; compare this to 25 GB for a PBF planet file), we store
attic data as a delta to the next newer version. Unchanged details, such
as unchanged tags, aren't stored at all. For example, the tags of
<node version="3" lat="50" lon="10" timestamp="2011-01-01">
<tag k="name" v="something"/>
<tag k="highway" v="bus_stop"/>
<tag k="bus" v="yes"/>
<tag k="FIXME" v="check_name"/>
</node>
<node version="4" lat="50" lon="10" timestamp="2012-01-01">
<tag k="name" v="something else"/>
<tag k="highway" v="bus_stop"/>
<tag k="bus" v="yes"/>
<tag k="shelter" v="yes"/>
</node>
are stored as
current: "name" = "something else"
current: "highway" = "bus_stop"
current: "bus" = "yes"
current: "shelter" = "yes"
attic: before 2012: "name" = "something"
attic: before 2012: "FIXME" = "check_name"
attic: before 2012: "shelter" = void
This allows us to save space for the repeated tags "highway" and "bus".
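The delta idea above can be sketched in a few lines of Python. This is
only an illustration of the principle, not the actual Overpass API code;
the function names attic_delta and restore and the use of None for
"void" are assumptions made for this example.

```python
VOID = None  # assumption: stands in for "void", i.e. the key did not exist


def attic_delta(older, newer):
    """Return only the tags needed to get from the newer version back to the older one."""
    delta = {}
    for key, value in older.items():
        # Changed or removed tags: remember the older value.
        if newer.get(key) != value:
            delta[key] = value
    for key in newer:
        # Tags that were added later: mark them as void for the older version.
        if key not in older:
            delta[key] = VOID
    return delta


def restore(newer, delta):
    """Rebuild the older tag set from the newer tags plus the stored delta."""
    tags = dict(newer)
    for key, value in delta.items():
        if value is VOID:
            tags.pop(key, None)
        else:
            tags[key] = value
    return tags


v3 = {"name": "something", "highway": "bus_stop", "bus": "yes", "FIXME": "check_name"}
v4 = {"name": "something else", "highway": "bus_stop", "bus": "yes", "shelter": "yes"}

delta = attic_delta(v3, v4)
print(delta)
print(restore(v4, delta) == v3)
```

The delta contains entries for "name", "FIXME", and "shelter" only; the
repeated tags "highway" and "bus" never show up in it.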
The main trace of the bug was a set of checksum mismatches in five of
the first 6000 augmented diffs that I checked, which span roughly 12 to
16 September 2012. In detail, the diffs generated from the database
state as of 1 October 2012 (thus representing deltas to the then-current
state of 1 October 2012) were not consistent with the diffs generated
for the same minutes from deltas to the database state as of June 2014.
A detailed regeneration of the augmented diffs from both database states
showed that node 1700083447 carried an extra tag in its attic state of
September 2012 when computed from the June 2014 database, compared to
the same node at the same attic state when computed from the October
2012 database.
Have you guessed what went wrong? I hadn't at that point, so I had to
dig into the subtle details. There are millions of nodes carrying
tags, so what is so special about this one?
It turned out that the node has moved back and forth, see versions 11,
14, and 15. The distance itself is not large (about 2 km), but the move
happened to change the node's quadtile index to a new value and then
back to the old value. And in version 13, after the move forth, the tag
is_capital=country was added, and it was left unchanged when the node
moved back.
In the delta computation, I did set a marker on the quadtile index of
the older position saying that the tag is present, but I forgot to set a
marker on the new position saying that the tag isn't present on earlier
versions at this quadtile index.
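To make the missing marker easier to follow, here is a toy model in
Python. It is a deliberately simplified sketch and does not reflect the
real on-disk layout or code of Overpass API. The quadtile names "A" and
"B", version 10, the place=city tag, and the functions build_markers and
tags_at are all invented for this illustration; only the
is_capital=country tag is taken from the real node.

```python
VOID = None  # assumption: stands in for "tag not present"

# Toy history of one node: (version, quadtile index, tags).
HISTORY = [
    (10, "A", {"place": "city"}),
    (11, "B", {"place": "city"}),                           # moved forth
    (13, "B", {"place": "city", "is_capital": "country"}),  # tag added
    (14, "A", {"place": "city", "is_capital": "country"}),  # moved back
]


def build_markers(history, void_on_return):
    """Map (quadtile, key) to a list of (valid_before_version, value)."""
    markers = {}

    def put(tile, key, before, value):
        markers.setdefault((tile, key), []).append((before, value))

    for (_, old_tile, old_tags), (new_ver, new_tile, new_tags) in zip(history, history[1:]):
        if old_tile == new_tile:
            # Same quadtile: store only the delta to the newer version.
            for key, value in old_tags.items():
                if new_tags.get(key) != value:
                    put(old_tile, key, new_ver, value)
            for key in new_tags:
                if key not in old_tags:
                    put(old_tile, key, new_ver, VOID)
        else:
            # Moved: the older quadtile records which tags were present there.
            for key, value in old_tags.items():
                put(old_tile, key, new_ver, value)
            if void_on_return:
                # The marker that was missing: earlier versions on the new
                # quadtile must not inherit tags from the newer version.
                for key in new_tags:
                    put(new_tile, key, new_ver, VOID)
    return markers


def tags_at(history, markers, version):
    """Reconstruct the tags of a version by looking only at its own quadtile."""
    tile = next(t for v, t, _ in history if v == version)
    _, cur_tile, cur_tags = history[-1]
    keys = {k for (t, k) in markers if t == tile}
    if cur_tile == tile:
        keys |= set(cur_tags)
    result = {}
    for key in keys:
        later = [(b, val) for b, val in markers.get((tile, key), []) if b > version]
        if later:
            value = min(later, key=lambda m: m[0])[1]  # nearest newer marker wins
        elif cur_tile == tile:
            value = cur_tags.get(key, VOID)            # fall back to current data
        else:
            value = VOID
        if value is not VOID:
            result[key] = value
    return result


buggy = build_markers(HISTORY, void_on_return=False)
fixed = build_markers(HISTORY, void_on_return=True)
print(tags_at(HISTORY, buggy, 10))
print(tags_at(HISTORY, fixed, 10))
```

In this model, version 10 wrongly picks up is_capital=country without
the void marker, because nothing on quadtile "A" says the tag was absent
there. The place tag is unaffected either way, since it already has a
marker of its own on "A".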
Now: why didn't this pop up earlier? Weren't there tests? There were
tests, but obviously not enough, and you only ever know afterwards which
tests were missing. The problem doesn't affect a node that never existed
at this place before (only 3 in a million nodes ever return to a
quadtile index where they have been before). It doesn't appear either
when the tag had any value on those older node versions, because there
is then a marker for that older version and that specific key (so the
total number of affected nodes is less than 100). Both cases have been
tested individually, but not in combination.
In total: I've worked to get the number of bugs down, and I'm confident
enough to call the database state consistent now, but there might be
other arcane bugs that affect only a few objects in specific versions.
So please be bold in reporting suspicious query results to me, but be
equally bold in using the new attic database.
Cheers,
Roland