Hi all,
The Serbian government released the address registry (streets and housenumbers
with point geometries) from the national cadastre ("RGZ" in the rest of this
mail) as open data on 12 December 2022. We are now preparing an import into OSM
and would like feedback on our approach, to hear what we are missing, and to
answer any questions. Below are details of what we have been working on over the
past month.
tl;dr
Main wiki (en and sr):
https://wiki.openstreetmap.org/wiki/Serbia/Projekti/Adresni_registar
Main topic (sr only):
https://community.openstreetmap.org/t/uvoz-adresnog-registra-plan/8916
--------------------------
1. Data quality
We checked the RGZ data by sampling, and it appears to be of good quality, far
better than what we have in OSM today. Apart from a couple of major cities
(Belgrade, for example, has 80% of its addresses mapped), addresses are mostly
non-existent in OSM in Serbia. In some cases an address exists in RGZ but the
building is missing from satellite imagery (either not yet built or already
demolished); we agreed to add these addresses too. There are also cases where
local shops (on the ground floor) are empty or ruined but still exist in RGZ.
In total, there are around 2,428,000 addresses in RGZ and 250,000 addresses in
OSM. We estimate that 190,000 of the OSM addresses can be "conflated" (by adding
a new tag), while 60,000 will have to be resolved case by case. The rest
(>2,000,000) can simply be imported as points.
One downside is that addresses in the cadastre are given in "ALL CAPS" (e.g. RGZ
data would have something like "ABBEY ROAD" as a street name). We fixed this
(more on this below).
--------------------------
2. Preparation
We have a couple of topics in the community forum, but the main thread is this
one: https://community.openstreetmap.org/t/uvoz-adresnog-registra-plan/8916. We
prepared a wiki page for the import here:
https://wiki.openstreetmap.org/wiki/Serbia/Projekti/Adresni_registar.
As a community, we first discussed and agreed on a tagging schema for addresses.
It was always more or less assumed to be the Karlsruhe schema, but this time we
discussed the fine details: whether an address should be a node or a way, what
to do when there are apartments with some shops on the ground floor, etc. The
main thread for this specific topic is here:
https://community.openstreetmap.org/t/uvoz-adresnog-registra-pravila-tagovanja-adresa-u-srbiji/8915
and the final outcome is documented in detail here:
https://wiki.openstreetmap.org/wiki/Serbia/Adresses.
Regarding street names given in "ALL CAPS", we agreed as a community to import
them with "Proper Casing", following grammatical and on-the-ground rules
(punctuation, spacing, hyphens, correcting grammatical cases, plurals, stripping
"street" as a suffix in <1% of cases...). We took all distinct street names in
RGZ (there are 30,000 of them), put them online in
https://lite.framacalc.org/tgux01sydx-9ztp and distributed the work among us to
fix their naming. This work is already done, and we will use these (proper)
names when doing the import. The main topic for this is here:
https://community.openstreetmap.org/t/pravilno-imenovanje-ulica/96891/
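Applying the reviewed names during the import then boils down to a plain lookup.
Below is a minimal Python sketch, assuming the spreadsheet is exported as a
two-column table (RGZ spelling, corrected spelling). The function names and the
example rows are hypothetical, not our actual tooling:

```python
from typing import Dict, Iterable, List, Optional, Tuple


def normalize_key(rgz_name: str) -> str:
    # Collapse repeated and trailing whitespace so lookups are not defeated by spacing.
    return " ".join(rgz_name.split())


def corrected_street_name(rgz_name: str, table: Dict[str, str]) -> Optional[str]:
    # Return the community-corrected name, or None if this RGZ name was never reviewed.
    return table.get(normalize_key(rgz_name))


def split_by_review_status(
    names: Iterable[str], table: Dict[str, str]
) -> Tuple[List[str], List[str]]:
    # Reviewed names come back corrected; unreviewed ones are returned unchanged
    # so they can be held back instead of being imported in ALL CAPS.
    reviewed: List[str] = []
    unreviewed: List[str] = []
    for name in names:
        fixed = corrected_street_name(name, table)
        if fixed is None:
            unreviewed.append(name)
        else:
            reviewed.append(fixed)
    return reviewed, unreviewed


# Hypothetical excerpt of the reviewed table (RGZ spelling -> corrected spelling).
TABLE = {"ABBEY ROAD": "Abbey Road"}

print(split_by_review_status(["ABBEY  ROAD", "SOME STREET"], TABLE))
# (['Abbey Road'], ['SOME STREET'])
```

In this sketch, returning None instead of guessing a casing keeps the manual
review as the single source of truth for names.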
We also set up a new tile server that shows only street geometries and
housenumbers from RGZ data. It can be accessed here:
https://tiles.openstreetmap.rs/rgz/{zoom}/{x}/{y}.png. It will be an invaluable
help when importing addresses and when resolving cases manually.
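In JOSM, the URL template itself is what you would add as a TMS imagery layer.
For anyone who wants to fetch a single tile outside JOSM, the {zoom}/{x}/{y}
placeholders can be filled with the standard slippy-map formula. A sketch,
assuming the RGZ tile server uses the usual XYZ (Web Mercator) tile scheme; the
function names are hypothetical:

```python
import math


def tile_xy(lat: float, lon: float, zoom: int) -> tuple:
    # Standard Web Mercator "slippy map" tile indices for a WGS84 coordinate.
    n = 2 ** zoom
    x = int((lon + 180.0) / 360.0 * n)
    lat_rad = math.radians(lat)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    # Clamp to the valid range (lon = 180 or extreme latitudes would overflow).
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)


def rgz_tile_url(lat: float, lon: float, zoom: int) -> str:
    # Assumes the RGZ tile server follows the usual {zoom}/{x}/{y} XYZ scheme.
    x, y = tile_xy(lat, lon, zoom)
    return f"https://tiles.openstreetmap.rs/rgz/{zoom}/{x}/{y}.png"


print(rgz_tile_url(0.0, 0.0, 1))  # https://tiles.openstreetmap.rs/rgz/1/1/1.png
```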
We also agreed on which script to use when housenumbers contain letters (e.g.
"30b"). We could use either Cyrillic ("30б") or Latin ("30b"). Cyrillic is what
we usually use in the "name" tag for streets, but if we used it for
"addr:housenumber" too, we would run into a problem: geocoder support for
different scripts in the "addr:housenumber" tag is almost non-existent. So, for
purely pragmatic reasons, we opted for Latin in housenumbers (we will use "30b"
instead of "30б"). This problem is not specific to Serbia, but it affects us
greatly, and we discussed it at length as long as 4 years ago:
https://community.openstreetmap.org/t/cirilica-i-latinica-u-kucnim-brojevima/88545.
Whoever wants to tackle this problem, please contact me; I will be eager to
help.
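The conversion itself is mechanical. Below is a sketch of how a Cyrillic letter
in a housenumber could be turned into Latin before upload. The mapping is the
standard Serbian Cyrillic-to-Latin alphabet; preserving the original letter
case (and the function name) is an assumption of this sketch, not an agreed
rule:

```python
# Serbian Cyrillic -> Latin (gajica) letter mapping, lower case only.
CYR_TO_LAT = {
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "ђ": "đ",
    "е": "e", "ж": "ž", "з": "z", "и": "i", "ј": "j", "к": "k",
    "л": "l", "љ": "lj", "м": "m", "н": "n", "њ": "nj", "о": "o",
    "п": "p", "р": "r", "с": "s", "т": "t", "ћ": "ć", "у": "u",
    "ф": "f", "х": "h", "ц": "c", "ч": "č", "џ": "dž", "ш": "š",
}


def housenumber_to_latin(housenumber: str) -> str:
    # Digits, slashes, spaces and Latin letters pass through unchanged;
    # Cyrillic letters are transliterated, keeping their original case.
    out = []
    for ch in housenumber:
        latin = CYR_TO_LAT.get(ch.lower())
        if latin is None:
            out.append(ch)
        elif ch.isupper():
            out.append(latin.upper())
        else:
            out.append(latin)
    return "".join(out)


print(housenumber_to_latin("30б"))  # 30b
```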
We also agreed what to do when an address differs between OSM and RGZ: we keep
the OSM value if the OSM object has a note tag. All of this is clarified in the
import instructions.
Finally, we introduced a new tag to reference RGZ addresses, "ref:RS:kucni_broj"
("ref:RS:housenumber" in English), as well as a new changeset tag
"source=RGZ_AR_Import" (we plan to add this to
https://taginfo.openstreetmap.org/projects once the import starts).
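To make the tagging concrete, a new address node in an .osm file prepared for
JOSM could look roughly like the output of the sketch below. Only the tags named
in this mail are shown (the full schema is on the wiki), and the RGZ id,
coordinates and generator string are made-up placeholders:

```python
import xml.etree.ElementTree as ET

# Changeset tag agreed for the import; in JOSM it is entered in the upload dialog.
CHANGESET_TAGS = {"source": "RGZ_AR_Import"}


def address_node(node_id: int, lat: float, lon: float,
                 street: str, housenumber: str, rgz_ref: str) -> ET.Element:
    # Negative ids mark new (not yet uploaded) objects in .osm files.
    node = ET.Element("node", id=str(node_id), lat=f"{lat:.7f}", lon=f"{lon:.7f}")
    tags = {
        "addr:street": street,
        "addr:housenumber": housenumber,
        "ref:RS:kucni_broj": rgz_ref,
    }
    for k, v in tags.items():
        ET.SubElement(node, "tag", k=k, v=v)
    return node


def osm_file(nodes) -> str:
    # Wrap the nodes in an <osm> root; the generator value is a placeholder.
    root = ET.Element("osm", version="0.6", generator="rgz-import-sketch")
    root.extend(nodes)
    return ET.tostring(root, encoding="unicode")


# Hypothetical values; the real ones come from RGZ.
print(osm_file([address_node(-1, 44.8, 20.46, "Abbey Road", "30b", "12345")]))
```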
--------------------------
3. Import
There are a lot of addresses to import. We want the import to have a human in
the loop, but to be as easy as possible. We created a website to help us with
this: https://openstreetmap.rs/download/ar/.
There are 3 main cases in the import:
* Adding new addresses - this should be as easy as going to the above-mentioned
website, navigating to the municipality and settlement, and downloading an .osm
file with new addresses. All .osm files are split into at most 100 addresses,
which can be imported at once. The website is refreshed daily. The rest of the
instructions are on the wiki page, but they boil down to: use JOSM, check
geometries one by one, move addresses on top of houses on satellite imagery,
check street naming, and upload. We might even try to automate this once a
couple of thousand addresses have been added and enough time has passed,
depending on overall feedback from the community. If we go down this route, we
will use a separate bot account.
* Conflating existing addresses - this should be as easy as downloading another
.osm file in which we only add the "ref:RS:kucni_broj" tag, and repeating the
same procedure as above. These files are also limited to at most 100 addresses
and refreshed daily. Note that only addresses matching 100% (street name and
housenumber are both exact, and the OSM and RGZ points are within 200 m of each
other) are proposed in these .osm files.
* Fuzzy-matched addresses - this is the hardest case and no automation is
provided. There are "only" 60,000 of these addresses, but they will take most
of the time. We plan to use the tile server mentioned above, with its rendered
addresses, to help here, but it will still require a lot of work as there is a
lot of randomness (typos in names, old addresses, locations that are too far
away...).
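For the conflation case, the "100% match" rule can be read as: same street name,
same housenumber, and the two points no more than 200 m apart. Below is a rough
Python sketch of that rule plus the 100-address split. The class and function
names are hypothetical; it assumes RGZ names have already been replaced by the
corrected spellings, and skipping OSM addresses with more than one RGZ
candidate is an assumption of the sketch:

```python
import math
from dataclasses import dataclass
from typing import Dict, Iterator, List, Sequence, Tuple


@dataclass(frozen=True)
class Address:
    street: str
    housenumber: str
    lat: float
    lon: float
    ref: str = ""  # RGZ id (ref:RS:kucni_broj); empty for unconflated OSM objects


def distance_m(a: Address, b: Address) -> float:
    # Haversine great-circle distance in metres.
    r = 6_371_000.0
    p1, p2 = math.radians(a.lat), math.radians(b.lat)
    dp = p2 - p1
    dl = math.radians(b.lon - a.lon)
    h = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(h))


def exact_matches(osm: Sequence[Address], rgz: Sequence[Address],
                  max_dist_m: float = 200.0) -> List[Tuple[Address, Address]]:
    # Index RGZ by (street, housenumber) so each OSM address needs one lookup.
    index: Dict[Tuple[str, str], List[Address]] = {}
    for r in rgz:
        index.setdefault((r.street, r.housenumber), []).append(r)
    pairs = []
    for o in osm:
        near = [r for r in index.get((o.street, o.housenumber), [])
                if distance_m(o, r) <= max_dist_m]
        if len(near) == 1:  # ambiguous matches are left for manual work
            pairs.append((o, near[0]))
    return pairs


def batches(items: Sequence, size: int = 100) -> Iterator[Sequence]:
    # Split proposals into files of at most `size` addresses each.
    for i in range(0, len(items), size):
        yield items[i:i + size]


osm_sample = [Address("Abbey Road", "3", 44.8, 20.46)]
rgz_sample = [Address("Abbey Road", "3", 44.8005, 20.46, ref="R1"),
              Address("Abbey Road", "3", 44.9, 20.46, ref="R2")]
print([(o.housenumber, r.ref) for o, r in exact_matches(osm_sample, rgz_sample)])
# [('3', 'R1')]
```

Indexing by (street, housenumber) first keeps the distance check to a handful of
candidates per address, which matters at around 2.4 million RGZ records.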
We plan to create a tutorial video to make onboarding easier for people, as we
did for the import of administrative boundaries (https://vimeo.com/401994061),
but this time on https://peertube.openstreetmap.fr (to be honest, we did have
that video on the fediverse at
https://peertube.live/videos/watch/d5ef0a85-2578-4c7d-8430-1395e853eca7, but it
is gone now...).
Overall, my personal hope is that with the dozens of very active people we
already have in the community, a dozen more who are sporadically active, and
good tooling, we can have 80% of RGZ addresses imported into OSM by the end of
2023.
--------------------------
4. Quality assurance
As previously mentioned, we created the website
https://openstreetmap.rs/download/ar/, which is refreshed daily and has a QA
section. The idea is to monitor the import as we go and detect any problems.
These are some of the things we plan to check continuously:
* duplicated ref:RS:kucni_broj - ideally, no two OSM entities will have the same
"ref:RS:kucni_broj" tag, as there are no duplicates in RGZ either. This report
should have 0 entries. It is done and generated daily.
* addresses in buildings - as agreed in the tagging schema, we want to place
addresses on the building way (if a building exists). However, requiring this
would mean adding a building for each address, which would slow down the import
itself. So the plan is to proceed with the import, add addresses as nodes, and
use this QA check to find address nodes inside buildings. Today we counted
76,000 address nodes inside buildings, of which 57,500 are simple cases: a
single address node inside a building, which can be deleted and its tags moved
to the building. The rest (roughly 18,000) need to be checked case by case
(POIs, multiple addresses, typos...). For the simple cases, we have even
generated .osm files to automate the fix. One problem with moving these nodes is
that we lose their history, so we split the .osm files into at most 10 entities
each, to make it easier for anyone to find the full history of deleted nodes.
The report is done and generated daily.
* QA on conflated addresses - once we detect a "ref:RS:kucni_broj" tag on an OSM
address, we will run a couple of checks: does that reference actually exist in
RGZ, is the OSM object too far away from the RGZ point, do the street
names/housenumbers match... This report will show all of this and should
ideally have 0 entries. It is still being worked on.
--------------------------
5. Licence
The licence terms for this data as open data are given at
https://data.gov.rs/sr/terms/. The data was released as open data on 12 December
2022 through changes in the law, as defined in this PDF:
https://geosrbija.rs/?mdocs-file=6186 (article 34), which (along with other RGZ
documents) can be downloaded here: https://geosrbija.rs/dokumentacija/. We have
already imported other data from the Open Data portal (GTFS, admin boundaries,
national heritage...), but we also wanted to add RGZ as a source on
https://www.openstreetmap.org/copyright, and we have a pull request for that:
https://github.com/openstreetmap/openstreetmap-website/pull/3959. Afterwards, we
also contacted the LWG (on 8 March 2023) for further consultation, but the
answer is still pending. However, we think we are on the safe side to start the
import even now. Please raise any concerns if this is not the case!
Thanks, Branko
_______________________________________________
Imports mailing list
https://lists.openstreetmap.org/listinfo/imports