Hi all,
The Serbian government released the address registry (streets and house numbers
with point geometries) from the national cadastre ("RGZ" in the rest of this
mail) as open data on 12 December 2022. We are now preparing an import into OSM
and would like feedback on our approach and on anything we are missing; we are
also happy to answer any questions. Below are some details of what we have been
working on for the past month.

tl;dr

Main wiki (en and sr): 
https://wiki.openstreetmap.org/wiki/Serbia/Projekti/Adresni_registar
Main topic (sr only): 
https://community.openstreetmap.org/t/uvoz-adresnog-registra-plan/8916

--------------------------

1. Data quality

We checked the RGZ data by sampling, and it appears to be of good quality, far
better than what we have in OSM today. Except for a couple of major cities
(Belgrade has 80% of its addresses, for example), addresses in Serbia are
mostly missing from OSM. There are some cases where an address exists in RGZ
but the building is missing on satellite imagery (either not yet built or
already demolished); we agreed to add these addresses too. There are also cases
where local shops (on the ground floor) are empty or ruined but still exist in
RGZ.

In total, there are around 2,428,000 addresses in RGZ and 250,000 addresses in
OSM. We think that 190,000 of the OSM addresses can be "conflated" (by adding a
new tag) and 60,000 will have to be resolved case by case. The rest of RGZ
(>2,000,000 addresses) could be imported simply as points.

One downside is that addresses in the cadastre are given in "ALL CAPS" (e.g.
RGZ data would have something like "ABBEY ROAD" as a street name). We have
fixed this (more on this below).

--------------------------

2. Preparation

We have a couple of topics in the community forum, but the main thread is this
one: https://community.openstreetmap.org/t/uvoz-adresnog-registra-plan/8916. We
prepared a wiki page for the import here:
https://wiki.openstreetmap.org/wiki/Serbia/Projekti/Adresni_registar.

As a community, we first discussed and agreed on a tagging schema for
addresses. It was always more or less assumed to be the Karlsruhe schema, but
this time we discussed the fine details: whether an address should be a node or
a way, what to do when there are apartments and some shops on the ground floor,
etc. The main thread for this specific topic is here:
https://community.openstreetmap.org/t/uvoz-adresnog-registra-pravila-tagovanja-adresa-u-srbiji/8915
and the final outcome is written up in detail here:
https://wiki.openstreetmap.org/wiki/Serbia/Adresses.

Regarding street names given in "ALL CAPS", we agreed as a community to import
them in "Proper Casing", following grammatical and on-the-ground rules
(punctuation, spacing, hyphens, correcting cases, plurals, stripping "street"
as a suffix in <1% of cases...). We took all street names from RGZ (there are
30,000 distinct ones), put them online in
https://lite.framacalc.org/tgux01sydx-9ztp and distributed the work among us to
fix their naming. This work is already done, and we will use this (proper)
naming during the import. The main topic for this is here:
https://community.openstreetmap.org/t/pravilno-imenovanje-ulica/96891/
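Applying the curated naming is a table lookup rather than an algorithm. A
minimal sketch, assuming the spreadsheet is exported to CSV with two columns,
hypothetically named "rgz_name" and "osm_name"; the function names and sample
rows are illustrative only. Unknown names raise an error instead of falling
back to naive title-casing, because str.title() would turn "БУЛЕВАР КРАЉА
АЛЕКСАНДРА" into "Булевар Краља Александра" rather than the correct "Булевар
краља Александра":

```python
import csv
import io

def load_name_fixes(csv_text: str) -> dict[str, str]:
    """Map RGZ ALL CAPS street names to the community-corrected names.

    Assumes a hypothetical CSV export with columns "rgz_name" and "osm_name".
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["rgz_name"].strip(): row["osm_name"].strip() for row in reader}

def fix_street_name(rgz_name: str, fixes: dict[str, str]) -> str:
    """Return the curated name; fail loudly instead of guessing with str.title()."""
    key = rgz_name.strip()
    if key not in fixes:
        raise KeyError(f"no curated name for {key!r}; needs manual review")
    return fixes[key]

# Illustrative sample rows, not taken from the real spreadsheet
SAMPLE_CSV = (
    "rgz_name,osm_name\n"
    "КНЕЗА МИЛОША,Кнеза Милоша\n"
    "БУЛЕВАР КРАЉА АЛЕКСАНДРА,Булевар краља Александра\n"
)

fixes = load_name_fixes(SAMPLE_CSV)
print(fix_street_name("БУЛЕВАР КРАЉА АЛЕКСАНДРА", fixes))
```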

We also created a new tile server that shows only street geometries and house
numbers based on RGZ data. It can be accessed here:
https://tiles.openstreetmap.rs/rgz/{zoom}/{x}/{y}.png. It will be an invaluable
help when importing addresses and when resolving cases manually.
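For checking a location against the RGZ layer outside an editor, the
{zoom}/{x}/{y} template can be filled with the usual XYZ ("slippy map") tile
numbering. A minimal sketch, assuming the RGZ tile server uses that standard
Web Mercator scheme; the helper names are hypothetical, and which zoom levels
the server actually renders is not stated here:

```python
import math

RGZ_TILE_TEMPLATE = "https://tiles.openstreetmap.rs/rgz/{zoom}/{x}/{y}.png"

def tile_xy(lat: float, lon: float, zoom: int) -> tuple[int, int]:
    """Convert WGS84 lat/lon to XYZ tile indices (Web Mercator)."""
    lat = max(min(lat, 85.0511), -85.0511)  # Web Mercator cannot represent the poles
    n = 2 ** zoom
    x = int((lon + 180.0) / 360.0 * n)
    lat_rad = math.radians(lat)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    # Clamp to the valid range (lon == 180 would otherwise give x == n)
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)

def rgz_tile_url(lat: float, lon: float, zoom: int) -> str:
    """Fill the RGZ tile URL template for the tile containing (lat, lon)."""
    x, y = tile_xy(lat, lon, zoom)
    return RGZ_TILE_TEMPLATE.format(zoom=zoom, x=x, y=y)

# Central Belgrade, zoom 18 (assumed to be available)
print(rgz_tile_url(44.8176, 20.4633, 18))
```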

We also agreed on which script to use when house numbers contain letters (e.g.
"30b"). We could use either Cyrillic ("30б") or Latin ("30b"). Cyrillic is what
we usually have in the "name" tag for streets, but if we used it for
"addr:housenumber" too, we would run into a problem: support in geocoders for
different scripts in the "addr:housenumber" tag is almost non-existent. So, for
purely pragmatic reasons, we opted for Latin in house numbers (we will use
"30b" instead of "30б"). This problem is not specific to Serbia, but it affects
us greatly, and we discussed it at length as far back as 4 years ago:
https://community.openstreetmap.org/t/cirilica-i-latinica-u-kucnim-brojevima/88545.
Whoever wants to tackle this problem, please contact me; I will be eager to
help.
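If RGZ house numbers contain Cyrillic letters, the Latin-only rule can be
applied mechanically. A minimal sketch, assuming house-number letters are
written in lowercase (as in "30b"); the mapping is the standard Serbian
Cyrillic-to-Latin (Gaj) correspondence, and the function name is hypothetical:

```python
# Serbian Cyrillic -> Latin (standard Gaj alphabet correspondence), lowercase letters
CYR_TO_LAT = {
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "ђ": "đ", "е": "e",
    "ж": "ž", "з": "z", "и": "i", "ј": "j", "к": "k", "л": "l", "љ": "lj",
    "м": "m", "н": "n", "њ": "nj", "о": "o", "п": "p", "р": "r", "с": "s",
    "т": "t", "ћ": "ć", "у": "u", "ф": "f", "х": "h", "ц": "c", "ч": "č",
    "џ": "dž", "ш": "š",
}

def housenumber_to_latin(housenumber: str) -> str:
    """Transliterate Cyrillic letters in a house number, e.g. "30б" -> "30b".

    Digits, "/" and other characters pass through unchanged. Lowercasing assumes
    house-number letters are written in lowercase, as in "30b".
    """
    return "".join(CYR_TO_LAT.get(ch, ch) for ch in housenumber.lower())

for raw in ["30б", "12а/3", "7"]:
    print(raw, "->", housenumber_to_latin(raw))
```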

We also agreed on what to do when OSM and RGZ addresses differ: we keep the OSM
address if it has a note tag. All of this is clarified in the import
instructions.

Finally, we introduced a new tag to reference RGZ addresses,
"ref:RS:kucni_broj" (translated to English as "ref:RS:housenumber"), as well as
a new "source=RGZ_AR_Import" tag for changesets (we plan to add this to
https://taginfo.openstreetmap.org/projects once the import starts).
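To make the tags concrete, here is a minimal sketch of a single new address
node as it might appear in one of the downloadable .osm files. The
coordinates, the ref value "123456789" and the street are hypothetical
placeholders; the authoritative tag set is on the Serbia/Adresses wiki page.
Negative ids mark objects that do not exist in OSM yet, which is how JOSM
treats new objects:

```python
import xml.etree.ElementTree as ET

def address_node(node_id: int, lat: float, lon: float,
                 street: str, housenumber: str, rgz_ref: str) -> ET.Element:
    """Build one new address node (negative id = not yet in OSM)."""
    node = ET.Element("node", id=str(node_id), lat=f"{lat:.7f}", lon=f"{lon:.7f}")
    for k, v in (("addr:street", street),
                 ("addr:housenumber", housenumber),   # Latin letters, e.g. "30b"
                 ("ref:RS:kucni_broj", rgz_ref)):     # link back to RGZ
        ET.SubElement(node, "tag", k=k, v=v)
    return node

# Changeset tags are entered in JOSM's upload dialog, not stored in the .osm file
CHANGESET_TAGS = {"source": "RGZ_AR_Import"}

root = ET.Element("osm", version="0.6", generator="rgz-ar-sketch")
root.append(address_node(-1, 44.8176, 20.4633, "Кнеза Милоша", "30b", "123456789"))
print(ET.tostring(root, encoding="unicode"))
```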

--------------------------

3. Import

There are a lot of addresses to import. We want the import to have a human in
the loop, but to be as easy as possible. We created a website to help us with
this: https://openstreetmap.rs/download/ar/.

There are 3 main cases when doing the import:

* Adding new addresses - it should be as easy as going to the above-mentioned
website, navigating to a municipality and settlement, and downloading an .osm
file with new addresses. All .osm files are split into at most 100 addresses
that can be imported at once. The website is refreshed daily. The rest of the
instructions are on the wiki page, but they boil down to: use JOSM, check the
geometries one by one, move addresses on top of houses on satellite imagery,
check street naming, and upload. We might even try to automate this after a
couple of thousand addresses are added and enough time has passed, depending
on overall feedback from the community. If we go down this route, we will use
a separate bot account.

* Conflating existing addresses - it should be as easy as downloading another
.osm file, where we only add the "ref:RS:kucni_broj" tag, and repeating the
same procedure as above. These files are also limited to 100 addresses each
and refreshed daily. Note that only addresses that match 100% (street name and
house number are exact matches, and the OSM and RGZ locations are within 200 m
of each other) are proposed in these .osm files.

* Fuzzy-matched addresses - this is the hardest case, and no automation is
provided. There are "only" 60,000 of these addresses, but they will take most
of the time. We plan to use the tile server mentioned above, with rendered
addresses, to aid in this, but it will still require a lot of work, as there is
a lot of randomness here (typos in names, old addresses, locations that are too
far away...).
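The exact-match rule for conflation and the 100-address limit per file can be
sketched as follows. This is a minimal illustration, not the code behind the
website: it assumes the RGZ street names have already been replaced with the
corrected names, uses a spherical haversine distance, and all names (Address,
classify, chunks) are hypothetical:

```python
import math
from dataclasses import dataclass

MAX_DISTANCE_M = 200.0  # OSM and RGZ locations must be within 200 m
MAX_PER_FILE = 100      # each downloadable .osm file holds at most 100 addresses

@dataclass(frozen=True)
class Address:
    street: str       # corrected street name (hypothetical field names)
    housenumber: str
    lat: float
    lon: float

def haversine_m(a: Address, b: Address) -> float:
    """Great-circle distance in metres on a spherical Earth (R = 6,371 km)."""
    r = 6_371_000.0
    p1, p2 = math.radians(a.lat), math.radians(b.lat)
    dp = p2 - p1
    dl = math.radians(b.lon - a.lon)
    h = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(h))

def classify(osm: Address, rgz: Address) -> str:
    """'exact': only ref:RS:kucni_broj gets added; otherwise left for manual resolution."""
    same_text = osm.street == rgz.street and osm.housenumber == rgz.housenumber
    if same_text and haversine_m(osm, rgz) <= MAX_DISTANCE_M:
        return "exact"
    return "fuzzy"

def chunks(items: list, size: int = MAX_PER_FILE) -> list[list]:
    """Split candidates into batches of at most `size`, one batch per .osm file."""
    return [items[i:i + size] for i in range(0, len(items), size)]

osm = Address("Кнеза Милоша", "30b", 44.8000, 20.4600)
rgz = Address("Кнеза Милоша", "30b", 44.8010, 20.4600)  # about 111 m north
print(classify(osm, rgz))  # exact
```

Exact string equality is intentionally strict: anything with a typo or a
slightly different name falls through to the fuzzy bucket and is resolved by a
human.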

We plan to create a special tutorial video to make onboarding easier for
people, as we did for the import of administrative boundaries
(https://vimeo.com/401994061), but this time on
https://peertube.openstreetmap.fr (to be honest, we had that video on the
fediverse at
https://peertube.live/videos/watch/d5ef0a85-2578-4c7d-8430-1395e853eca7, but it
is gone now...).

Overall, my personal hope is that with dozens of very active people (whom we
already have in the community), a dozen more who are sporadically active, and
good tooling, we can have 80% of the RGZ addresses imported into OSM by the end
of 2023.

--------------------------

4. Quality assurance

As previously mentioned, we created the website
https://openstreetmap.rs/download/ar/, which is refreshed daily and has a
section for QA. The idea is to monitor the import as we go and detect any
problems. These are some of the things we plan to check continuously:

* duplicated ref:RS:kucni_broj - ideally, no two OSM entities will have the
same "ref:RS:kucni_broj" tag, as there are no duplicates in RGZ either. This
report should have 0 entries. The report is done and generated daily.
* addresses in buildings - as agreed in the tagging schema, we want to place
addresses on the building way (if a building exists). However, requiring this
would mean adding a building for each address, which would slow down the
import itself. So the plan is to proceed with the import, add addresses as
nodes, and have this QA check tell us where we have address nodes inside
buildings. Today, we counted 76,000 address nodes inside buildings, of which
57,500 are simple cases: a single address node inside a building, which can be
deleted and its address moved to the building. The rest (around 18,000) need
to be checked case by case (POIs, multiple addresses, typos...). For the
simple cases, we have even generated .osm files to automate this. One problem
with moving these nodes is that we lose their history, so we split the .osm
files into at most 10 entities each, to make it easier for anyone to find the
deep history of deleted nodes. The report is done and generated daily.
* QA on conflated addresses - once we detect a "ref:RS:kucni_broj" tag on an
OSM address, we will run a couple of checks: does that reference actually
exist in RGZ, is the OSM address too far away from the RGZ location, do the
street names/house numbers match... This report will tell us all of this and
should ideally have 0 entries. This report is still being worked on.
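Two of these checks can be sketched on plain Python data. This is a minimal
illustration under stated assumptions, not the actual report code: entities
are given as (osm_id, ref) pairs, buildings as rings of (lon, lat) vertices,
and all names are hypothetical. The point-in-polygon test is even-odd ray
casting in plain lon/lat coordinates, which is adequate for building-sized
polygons:

```python
from collections import defaultdict

def duplicated_refs(entities: list[tuple[str, str]]) -> dict[str, list[str]]:
    """entities: (osm_id, ref:RS:kucni_broj value) pairs.

    Returns every ref used by more than one OSM entity; ideally empty.
    """
    by_ref: dict[str, list[str]] = defaultdict(list)
    for osm_id, ref in entities:
        by_ref[ref].append(osm_id)
    return {ref: ids for ref, ids in by_ref.items() if len(ids) > 1}

def point_in_polygon(lon: float, lat: float, ring: list[tuple[float, float]]) -> bool:
    """Even-odd ray casting; ring is a list of (lon, lat) vertices, closed or not."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        if (y1 > lat) != (y2 > lat):  # edge straddles the horizontal ray
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside

def single_address_buildings(
    address_nodes: dict[str, tuple[float, float]],
    buildings: dict[str, list[tuple[float, float]]],
) -> list[tuple[str, str]]:
    """(building_id, node_id) pairs where exactly one address node lies inside the building."""
    result = []
    for building_id, ring in buildings.items():
        inside = [nid for nid, (lon, lat) in address_nodes.items()
                  if point_in_polygon(lon, lat, ring)]
        if len(inside) == 1:
            result.append((building_id, inside[0]))
    return result
```

The nested loop in single_address_buildings is quadratic; for the whole
country, a spatial index would be needed. It also ignores the POI and
multiple-address exceptions listed above.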

--------------------------

5. Licence

The license of this data is clarified as open data in
https://data.gov.rs/sr/terms/. This data was released as open data on 12
December 2022 through changes in the law, as defined in this PDF:
https://geosrbija.rs/?mdocs-file=6186 (article 34), which (along with other RGZ
documents) can be downloaded here: https://geosrbija.rs/dokumentacija/. While
we have already imported some other data from the Open Data portal (GTFS,
admin boundaries, national heritage...), we also wanted to add RGZ as a source
to https://www.openstreetmap.org/copyright, and we have a pull request for it:
https://github.com/openstreetmap/openstreetmap-website/pull/3959. Afterwards,
we also contacted the LWG (on 8 March 2023) for further consultation, but the
answer is still pending. However, we think we are on the safe side to start
the import even now. Please raise concerns if this is not the case!


Thanks, Branko

_______________________________________________
Imports mailing list
[email protected]
https://lists.openstreetmap.org/listinfo/imports
