Re: [Imports] Address import from government open data in Serbia

Branko Kokanovic Tue, 28 Mar 2023 14:23:29 -0700

Hi,
I didn't know about the OpenRefine software; it seems like a nice thing to be
aware of! However, we opted for a "full control" approach. Our algorithm for
mapping "ALL CAPS" to "All Caps" (just sharing it here, someone might find it
useful) is something like:
* Check a curated list of overridden street names (names that we crowdsourced
in an online spreadsheet and put in files as special cases).
* Find the street in OSM by cadastre reference (since streets are also open
data). If found, we are sure that the mapping is correct.
* Normalize the "ALL CAPS" name (remove punctuation, lowercase, trim...) and
try to find that normalized name in OSM. If found, assume that it is the
correct street name.
* Do a best effort. Capitalize the first letter of each word (we have a lot of
streets named after people, so most words start with a capital letter) and
keep a list of exception words ("street", "river", "valley", "brigades",
"stream", "creek"...). This step is highly specific to the language's grammar
rules.
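
For illustration, the four steps above could be sketched in Python roughly as
follows. This is not the code from our repository: the function names, the
lookup dictionaries and the exception words are hypothetical, and always
capitalizing the first word is an assumption made for the example.

```python
import string

# Hypothetical exception words that stay lowercase (illustrative only).
LOWERCASE_WORDS = {"ulica", "reka", "dolina", "brigade", "potok"}


def normalize(name):
    """Remove ASCII punctuation, lowercase, and collapse whitespace."""
    table = str.maketrans("", "", string.punctuation)
    return " ".join(name.translate(table).lower().split())


def best_effort(name):
    """Capitalize every word except exception words after the first one."""
    result = []
    for index, word in enumerate(name.lower().split()):
        if index > 0 and word in LOWERCASE_WORDS:
            result.append(word)
        else:
            result.append(word.capitalize())
    return " ".join(result)


def map_street_name(raw, ref, overrides, osm_by_ref, osm_by_normalized):
    """Try each source in order of trust and fall back to best effort."""
    if raw in overrides:                 # 1. curated special cases
        return overrides[raw]
    if ref in osm_by_ref:                # 2. same cadastre reference in OSM
        return osm_by_ref[ref]
    key = normalize(raw)
    if key in osm_by_normalized:         # 3. same normalized name in OSM
        return osm_by_normalized[key]
    return best_effort(raw)              # 4. grammar-based guess
```

Here `overrides` maps raw cadastre names to corrected names, `osm_by_ref` maps
cadastre references to OSM street names, and `osm_by_normalized` maps
normalized names to OSM street names; all three are assumed to be built
beforehand.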

Regarding osminspector, we will surely use it during and after the import.

As for how we plan to do conflation, we also opted for a "full control"
solution: harder, but more customizable, I think. We might be wrong on this,
and maybe it is overengineered, but we think it will give us the best balance
between import quality and import speed. 2.5 million addresses is not a small
number. Basically, we have a daily job, a set of pipelines[1], that downloads
the cadastre data as well as a PBF from OSM, does some normalization, street
name mapping and then conflation, generates HTML and import .osm files, and
uploads everything. Conflation compares street names by Levenshtein distance,
housenumbers as numeric values and distance as a numeric value too, and takes
a linear combination of these to get a match percentage. If the match is
perfect (100%), we prepare .osm files to be loaded into JOSM (in these files,
we just add "ref" to existing entities). If there is not a single address
within 200 m (0% match), which is a very common case in villages today, we
prepare .osm files with new nodes to be added to OSM. If there is a partial
match (between 0% and 100%), we keep our hands off and leave it to a human to
sort things out manually. There are import instructions in the wiki on how to
handle those .osm files, and I just published an instruction video[2] (in
Serbian; I will add subtitles in the coming days).
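
To make the scoring idea concrete, here is a minimal Python sketch of a linear
combination like the one described above. It is not the pipeline's actual
code: the weights (40/40/20), the exact-equality check for housenumbers, the
linear distance falloff and all function names are assumptions made for the
example. Only the 200 m radius and the three outcomes come from the
description.

```python
MAX_DISTANCE_M = 200.0  # search radius mentioned above


def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def match_percent(street_a, street_b, number_a, number_b, distance_m):
    """Combine name, housenumber and distance similarity into 0..100."""
    longest = max(len(street_a), len(street_b)) or 1
    name_sim = 1.0 - levenshtein(street_a, street_b) / longest
    number_sim = 1.0 if number_a == number_b else 0.0  # assumed: exact match
    dist_sim = max(0.0, 1.0 - distance_m / MAX_DISTANCE_M)
    # Hypothetical weights; they only need to add up to 100.
    return 40 * name_sim + 40 * number_sim + 20 * dist_sim


def classify(scores_within_radius):
    """Decide what to do with one cadastre address.

    `scores_within_radius` holds the match percentage for every OSM
    address found within MAX_DISTANCE_M of the cadastre address.
    """
    if not scores_within_radius:
        return "new_node"   # nothing nearby: prepare a new node
    if max(scores_within_radius) == 100.0:
        return "add_ref"    # perfect match: only add "ref"
    return "manual"         # partial match: leave it to a human
```

With this particular formula a score of exactly 100 requires zero distance, so
a real pipeline would probably need some tolerance there; that detail is left
out of the sketch.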

Thanks for the great suggestions!
Branko

[1] https://gitlab.com/osm-serbia/adresniregistar/-/blob/main/Makefile
[2] https://peertube.openstreetmap.fr/w/s7tiAyeK592Btj9ficfHJH

On Tue, Mar 28, 2023, at 13:40, Cascafico Giovanni wrote:
> Hello Branko,
> 
> I'd like to suggest openrefine [1] for ALLCAPS and misspelling issues. The 
> tool can save a sequence of regex replaces on huge lists. Besides, a 
> replacing sequence is automatically saved and can be a resource in case of 
> further imports.
> 
> Like others pointed out, I found osminspector [2] a very useful tool for 
> post-import quality assessment.
> 
> I didn't understand how you plan to perform conflation. My approach would be 
> using osm_conflator tool and audit service [3]. Basically osm_conflator works 
> on nodes by overpass extracting a category (ie, addr:) and trying to match 
> import candidates in a certain radius. Once a set of candidates is generated, 
> actual conflation (audit) can be done via crowd-checking on a shared map like 
> this [4].
> 
> 
> 
> [1] https://openrefine.org/
> [2] 
> https://tools.geofabrik.de/osmi/?view=addresses&lon=20.40677&lat=44.84030&zoom=12
> [3] 
> https://wiki.openstreetmap.org/wiki/Import/Catalogue/Milan_addresses_import
> [4] http://audit.osmz.ru/map/MI-M9
> _______________________________________________
> Imports mailing list
> [email protected]
> https://lists.openstreetmap.org/listinfo/imports
>

